Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
* [FFmpeg-devel] [PATCH 00/16] NVidia Tegra hardware decoding backend
@ 2024-05-30 19:43 averne
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 01/16] avutil/buffer: add helper to allocate aligned memory averne
                   ` (15 more replies)
  0 siblings, 16 replies; 37+ messages in thread
From: averne @ 2024-05-30 19:43 UTC (permalink / raw)
  To: ffmpeg-devel

Hi all,

This patch series implements a hardware decoding backend for nvidia
Tegra devices, notably the Nintendo Switch. It was primarily written
for HorizonOS (the Nintendo Switch OS), but also supports nvidia's
Linux4Tegra distro. As for hardware, all Tegras from the X1 (T210)
onward should be supported, although the patch does not implement
features that were added in subsequent revisions of the multimedia
engines (e.g. 12-bit HEVC). However, since I only own T210 devices
(Switch and Jetson Nano), I was not able to verify this.

The backend is essentially a userspace NVDEC driver, as due to the
OS design of the Switch, we cannot link to nvidia's system libraries.
It notably uses (sparse) hardware documentation released by nvidia
here: https://github.com/NVIDIA/open-gpu-doc/tree/master/classes/video.
It supports all codecs available in hardware (MPEG1/2/4, VC1, H264,
HEVC, VP8, VP9 and JPEG), with dynamic frequency scaling, and
hardware-accelerated frame transfer.

At the moment I'm submitting the series with some nvidia headers
pulled from various sources, but I do think they should rather be
put in nv-codec-headers, let me know.

The code was tested for memory bugs and leaks with valgrind and
asan on L4T. Some quick performance testing (decoding with -f null -)
showed results in line with official software, tested against the 
nvv4l2 backend that was posted here a while ago:
https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2020-June/263759.html.
Note that the numbers are skewed because frame transfer cannot be 
disabled in nvidia's backend.
- HEVC Main 10 @ 4k   (~80Mbps): nvtegra  79fps, nvv4l2  66fps
- HEVC Main 10 @ 1080p (~5Mbps): nvtegra 402fps, nvv4l2 229fps
- H264         @ 1080p (~3Mbps): nvtegra 286fps, nvv4l2 260fps

Several homebrew applications have been using this backend for some
time, with no bugs reported. As far as I'm aware, this is the
complete list of them:
- NXMP, a media player based on mpv: https://github.com/proconsule/nxmp
- WiliWili, a bilibili client: https://github.com/xfangfang/wiliwili
- Switchfin, a Jellyfin client: https://github.com/dragonflylee/switchfin
- Moonlight-Switch, a Moonlight client: https://github.com/XITRIX/Moonlight-Switch
- chiaki: https://git.sr.ht/~kkwong/chiaki/
- My own media player, unreleased at this time

Nintendo Switch support assumes a working devkitA64 homebrew
environment; setup instructions can be found here:
https://devkitpro.org/wiki/devkitPro_pacman. The hwaccel can then be
configured with e.g.:
```
source /opt/devkitpro/switchvars.sh && ./configure \
    --cross-prefix=aarch64-none-elf- --enable-cross-compile --arch=aarch64 \
    --cpu=cortex-a57 --target-os=horizon --enable-pic --enable-gpl \
    --enable-nvtegra
```

It should probably be noted that NVDEC usage on discrete GPUs is
very similar. As far as I know, the main difference is that the
interfacing is done through the GPFIFO block (the same block that
manages the 3D engine), instead of host1x.

Thank you for your consideration.


averne (16):
  avutil/buffer: add helper to allocate aligned memory
  configure,avutil: add support for HorizonOS
  avutil: add ioctl definitions for tegra devices
  avutil: add hardware definitions for NVDEC, NVJPG and VIC
  avutil: add common code for nvtegra
  avutil: add nvtegra hwcontext
  hwcontext_nvtegra: add dynamic frequency scaling routines
  nvtegra: add common hardware decoding code
  nvtegra: add mpeg1/2 hardware decoding
  nvtegra: add mpeg4 hardware decoding
  nvtegra: add vc1 hardware decoding
  nvtegra: add h264 hardware decoding
  nvtegra: add hevc hardware decoding
  nvtegra: add vp8 hardware decoding
  nvtegra: add vp9 hardware decoding
  nvtegra: add mjpeg hardware decoding

 configure                      |   30 +
 libavcodec/Makefile            |   11 +
 libavcodec/h263dec.c           |    6 +
 libavcodec/h264_slice.c        |    6 +-
 libavcodec/h264dec.c           |    3 +
 libavcodec/hevcdec.c           |   17 +-
 libavcodec/hevcdec.h           |    2 +
 libavcodec/hwaccels.h          |   10 +
 libavcodec/hwconfig.h          |    2 +
 libavcodec/mjpegdec.c          |    6 +
 libavcodec/mpeg12dec.c         |   12 +
 libavcodec/mpeg4videodec.c     |    3 +
 libavcodec/nvtegra_decode.c    |  517 +++++++++
 libavcodec/nvtegra_decode.h    |   94 ++
 libavcodec/nvtegra_h264.c      |  506 +++++++++
 libavcodec/nvtegra_hevc.c      |  633 +++++++++++
 libavcodec/nvtegra_mjpeg.c     |  336 ++++++
 libavcodec/nvtegra_mpeg12.c    |  319 ++++++
 libavcodec/nvtegra_mpeg4.c     |  344 ++++++
 libavcodec/nvtegra_vc1.c       |  455 ++++++++
 libavcodec/nvtegra_vp8.c       |  334 ++++++
 libavcodec/nvtegra_vp9.c       |  665 ++++++++++++
 libavcodec/vc1dec.c            |    9 +
 libavcodec/vp8.c               |    6 +
 libavcodec/vp9.c               |   10 +-
 libavutil/Makefile             |    9 +
 libavutil/buffer.c             |   31 +
 libavutil/buffer.h             |    7 +
 libavutil/clb0b6.h             |  303 ++++++
 libavutil/clc5b0.h             |  436 ++++++++
 libavutil/cle7d0.h             |  129 +++
 libavutil/cpu.c                |    7 +
 libavutil/hwcontext.c          |    4 +
 libavutil/hwcontext.h          |    1 +
 libavutil/hwcontext_internal.h |    1 +
 libavutil/hwcontext_nvtegra.c  | 1045 ++++++++++++++++++
 libavutil/hwcontext_nvtegra.h  |   92 ++
 libavutil/nvdec_drv.h          | 1858 ++++++++++++++++++++++++++++++++
 libavutil/nvhost_ioctl.h       |  511 +++++++++
 libavutil/nvjpg_drv.h          |  189 ++++
 libavutil/nvmap_ioctl.h        |  451 ++++++++
 libavutil/nvtegra.c            | 1035 ++++++++++++++++++
 libavutil/nvtegra.h            |  258 +++++
 libavutil/nvtegra_host1x.h     |   94 ++
 libavutil/pixdesc.c            |    4 +
 libavutil/pixfmt.h             |    8 +
 libavutil/vic_drv.h            |  279 +++++
 47 files changed, 11085 insertions(+), 3 deletions(-)
 create mode 100644 libavcodec/nvtegra_decode.c
 create mode 100644 libavcodec/nvtegra_decode.h
 create mode 100644 libavcodec/nvtegra_h264.c
 create mode 100644 libavcodec/nvtegra_hevc.c
 create mode 100644 libavcodec/nvtegra_mjpeg.c
 create mode 100644 libavcodec/nvtegra_mpeg12.c
 create mode 100644 libavcodec/nvtegra_mpeg4.c
 create mode 100644 libavcodec/nvtegra_vc1.c
 create mode 100644 libavcodec/nvtegra_vp8.c
 create mode 100644 libavcodec/nvtegra_vp9.c
 create mode 100644 libavutil/clb0b6.h
 create mode 100644 libavutil/clc5b0.h
 create mode 100644 libavutil/cle7d0.h
 create mode 100644 libavutil/hwcontext_nvtegra.c
 create mode 100644 libavutil/hwcontext_nvtegra.h
 create mode 100644 libavutil/nvdec_drv.h
 create mode 100644 libavutil/nvhost_ioctl.h
 create mode 100644 libavutil/nvjpg_drv.h
 create mode 100644 libavutil/nvmap_ioctl.h
 create mode 100644 libavutil/nvtegra.c
 create mode 100644 libavutil/nvtegra.h
 create mode 100644 libavutil/nvtegra_host1x.h
 create mode 100644 libavutil/vic_drv.h

--
2.45.1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".


* [FFmpeg-devel] [PATCH 01/16] avutil/buffer: add helper to allocate aligned memory
  2024-05-30 19:43 [FFmpeg-devel] [PATCH 00/16] NVidia Tegra hardware decoding backend averne
@ 2024-05-30 19:43 ` averne
  2024-05-30 20:38   ` Rémi Denis-Courmont
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 02/16] configure, avutil: add support for HorizonOS averne
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 37+ messages in thread
From: averne @ 2024-05-30 19:43 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: averne

This is useful e.g. for memory-mapped buffers that need page-aligned memory when dealing with hardware devices.
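
As an illustration of the page-aligned use case, here is a minimal standalone sketch that does not use the new FFmpeg helper. It assumes a POSIX system providing posix_memalign() and sysconf(); the names alloc_page_aligned, is_aligned and page_aligned_example are hypothetical and not part of this patch.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not part of this patch): allocate `size` bytes
 * aligned to the system page size. Returns NULL on failure; release
 * the result with free(). */
static void *alloc_page_aligned(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    void *ptr = NULL;

    if (page <= 0)
        return NULL;
    /* posix_memalign() returns 0 on success, an error number otherwise. */
    if (posix_memalign(&ptr, (size_t)page, size) != 0)
        return NULL;
    return ptr;
}

/* Returns 1 if `ptr` is a multiple of `align`, 0 otherwise. */
static int is_aligned(const void *ptr, size_t align)
{
    return ((uintptr_t)ptr % align) == 0;
}

/* Small self-check showing the intended usage. */
static void page_aligned_example(void)
{
    void *buf = alloc_page_aligned(10000);

    assert(buf && is_aligned(buf, (size_t)sysconf(_SC_PAGESIZE)));
    free(buf);
}
```

Note that posix_memalign() requires the alignment to be a power of two and a multiple of sizeof(void *), which page sizes satisfy.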

Signed-off-by: averne <averne381@gmail.com>
---
 libavutil/buffer.c | 31 +++++++++++++++++++++++++++++++
 libavutil/buffer.h |  7 +++++++
 2 files changed, 38 insertions(+)

diff --git a/libavutil/buffer.c b/libavutil/buffer.c
index e4562a79b1..b8e357f540 100644
--- a/libavutil/buffer.c
+++ b/libavutil/buffer.c
@@ -16,9 +16,14 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 
+#include "config.h"
+
 #include <stdatomic.h>
 #include <stdint.h>
 #include <string.h>
+#if HAVE_MALLOC_H
+#include <malloc.h>
+#endif
 
 #include "avassert.h"
 #include "buffer_internal.h"
@@ -100,6 +105,32 @@ AVBufferRef *av_buffer_allocz(size_t size)
     return ret;
 }
 
+AVBufferRef *av_buffer_aligned_alloc(size_t size, size_t align)
+{
+    AVBufferRef *ret = NULL;
+    uint8_t    *data = NULL;
+
+#if HAVE_POSIX_MEMALIGN
+    if (posix_memalign((void **)&data, align, size))
+        return NULL;
+#elif HAVE_ALIGNED_MALLOC
+    data = _aligned_malloc(size, align);
+#elif HAVE_MEMALIGN
+    data = memalign(align, size);
+#else
+    return NULL;
+#endif
+
+    if (!data)
+        return NULL;
+
+    ret = av_buffer_create(data, size, av_buffer_default_free, NULL, 0);
+    if (!ret)
+        av_freep(&data);
+
+    return ret;
+}
+
 AVBufferRef *av_buffer_ref(const AVBufferRef *buf)
 {
     AVBufferRef *ret = av_mallocz(sizeof(*ret));
diff --git a/libavutil/buffer.h b/libavutil/buffer.h
index e1ef5b7f07..8422ec3453 100644
--- a/libavutil/buffer.h
+++ b/libavutil/buffer.h
@@ -107,6 +107,13 @@ AVBufferRef *av_buffer_alloc(size_t size);
  */
 AVBufferRef *av_buffer_allocz(size_t size);
 
+/**
+ * Allocate an AVBuffer of the given size and alignment.
+ *
+ * @return an AVBufferRef of given size or NULL when out of memory
+ */
+AVBufferRef *av_buffer_aligned_alloc(size_t size, size_t align);
+
 /**
  * Always treat the buffer as read-only, even when it has only one
  * reference.
-- 
2.45.1


* [FFmpeg-devel] [PATCH 02/16] configure, avutil: add support for HorizonOS
  2024-05-30 19:43 [FFmpeg-devel] [PATCH 00/16] NVidia Tegra hardware decoding backend averne
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 01/16] avutil/buffer: add helper to allocate aligned memory averne
@ 2024-05-30 19:43 ` averne
  2024-05-30 20:37   ` Rémi Denis-Courmont
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 03/16] avutil: add ioctl definitions for tegra devices averne
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 37+ messages in thread
From: averne @ 2024-05-30 19:43 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: averne

HorizonOS (HOS) is the operating system of the Nintendo Switch.
This patch enables integration with the homebrew toolchain developed by the devkitPro team. Its two main components are devkitA64 (a common toolchain for aarch64 targets) and libnx (a library implementing interaction with the HOS kernel and system daemons, termed sysmodules).
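
The libavutil/cpu.c hunk in this patch derives the CPU count from a core mask by counting its set bits, and falls back to 3 if the query fails. A minimal portable sketch of that logic follows. It assumes nothing about libnx: popcount64, cpu_count_from_mask and cpu_count_example are hypothetical stand-ins for av_popcount64() and the svcGetInfo() result handling.

```c
#include <assert.h>
#include <stdint.h>

/* Portable population count: number of set bits in a 64-bit mask. */
static int popcount64(uint64_t x)
{
    int n = 0;

    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical helper: `query_ok` stands in for R_SUCCEEDED(rc),
 * `fallback` for the constant used when the query fails. */
static int cpu_count_from_mask(int query_ok, uint64_t core_mask, int fallback)
{
    return query_ok ? popcount64(core_mask) : fallback;
}

/* Small self-check with made-up mask values. */
static void cpu_count_example(void)
{
    assert(cpu_count_from_mask(1, 0x7, 3) == 3); /* cores 0-2 available */
    assert(cpu_count_from_mask(1, 0xF, 3) == 4); /* cores 0-3 available */
    assert(cpu_count_from_mask(0, 0x0, 3) == 3); /* query failed */
}
```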

Signed-off-by: averne <averne381@gmail.com>
---
 configure       | 8 ++++++++
 libavutil/cpu.c | 7 +++++++
 2 files changed, 15 insertions(+)

diff --git a/configure b/configure
index 96b181fd21..09fb2aed1b 100755
--- a/configure
+++ b/configure
@@ -5967,6 +5967,10 @@ case $target_os in
         ;;
     minix)
         ;;
+    horizon)
+        enable section_data_rel_ro
+        add_extralibs -lnx
+        ;;
     none)
         ;;
     *)
@@ -7710,6 +7714,10 @@ haiku)
         disable memalign
     fi
     ;;
+horizon)
+    disable sysctl
+    disable sysctlbyname
+    ;;
 esac
 
 flatten_extralibs(){
diff --git a/libavutil/cpu.c b/libavutil/cpu.c
index 9ac2f01c20..6a77df5e34 100644
--- a/libavutil/cpu.c
+++ b/libavutil/cpu.c
@@ -48,6 +48,9 @@
 #if HAVE_UNISTD_H
 #include <unistd.h>
 #endif
+#ifdef __SWITCH__
+#include <switch.h>
+#endif
 
 static atomic_int cpu_flags = -1;
 static atomic_int cpu_count = -1;
@@ -247,6 +250,10 @@ int av_cpu_count(void)
 #elif HAVE_WINRT
     GetNativeSystemInfo(&sysinfo);
     nb_cpus = sysinfo.dwNumberOfProcessors;
+#elif defined(__SWITCH__)
+    u64 core_mask = 0;
+    Result rc = svcGetInfo(&core_mask, InfoType_CoreMask, CUR_PROCESS_HANDLE, 0);
+    nb_cpus = R_SUCCEEDED(rc) ? av_popcount64(core_mask) : 3;
 #endif
 
     if (!atomic_exchange_explicit(&printed, 1, memory_order_relaxed))
-- 
2.45.1


* [FFmpeg-devel] [PATCH 03/16] avutil: add ioctl definitions for tegra devices
  2024-05-30 19:43 [FFmpeg-devel] [PATCH 00/16] NVidia Tegra hardware decoding backend averne
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 01/16] avutil/buffer: add helper to allocate aligned memory averne
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 02/16] configure, avutil: add support for HorizonOS averne
@ 2024-05-30 19:43 ` averne
  2024-05-30 20:42   ` Rémi Denis-Courmont
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 04/16] avutil: add hardware definitions for NVDEC, NVJPG and VIC averne
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 37+ messages in thread
From: averne @ 2024-05-30 19:43 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: averne

These files are taken with minimal modifications from nvidia's Linux4Tegra (L4T) tree.
nvmap enables management of memory-mapped buffers for hardware devices.
nvhost enables interaction with different hardware modules (multimedia engines, display engine, ...), through a common block, host1x.
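
As an example of the conventions in nvhost_ioctl.h, the moduleid field of struct nvhost_clk_rate_args combines a module id (low 16 bits) with an nvhost_clk_attr value (top 8 bits). The sketch below only illustrates that layout using the bit positions and widths from the header. The helpers pack_moduleid, moduleid_module, moduleid_attr and moduleid_example are hypothetical, and this is not necessarily how the rest of the series builds the field.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout copied from nvhost_ioctl.h in this patch. */
#define NVHOST_MODULE_ID_BIT_POS    0
#define NVHOST_MODULE_ID_BIT_WIDTH  16
#define NVHOST_CLOCK_ATTR_BIT_POS   24
#define NVHOST_CLOCK_ATTR_BIT_WIDTH 8

/* Mask covering the lowest `width` bits (width < 32). */
static uint32_t low_mask(unsigned width)
{
    return (UINT32_C(1) << width) - 1;
}

/* Hypothetical helper: combine a module id and a clock attribute. */
static uint32_t pack_moduleid(uint32_t module, uint32_t attr)
{
    return ((module & low_mask(NVHOST_MODULE_ID_BIT_WIDTH))
                << NVHOST_MODULE_ID_BIT_POS) |
           ((attr & low_mask(NVHOST_CLOCK_ATTR_BIT_WIDTH))
                << NVHOST_CLOCK_ATTR_BIT_POS);
}

/* Hypothetical helpers: extract each part again. */
static uint32_t moduleid_module(uint32_t moduleid)
{
    return (moduleid >> NVHOST_MODULE_ID_BIT_POS) &
           low_mask(NVHOST_MODULE_ID_BIT_WIDTH);
}

static uint32_t moduleid_attr(uint32_t moduleid)
{
    return (moduleid >> NVHOST_CLOCK_ATTR_BIT_POS) &
           low_mask(NVHOST_CLOCK_ATTR_BIT_WIDTH);
}

/* Self-check using enum values from the header:
 * NVHOST_MODULE_NVDEC is 9 and NVHOST_BW_KHZ is 3. */
static void moduleid_example(void)
{
    uint32_t id = pack_moduleid(9, 3);

    assert(id == UINT32_C(0x03000009));
    assert(moduleid_module(id) == 9);
    assert(moduleid_attr(id) == 3);
}
```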

Signed-off-by: averne <averne381@gmail.com>
---
 libavutil/Makefile       |   2 +
 libavutil/nvhost_ioctl.h | 511 +++++++++++++++++++++++++++++++++++++++
 libavutil/nvmap_ioctl.h  | 451 ++++++++++++++++++++++++++++++++++
 3 files changed, 964 insertions(+)
 create mode 100644 libavutil/nvhost_ioctl.h
 create mode 100644 libavutil/nvmap_ioctl.h

diff --git a/libavutil/Makefile b/libavutil/Makefile
index 6e6fa8d800..9c112bc58a 100644
--- a/libavutil/Makefile
+++ b/libavutil/Makefile
@@ -52,6 +52,8 @@ HEADERS = adler32.h                                                     \
           hwcontext_videotoolbox.h                                      \
           hwcontext_vdpau.h                                             \
           hwcontext_vulkan.h                                            \
+          nvhost_ioctl.h                                                \
+          nvmap_ioctl.h                                                 \
           iamf.h                                                        \
           imgutils.h                                                    \
           intfloat.h                                                    \
diff --git a/libavutil/nvhost_ioctl.h b/libavutil/nvhost_ioctl.h
new file mode 100644
index 0000000000..b0bf3e3ae6
--- /dev/null
+++ b/libavutil/nvhost_ioctl.h
@@ -0,0 +1,511 @@
+/*
+ * include/uapi/linux/nvhost_ioctl.h
+ *
+ * Tegra graphics host driver
+ *
+ * Copyright (c) 2016-2020, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef AVUTIL_NVHOST_IOCTL_H
+#define AVUTIL_NVHOST_IOCTL_H
+
+#ifndef __SWITCH__
+#   include <linux/ioctl.h>
+#   include <linux/types.h>
+#else
+#   include <switch.h>
+
+#   define _IO       _NV_IO
+#   define _IOR      _NV_IOR
+#   define _IOW      _NV_IOW
+#   define _IOWR     _NV_IOWR
+
+#   define _IOC_DIR  _NV_IOC_DIR
+#   define _IOC_TYPE _NV_IOC_TYPE
+#   define _IOC_NR   _NV_IOC_NR
+#   define _IOC_SIZE _NV_IOC_SIZE
+#endif
+
+#define __user
+
+#define NVHOST_INVALID_SYNCPOINT 0xFFFFFFFF
+#define NVHOST_NO_TIMEOUT (-1)
+#define NVHOST_NO_CONTEXT 0x0
+#define NVHOST_IOCTL_MAGIC 'H'
+#define NVHOST_PRIORITY_LOW 50
+#define NVHOST_PRIORITY_MEDIUM 100
+#define NVHOST_PRIORITY_HIGH 150
+
+#define NVHOST_TIMEOUT_FLAG_DISABLE_DUMP    0
+
+#define NVHOST_SUBMIT_VERSION_V0            0x0
+#define NVHOST_SUBMIT_VERSION_V1            0x1
+#define NVHOST_SUBMIT_VERSION_V2            0x2
+#define NVHOST_SUBMIT_VERSION_MAX_SUPPORTED NVHOST_SUBMIT_VERSION_V2
+
+struct nvhost_cmdbuf {
+    uint32_t mem;
+    uint32_t offset;
+    uint32_t words;
+} __attribute__((packed));
+
+struct nvhost_cmdbuf_ext {
+    int32_t  pre_fence;
+    uint32_t reserved;
+};
+
+struct nvhost_reloc {
+    uint32_t cmdbuf_mem;
+    uint32_t cmdbuf_offset;
+    uint32_t target;
+    uint32_t target_offset;
+};
+
+struct nvhost_reloc_shift {
+    uint32_t shift;
+} __attribute__((packed));
+
+#define NVHOST_RELOC_TYPE_DEFAULT    0
+#define NVHOST_RELOC_TYPE_PITCH_LINEAR    1
+#define NVHOST_RELOC_TYPE_BLOCK_LINEAR    2
+#define NVHOST_RELOC_TYPE_NVLINK    3
+struct nvhost_reloc_type {
+    uint32_t reloc_type;
+    uint32_t padding;
+};
+
+struct nvhost_waitchk {
+    uint32_t mem;
+    uint32_t offset;
+    uint32_t syncpt_id;
+    uint32_t thresh;
+};
+
+struct nvhost_syncpt_incr {
+    uint32_t syncpt_id;
+    uint32_t syncpt_incrs;
+};
+
+struct nvhost_get_param_args {
+    uint32_t value;
+} __attribute__((packed));
+
+struct nvhost_get_param_arg {
+    uint32_t param;
+    uint32_t value;
+};
+
+struct nvhost_get_client_managed_syncpt_arg {
+    uint64_t name;
+    uint32_t param;
+    uint32_t value;
+};
+
+struct nvhost_free_client_managed_syncpt_arg {
+    uint32_t param;
+    uint32_t value;
+};
+
+struct nvhost_channel_open_args {
+    int32_t  channel_fd;
+};
+
+struct nvhost_set_syncpt_name_args {
+    uint64_t name;
+    uint32_t syncpt_id;
+    uint32_t padding;
+};
+
+struct nvhost_set_nvmap_fd_args {
+    uint32_t fd;
+} __attribute__((packed));
+
+enum nvhost_clk_attr {
+    NVHOST_CLOCK = 0,
+    NVHOST_BW,
+    NVHOST_PIXELRATE,
+    NVHOST_BW_KHZ,
+};
+
+/*
+ * moduleid[15:0]  => module id
+ * moduleid[31:24] => nvhost_clk_attr
+ */
+#define NVHOST_MODULE_ID_BIT_POS    0
+#define NVHOST_MODULE_ID_BIT_WIDTH  16
+#define NVHOST_CLOCK_ATTR_BIT_POS   24
+#define NVHOST_CLOCK_ATTR_BIT_WIDTH 8
+struct nvhost_clk_rate_args {
+    uint32_t rate;
+    uint32_t moduleid;
+};
+
+struct nvhost_set_timeout_args {
+    uint32_t timeout;
+} __attribute__((packed));
+
+struct nvhost_set_timeout_ex_args {
+    uint32_t timeout;
+    uint32_t flags;
+};
+
+struct nvhost_set_priority_args {
+    uint32_t priority;
+} __attribute__((packed));
+
+struct nvhost_set_error_notifier {
+    uint64_t offset;
+    uint64_t size;
+    uint32_t mem;
+    uint32_t padding;
+};
+
+struct nvhost32_ctrl_module_regrdwr_args {
+    uint32_t id;
+    uint32_t num_offsets;
+    uint32_t block_size;
+    uint32_t offsets;
+    uint32_t values;
+    uint32_t write;
+};
+
+struct nvhost_ctrl_module_regrdwr_args {
+    uint32_t id;
+    uint32_t num_offsets;
+    uint32_t block_size;
+    uint32_t write;
+    uint64_t offsets;
+    uint64_t values;
+};
+
+struct nvhost32_submit_args {
+    uint32_t submit_version;
+    uint32_t num_syncpt_incrs;
+    uint32_t num_cmdbufs;
+    uint32_t num_relocs;
+    uint32_t num_waitchks;
+    uint32_t timeout;
+    uint32_t syncpt_incrs;
+    uint32_t cmdbufs;
+    uint32_t relocs;
+    uint32_t reloc_shifts;
+    uint32_t waitchks;
+    uint32_t waitbases;
+    uint32_t class_ids;
+
+    uint32_t pad[2];        /* future expansion */
+
+    uint32_t fences;
+    uint32_t fence;        /* Return value */
+} __attribute__((packed));
+
+#define NVHOST_SUBMIT_FLAG_SYNC_FENCE_FD    0
+#define NVHOST_SUBMIT_MAX_NUM_SYNCPT_INCRS    10
+
+struct nvhost_submit_args {
+    uint32_t submit_version;
+    uint32_t num_syncpt_incrs;
+    uint32_t num_cmdbufs;
+    uint32_t num_relocs;
+    uint32_t num_waitchks;
+    uint32_t timeout;
+    uint32_t flags;
+    uint32_t fence;        /* Return value */
+    uint64_t syncpt_incrs;
+    uint64_t cmdbuf_exts;
+
+    uint32_t checksum_methods;
+    uint32_t checksum_falcon_methods;
+
+    uint64_t pad[1];        /* future expansion */
+
+    uint64_t reloc_types;
+    uint64_t cmdbufs;
+    uint64_t relocs;
+    uint64_t reloc_shifts;
+    uint64_t waitchks;
+    uint64_t waitbases;
+    uint64_t class_ids;
+    uint64_t fences;
+};
+
+struct nvhost_set_ctxswitch_args {
+    uint32_t num_cmdbufs_save;
+    uint32_t num_save_incrs;
+    uint32_t save_incrs;
+    uint32_t save_waitbases;
+    uint32_t cmdbuf_save;
+    uint32_t num_cmdbufs_restore;
+    uint32_t num_restore_incrs;
+    uint32_t restore_incrs;
+    uint32_t restore_waitbases;
+    uint32_t cmdbuf_restore;
+    uint32_t num_relocs;
+    uint32_t relocs;
+    uint32_t reloc_shifts;
+
+    uint32_t pad;
+};
+
+struct nvhost_channel_buffer {
+    uint32_t dmabuf_fd;    /* in */
+    uint32_t reserved0;    /* reserved, must be 0 */
+    uint64_t reserved1[2];    /* reserved, must be 0 */
+    uint64_t address;        /* out, device view to the buffer */
+};
+
+struct nvhost_channel_unmap_buffer_args {
+    uint32_t num_buffers;    /* in, number of buffers to unmap */
+    uint32_t reserved;        /* reserved, must be 0 */
+    uint64_t table_address;    /* pointer to beginning of buffer */
+};
+
+struct nvhost_channel_map_buffer_args {
+    uint32_t num_buffers;    /* in, number of buffers to map */
+    uint32_t reserved;        /* reserved, must be 0 */
+    uint64_t table_address;    /* pointer to beginning of buffer */
+};
+
+#define NVHOST_IOCTL_CHANNEL_GET_SYNCPOINTS    \
+    _IOR(NVHOST_IOCTL_MAGIC, 2, struct nvhost_get_param_args)
+#define NVHOST_IOCTL_CHANNEL_GET_WAITBASES    \
+    _IOR(NVHOST_IOCTL_MAGIC, 3, struct nvhost_get_param_args)
+#define NVHOST_IOCTL_CHANNEL_GET_MODMUTEXES    \
+    _IOR(NVHOST_IOCTL_MAGIC, 4, struct nvhost_get_param_args)
+#define NVHOST_IOCTL_CHANNEL_SET_NVMAP_FD    \
+    _IOW(NVHOST_IOCTL_MAGIC, 5, struct nvhost_set_nvmap_fd_args)
+#define NVHOST_IOCTL_CHANNEL_NULL_KICKOFF    \
+    _IOR(NVHOST_IOCTL_MAGIC, 6, struct nvhost_get_param_args)
+#define NVHOST_IOCTL_CHANNEL_GET_CLK_RATE        \
+    _IOWR(NVHOST_IOCTL_MAGIC, 9, struct nvhost_clk_rate_args)
+#define NVHOST_IOCTL_CHANNEL_SET_CLK_RATE        \
+    _IOW(NVHOST_IOCTL_MAGIC, 10, struct nvhost_clk_rate_args)
+#define NVHOST_IOCTL_CHANNEL_SET_TIMEOUT    \
+    _IOW(NVHOST_IOCTL_MAGIC, 11, struct nvhost_set_timeout_args)
+#define NVHOST_IOCTL_CHANNEL_GET_TIMEDOUT    \
+    _IOR(NVHOST_IOCTL_MAGIC, 12, struct nvhost_get_param_args)
+#define NVHOST_IOCTL_CHANNEL_SET_PRIORITY    \
+    _IOW(NVHOST_IOCTL_MAGIC, 13, struct nvhost_set_priority_args)
+#define    NVHOST32_IOCTL_CHANNEL_MODULE_REGRDWR    \
+    _IOWR(NVHOST_IOCTL_MAGIC, 14, struct nvhost32_ctrl_module_regrdwr_args)
+#define NVHOST32_IOCTL_CHANNEL_SUBMIT        \
+    _IOWR(NVHOST_IOCTL_MAGIC, 15, struct nvhost32_submit_args)
+#define NVHOST_IOCTL_CHANNEL_GET_SYNCPOINT    \
+    _IOWR(NVHOST_IOCTL_MAGIC, 16, struct nvhost_get_param_arg)
+#define NVHOST_IOCTL_CHANNEL_GET_WAITBASE    \
+    _IOWR(NVHOST_IOCTL_MAGIC, 17, struct nvhost_get_param_arg)
+#define NVHOST_IOCTL_CHANNEL_SET_TIMEOUT_EX    \
+    _IOWR(NVHOST_IOCTL_MAGIC, 18, struct nvhost_set_timeout_ex_args)
+#define NVHOST_IOCTL_CHANNEL_GET_CLIENT_MANAGED_SYNCPOINT    \
+    _IOWR(NVHOST_IOCTL_MAGIC, 19, struct nvhost_get_client_managed_syncpt_arg)
+#define NVHOST_IOCTL_CHANNEL_FREE_CLIENT_MANAGED_SYNCPOINT    \
+    _IOWR(NVHOST_IOCTL_MAGIC, 20, struct nvhost_free_client_managed_syncpt_arg)
+#define NVHOST_IOCTL_CHANNEL_GET_MODMUTEX    \
+    _IOWR(NVHOST_IOCTL_MAGIC, 23, struct nvhost_get_param_arg)
+#define NVHOST_IOCTL_CHANNEL_SET_CTXSWITCH    \
+    _IOWR(NVHOST_IOCTL_MAGIC, 25, struct nvhost_set_ctxswitch_args)
+
+/* ioctls added for 64bit compatibility */
+#define NVHOST_IOCTL_CHANNEL_SUBMIT    \
+    _IOWR(NVHOST_IOCTL_MAGIC, 26, struct nvhost_submit_args)
+#define    NVHOST_IOCTL_CHANNEL_MODULE_REGRDWR    \
+    _IOWR(NVHOST_IOCTL_MAGIC, 27, struct nvhost_ctrl_module_regrdwr_args)
+
+#define    NVHOST_IOCTL_CHANNEL_MAP_BUFFER    \
+    _IOWR(NVHOST_IOCTL_MAGIC, 28, struct nvhost_channel_map_buffer_args)
+#define    NVHOST_IOCTL_CHANNEL_UNMAP_BUFFER    \
+    _IOWR(NVHOST_IOCTL_MAGIC, 29, struct nvhost_channel_unmap_buffer_args)
+
+#define NVHOST_IOCTL_CHANNEL_SET_SYNCPOINT_NAME    \
+    _IOW(NVHOST_IOCTL_MAGIC, 30, struct nvhost_set_syncpt_name_args)
+
+#define NVHOST_IOCTL_CHANNEL_SET_ERROR_NOTIFIER  \
+    _IOWR(NVHOST_IOCTL_MAGIC, 111, struct nvhost_set_error_notifier)
+#define NVHOST_IOCTL_CHANNEL_OPEN    \
+    _IOR(NVHOST_IOCTL_MAGIC,  112, struct nvhost_channel_open_args)
+
+#define NVHOST_IOCTL_CHANNEL_LAST    \
+    _IOC_NR(NVHOST_IOCTL_CHANNEL_OPEN)
+#define NVHOST_IOCTL_CHANNEL_MAX_ARG_SIZE sizeof(struct nvhost_submit_args)
+
+struct nvhost_ctrl_syncpt_read_args {
+    uint32_t id;
+    uint32_t value;
+};
+
+struct nvhost_ctrl_syncpt_incr_args {
+    uint32_t id;
+} __attribute__((packed));
+
+struct nvhost_ctrl_syncpt_wait_args {
+    uint32_t id;
+    uint32_t thresh;
+    int32_t  timeout;
+} __attribute__((packed));
+
+struct nvhost_ctrl_syncpt_waitex_args {
+    uint32_t id;
+    uint32_t thresh;
+    int32_t  timeout;
+    uint32_t value;
+};
+
+struct nvhost_ctrl_syncpt_waitmex_args {
+    uint32_t id;
+    uint32_t thresh;
+    int32_t  timeout;
+    uint32_t value;
+    uint32_t tv_sec;
+    uint32_t tv_nsec;
+    uint32_t clock_id;
+    uint32_t reserved;
+};
+
+struct nvhost_ctrl_sync_fence_info {
+    uint32_t id;
+    uint32_t thresh;
+};
+
+struct nvhost32_ctrl_sync_fence_create_args {
+    uint32_t num_pts;
+    uint64_t pts; /* struct nvhost_ctrl_sync_fence_info* */
+    uint64_t name; /* const char* */
+    int32_t  fence_fd; /* fd of new fence */
+};
+
+struct nvhost_ctrl_sync_fence_create_args {
+    uint32_t num_pts;
+    int32_t  fence_fd; /* fd of new fence */
+    uint64_t pts; /* struct nvhost_ctrl_sync_fence_info* */
+    uint64_t name; /* const char* */
+};
+
+struct nvhost_ctrl_sync_fence_name_args {
+    uint64_t name; /* const char* for name */
+    int32_t  fence_fd; /* fd of fence */
+};
+
+struct nvhost_ctrl_module_mutex_args {
+    uint32_t id;
+    uint32_t lock;
+};
+
+enum nvhost_module_id {
+    NVHOST_MODULE_NONE = -1,
+    NVHOST_MODULE_DISPLAY_A = 0,
+    NVHOST_MODULE_DISPLAY_B,
+    NVHOST_MODULE_VI,
+    NVHOST_MODULE_ISP,
+    NVHOST_MODULE_MPE,
+    NVHOST_MODULE_MSENC,
+    NVHOST_MODULE_TSEC,
+    NVHOST_MODULE_GPU,
+    NVHOST_MODULE_VIC,
+    NVHOST_MODULE_NVDEC,
+    NVHOST_MODULE_NVJPG,
+    NVHOST_MODULE_VII2C,
+    NVHOST_MODULE_NVENC1,
+    NVHOST_MODULE_NVDEC1,
+    NVHOST_MODULE_NVCSI,
+    NVHOST_MODULE_TSECB = (1<<16) | NVHOST_MODULE_TSEC,
+};
+
+struct nvhost_characteristics {
+#define NVHOST_CHARACTERISTICS_GFILTER (1 << 0)
+#define NVHOST_CHARACTERISTICS_RESOURCE_PER_CHANNEL_INSTANCE (1 << 1)
+#define NVHOST_CHARACTERISTICS_SUPPORT_PREFENCES (1 << 2)
+    uint64_t flags;
+
+    uint32_t num_mlocks;
+    uint32_t num_syncpts;
+
+    uint32_t syncpts_base;
+    uint32_t syncpts_limit;
+
+    uint32_t num_hw_pts;
+    uint32_t padding;
+};
+
+struct nvhost_ctrl_get_characteristics {
+    uint64_t nvhost_characteristics_buf_size;
+    uint64_t nvhost_characteristics_buf_addr;
+};
+
+struct nvhost_ctrl_check_module_support_args {
+    uint32_t module_id;
+    uint32_t value;
+};
+
+struct nvhost_ctrl_poll_fd_create_args {
+    int32_t  fd;
+    uint32_t padding;
+};
+
+struct nvhost_ctrl_poll_fd_trigger_event_args {
+    int32_t  fd;
+    uint32_t id;
+    uint32_t thresh;
+    uint32_t padding;
+};
+
+#define NVHOST_IOCTL_CTRL_SYNCPT_READ        \
+    _IOWR(NVHOST_IOCTL_MAGIC, 1, struct nvhost_ctrl_syncpt_read_args)
+#define NVHOST_IOCTL_CTRL_SYNCPT_INCR        \
+    _IOW(NVHOST_IOCTL_MAGIC, 2, struct nvhost_ctrl_syncpt_incr_args)
+#define NVHOST_IOCTL_CTRL_SYNCPT_WAIT        \
+    _IOW(NVHOST_IOCTL_MAGIC, 3, struct nvhost_ctrl_syncpt_wait_args)
+
+#define NVHOST_IOCTL_CTRL_MODULE_MUTEX        \
+    _IOWR(NVHOST_IOCTL_MAGIC, 4, struct nvhost_ctrl_module_mutex_args)
+#define NVHOST32_IOCTL_CTRL_MODULE_REGRDWR    \
+    _IOWR(NVHOST_IOCTL_MAGIC, 5, struct nvhost32_ctrl_module_regrdwr_args)
+
+#define NVHOST_IOCTL_CTRL_SYNCPT_WAITEX        \
+    _IOWR(NVHOST_IOCTL_MAGIC, 6, struct nvhost_ctrl_syncpt_waitex_args)
+
+#define NVHOST_IOCTL_CTRL_GET_VERSION    \
+    _IOR(NVHOST_IOCTL_MAGIC, 7, struct nvhost_get_param_args)
+
+#define NVHOST_IOCTL_CTRL_SYNCPT_READ_MAX    \
+    _IOWR(NVHOST_IOCTL_MAGIC, 8, struct nvhost_ctrl_syncpt_read_args)
+
+#define NVHOST_IOCTL_CTRL_SYNCPT_WAITMEX    \
+    _IOWR(NVHOST_IOCTL_MAGIC, 9, struct nvhost_ctrl_syncpt_waitmex_args)
+
+#define NVHOST32_IOCTL_CTRL_SYNC_FENCE_CREATE    \
+    _IOWR(NVHOST_IOCTL_MAGIC, 10, struct nvhost32_ctrl_sync_fence_create_args)
+#define NVHOST_IOCTL_CTRL_SYNC_FENCE_CREATE    \
+    _IOWR(NVHOST_IOCTL_MAGIC, 11, struct nvhost_ctrl_sync_fence_create_args)
+#define NVHOST_IOCTL_CTRL_MODULE_REGRDWR    \
+    _IOWR(NVHOST_IOCTL_MAGIC, 12, struct nvhost_ctrl_module_regrdwr_args)
+#define NVHOST_IOCTL_CTRL_SYNC_FENCE_SET_NAME  \
+    _IOWR(NVHOST_IOCTL_MAGIC, 13, struct nvhost_ctrl_sync_fence_name_args)
+#define NVHOST_IOCTL_CTRL_GET_CHARACTERISTICS  \
+    _IOWR(NVHOST_IOCTL_MAGIC, 14, struct nvhost_ctrl_get_characteristics)
+#define NVHOST_IOCTL_CTRL_CHECK_MODULE_SUPPORT  \
+    _IOWR(NVHOST_IOCTL_MAGIC, 15, struct nvhost_ctrl_check_module_support_args)
+#define NVHOST_IOCTL_CTRL_POLL_FD_CREATE    \
+    _IOR(NVHOST_IOCTL_MAGIC, 16, struct nvhost_ctrl_poll_fd_create_args)
+#define NVHOST_IOCTL_CTRL_POLL_FD_TRIGGER_EVENT        \
+    _IOW(NVHOST_IOCTL_MAGIC, 17, struct nvhost_ctrl_poll_fd_trigger_event_args)
+
+#define NVHOST_IOCTL_CTRL_LAST            \
+    _IOC_NR(NVHOST_IOCTL_CTRL_POLL_FD_TRIGGER_EVENT)
+#define NVHOST_IOCTL_CTRL_MAX_ARG_SIZE    \
+    sizeof(struct nvhost_ctrl_syncpt_waitmex_args)
+
+#endif /* AVUTIL_NVHOST_IOCTL_H */
diff --git a/libavutil/nvmap_ioctl.h b/libavutil/nvmap_ioctl.h
new file mode 100644
index 0000000000..55e0bea4dc
--- /dev/null
+++ b/libavutil/nvmap_ioctl.h
@@ -0,0 +1,451 @@
+/*
+ * include/uapi/linux/nvmap.h
+ *
+ * structure declarations for nvmem and nvmap user-space ioctls
+ *
+ * Copyright (c) 2009-2020, NVIDIA CORPORATION. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#ifndef __SWITCH__
+#include <linux/ioctl.h>
+#include <linux/types.h>
+#else
+#   include <switch.h>
+
+#   define _IO       _NV_IO
+#   define _IOR      _NV_IOR
+#   define _IOW      _NV_IOW
+#   define _IOWR     _NV_IOWR
+
+#   define _IOC_DIR  _NV_IOC_DIR
+#   define _IOC_TYPE _NV_IOC_TYPE
+#   define _IOC_NR   _NV_IOC_NR
+#   define _IOC_SIZE _NV_IOC_SIZE
+#endif
+
+#ifndef AVUTIL_NVMAP_IOCTL_H
+#define AVUTIL_NVMAP_IOCTL_H
+
+/*
+ * From linux-headers nvidia/include/linux/nvmap.h
+ */
+#define NVMAP_HEAP_IOVMM   (1ul<<30)
+
+/* common carveout heaps */
+#define NVMAP_HEAP_CARVEOUT_IRAM    (1ul<<29)
+#define NVMAP_HEAP_CARVEOUT_VPR     (1ul<<28)
+#define NVMAP_HEAP_CARVEOUT_TSEC    (1ul<<27)
+#define NVMAP_HEAP_CARVEOUT_VIDMEM  (1ul<<26)
+#define NVMAP_HEAP_CARVEOUT_IVM     (1ul<<1)
+#define NVMAP_HEAP_CARVEOUT_GENERIC (1ul<<0)
+
+#define NVMAP_HEAP_CARVEOUT_MASK    (NVMAP_HEAP_IOVMM - 1)
+
+/* allocation flags */
+#define NVMAP_HANDLE_UNCACHEABLE     (0x0ul << 0)
+#define NVMAP_HANDLE_WRITE_COMBINE   (0x1ul << 0)
+#define NVMAP_HANDLE_INNER_CACHEABLE (0x2ul << 0)
+#define NVMAP_HANDLE_CACHEABLE       (0x3ul << 0)
+#define NVMAP_HANDLE_CACHE_FLAG      (0x3ul << 0)
+
+#define NVMAP_HANDLE_SECURE          (0x1ul << 2)
+#define NVMAP_HANDLE_KIND_SPECIFIED  (0x1ul << 3)
+#define NVMAP_HANDLE_COMPR_SPECIFIED (0x1ul << 4)
+#define NVMAP_HANDLE_ZEROED_PAGES    (0x1ul << 5)
+#define NVMAP_HANDLE_PHYS_CONTIG     (0x1ul << 6)
+#define NVMAP_HANDLE_CACHE_SYNC      (0x1ul << 7)
+#define NVMAP_HANDLE_CACHE_SYNC_AT_RESERVE      (0x1ul << 8)
+#define NVMAP_HANDLE_RO              (0x1ul << 9)
+
+/*
+ * DOC: NvMap Userspace API
+ *
+ * create a client by opening /dev/nvmap
+ * most operations handled via following ioctls
+ *
+ */
+enum {
+    NVMAP_HANDLE_PARAM_SIZE = 1,
+    NVMAP_HANDLE_PARAM_ALIGNMENT,
+    NVMAP_HANDLE_PARAM_BASE,
+    NVMAP_HANDLE_PARAM_HEAP,
+    NVMAP_HANDLE_PARAM_KIND,
+    NVMAP_HANDLE_PARAM_COMPR, /* ignored, to be removed */
+};
+
+enum {
+    NVMAP_CACHE_OP_WB = 0,
+    NVMAP_CACHE_OP_INV,
+    NVMAP_CACHE_OP_WB_INV,
+};
+
+enum {
+    NVMAP_PAGES_UNRESERVE = 0,
+    NVMAP_PAGES_RESERVE,
+    NVMAP_INSERT_PAGES_ON_UNRESERVE,
+    NVMAP_PAGES_PROT_AND_CLEAN,
+};
+
+#define NVMAP_ELEM_SIZE_U64 (1u << 31)
+
+struct nvmap_create_handle {
+    union {
+        struct {
+            union {
+                /* size will be overwritten */
+                uint32_t size; /* CreateHandle */
+                int32_t  fd;   /* DmaBufFd or FromFd */
+            };
+            uint32_t handle;   /* returns nvmap handle */
+        };
+        struct {
+            /* one is an input parameter and the other is an output parameter;
+             * since this is a union, please note that the input parameter
+             * will be overwritten once the ioctl returns
+             */
+            union {
+                uint64_t ivm_id;     /* CreateHandle from ivm*/
+                int32_t  ivm_handle; /* Get ivm_id from handle */
+            };
+        };
+        struct {
+            union {
+                /* size64 will be overwritten */
+                uint64_t size64;   /* CreateHandle */
+                uint32_t handle64; /* returns nvmap handle */
+            };
+        };
+    };
+};
+
+struct nvmap_create_handle_from_va {
+    uint64_t va;                /* FromVA*/
+    uint32_t size;              /* non-zero for partial memory VMA. zero for end of VMA */
+    uint32_t flags;             /* wb/wc/uc/iwb, tag etc. */
+    union {
+        uint32_t handle;        /* returns nvmap handle */
+        uint64_t size64;        /* used when size is 0 */
+    };
+};
+
+struct nvmap_gup_test {
+    uint64_t va;       /* FromVA*/
+    uint32_t handle;   /* returns nvmap handle */
+    uint32_t result;   /* result=1 for pass, result=-err for failure */
+};
+
+struct nvmap_alloc_handle {
+    uint32_t handle;       /* nvmap handle */
+    uint32_t heap_mask;    /* heaps to allocate from */
+    uint32_t flags;        /* wb/wc/uc/iwb etc. */
+    uint32_t align;        /* min alignment necessary */
+};
+
+struct nvmap_alloc_ivm_handle {
+    uint32_t handle;       /* nvmap handle */
+    uint32_t heap_mask;    /* heaps to allocate from */
+    uint32_t flags;        /* wb/wc/uc/iwb etc. */
+    uint32_t align;        /* min alignment necessary */
+    uint32_t peer;         /* peer with whom handle must be shared. Used
+                           *  only for NVMAP_HEAP_CARVEOUT_IVM
+                           */
+};
+
+struct nvmap_alloc_kind_handle {
+    uint32_t handle;       /* nvmap handle */
+    uint32_t heap_mask;
+    uint32_t flags;
+    uint32_t align;
+    uint8_t  kind;
+    uint8_t  comp_tags;
+};
+
+struct nvmap_map_caller {
+    uint32_t handle;       /* nvmap handle */
+    uint32_t offset;       /* offset into hmem; should be page-aligned */
+    uint32_t length;       /* number of bytes to map */
+    uint32_t flags;        /* maps as wb/iwb etc. */
+    unsigned long addr;    /* user pointer */
+};
+
+#ifdef CONFIG_COMPAT
+struct nvmap_map_caller_32 {
+    uint32_t handle;       /* nvmap handle */
+    uint32_t offset;       /* offset into hmem; should be page-aligned */
+    uint32_t length;       /* number of bytes to map */
+    uint32_t flags;        /* maps as wb/iwb etc. */
+    uint32_t addr;         /* user pointer*/
+};
+#endif
+
+struct nvmap_rw_handle {
+    unsigned long addr;    /* user pointer*/
+    uint32_t handle;       /* nvmap handle */
+    uint32_t offset;       /* offset into hmem */
+    uint32_t elem_size;    /* individual atom size */
+    uint32_t hmem_stride;  /* delta in bytes between atoms in hmem */
+    uint32_t user_stride;  /* delta in bytes between atoms in user */
+    uint32_t count;        /* number of atoms to copy */
+};
+
+struct nvmap_rw_handle_64 {
+    unsigned long addr;    /* user pointer*/
+    uint32_t handle;       /* nvmap handle */
+    uint64_t offset;       /* offset into hmem */
+    uint64_t elem_size;    /* individual atom size */
+    uint64_t hmem_stride;  /* delta in bytes between atoms in hmem */
+    uint64_t user_stride;  /* delta in bytes between atoms in user */
+    uint64_t count;        /* number of atoms to copy */
+};
+
+#ifdef CONFIG_COMPAT
+struct nvmap_rw_handle_32 {
+    uint32_t addr;         /* user pointer */
+    uint32_t handle;       /* nvmap handle */
+    uint32_t offset;       /* offset into hmem */
+    uint32_t elem_size;    /* individual atom size */
+    uint32_t hmem_stride;  /* delta in bytes between atoms in hmem */
+    uint32_t user_stride;  /* delta in bytes between atoms in user */
+    uint32_t count;        /* number of atoms to copy */
+};
+#endif
+
+struct nvmap_pin_handle {
+    uint32_t *handles;         /* array of handles to pin/unpin */
+    unsigned long *addr;       /* array of addresses to return */
+    uint32_t count;            /* number of entries in handles */
+};
+
+#ifdef CONFIG_COMPAT
+struct nvmap_pin_handle_32 {
+    uint32_t handles;          /* array of handles to pin/unpin */
+    uint32_t addr;             /* array of addresses to return */
+    uint32_t count;            /* number of entries in handles */
+};
+#endif
+
+struct nvmap_handle_param {
+    uint32_t handle;           /* nvmap handle */
+    uint32_t param;            /* size/align/base/heap etc. */
+    unsigned long result;      /* returns requested info*/
+};
+
+#ifdef CONFIG_COMPAT
+struct nvmap_handle_param_32 {
+    uint32_t handle;           /* nvmap handle */
+    uint32_t param;            /* size/align/base/heap etc. */
+    uint32_t result;           /* returns requested info*/
+};
+#endif
+
+struct nvmap_cache_op {
+    unsigned long addr;    /* user pointer*/
+    uint32_t handle;       /* nvmap handle */
+    uint32_t len;          /* bytes to flush */
+    int32_t  op;           /* wb/wb_inv/inv */
+};
+
+struct nvmap_cache_op_64 {
+    unsigned long addr;    /* user pointer*/
+    uint32_t handle;       /* nvmap handle */
+    uint64_t len;          /* bytes to flush */
+    int32_t  op;           /* wb/wb_inv/inv */
+};
+
+#ifdef CONFIG_COMPAT
+struct nvmap_cache_op_32 {
+    uint32_t addr;         /* user pointer*/
+    uint32_t handle;       /* nvmap handle */
+    uint32_t len;          /* bytes to flush */
+    int32_t  op;           /* wb/wb_inv/inv */
+};
+#endif
+
+struct nvmap_cache_op_list {
+    uint64_t handles;      /* Ptr to u32 type array, holding handles */
+    uint64_t offsets;      /* Ptr to u32 type array, holding offsets
+                           * into handle mem */
+    uint64_t sizes;        /* Ptr to u32 type array, holding sizes of memory
+                           * regions within each handle */
+    uint32_t nr;           /* Number of handles */
+    int32_t  op;           /* wb/wb_inv/inv */
+};
+
+struct nvmap_debugfs_handles_header {
+    uint8_t version;
+};
+
+struct nvmap_debugfs_handles_entry {
+    uint64_t base;
+    uint64_t size;
+    uint32_t flags;
+    uint32_t share_count;
+    uint64_t mapped_size;
+};
+
+struct nvmap_set_tag_label {
+    uint32_t tag;
+    uint32_t len;          /* in: label length
+                           out: number of characters copied */
+    uint64_t addr;         /* in: pointer to label or NULL to remove */
+};
+
+struct nvmap_available_heaps {
+    uint64_t heaps;        /* heaps bitmask */
+};
+
+struct nvmap_heap_size {
+    uint32_t heap;
+    uint64_t size;
+};
+
+/**
+ * Struct used while querying heap parameters
+ */
+struct nvmap_query_heap_params {
+    uint32_t heap_mask;
+    uint32_t flags;
+    uint8_t contig;
+    uint64_t total;
+    uint64_t free;
+    uint64_t largest_free_block;
+};
+
+struct nvmap_handle_parameters {
+    uint8_t contig;
+    uint32_t import_id;
+    uint32_t handle;
+    uint32_t heap_number;
+    uint32_t access_flags;
+    uint64_t heap;
+    uint64_t align;
+    uint64_t coherency;
+    uint64_t size;
+};
+
+#define NVMAP_IOC_MAGIC 'N'
+
+/* Creates a new memory handle. On input, the argument is the size of the new
+ * handle; on return, the argument is the name of the new handle
+ */
+#define NVMAP_IOC_CREATE  _IOWR(NVMAP_IOC_MAGIC, 0, struct nvmap_create_handle)
+#define NVMAP_IOC_CREATE_64 \
+    _IOWR(NVMAP_IOC_MAGIC, 1, struct nvmap_create_handle)
+#define NVMAP_IOC_FROM_ID _IOWR(NVMAP_IOC_MAGIC, 2, struct nvmap_create_handle)
+
+/* Actually allocates memory for the specified handle */
+#define NVMAP_IOC_ALLOC    _IOW(NVMAP_IOC_MAGIC, 3, struct nvmap_alloc_handle)
+
+/* Frees a memory handle, unpinning any pinned pages and unmapping any mappings
+ */
+#define NVMAP_IOC_FREE       _IO(NVMAP_IOC_MAGIC, 4)
+
+/* Maps the region of the specified handle into a user-provided virtual address
+ * that was previously created via an mmap syscall on this fd */
+#define NVMAP_IOC_MMAP       _IOWR(NVMAP_IOC_MAGIC, 5, struct nvmap_map_caller)
+#ifdef CONFIG_COMPAT
+#define NVMAP_IOC_MMAP_32    _IOWR(NVMAP_IOC_MAGIC, 5, struct nvmap_map_caller_32)
+#endif
+
+/* Reads/writes data (possibly strided) from a user-provided buffer into the
+ * hmem at the specified offset */
+#define NVMAP_IOC_WRITE      _IOW(NVMAP_IOC_MAGIC, 6, struct nvmap_rw_handle)
+#define NVMAP_IOC_READ       _IOW(NVMAP_IOC_MAGIC, 7, struct nvmap_rw_handle)
+#ifdef CONFIG_COMPAT
+#define NVMAP_IOC_WRITE_32   _IOW(NVMAP_IOC_MAGIC, 6, struct nvmap_rw_handle_32)
+#define NVMAP_IOC_READ_32    _IOW(NVMAP_IOC_MAGIC, 7, struct nvmap_rw_handle_32)
+#endif
+#define NVMAP_IOC_WRITE_64 \
+    _IOW(NVMAP_IOC_MAGIC, 6, struct nvmap_rw_handle_64)
+#define NVMAP_IOC_READ_64 \
+    _IOW(NVMAP_IOC_MAGIC, 7, struct nvmap_rw_handle_64)
+
+#define NVMAP_IOC_PARAM _IOWR(NVMAP_IOC_MAGIC, 8, struct nvmap_handle_param)
+#ifdef CONFIG_COMPAT
+#define NVMAP_IOC_PARAM_32 _IOWR(NVMAP_IOC_MAGIC, 8, struct nvmap_handle_param_32)
+#endif
+
+/* Pins a list of memory handles into IO-addressable memory (either IOVMM
+ * space or physical memory, depending on the allocation), and returns the
+ * address. Handles may be pinned recursively. */
+#define NVMAP_IOC_PIN_MULT      _IOWR(NVMAP_IOC_MAGIC, 10, struct nvmap_pin_handle)
+#define NVMAP_IOC_UNPIN_MULT    _IOW(NVMAP_IOC_MAGIC, 11, struct nvmap_pin_handle)
+#ifdef CONFIG_COMPAT
+#define NVMAP_IOC_PIN_MULT_32   _IOWR(NVMAP_IOC_MAGIC, 10, struct nvmap_pin_handle_32)
+#define NVMAP_IOC_UNPIN_MULT_32 _IOW(NVMAP_IOC_MAGIC, 11, struct nvmap_pin_handle_32)
+#endif
+
+#define NVMAP_IOC_CACHE      _IOW(NVMAP_IOC_MAGIC, 12, struct nvmap_cache_op)
+#define NVMAP_IOC_CACHE_64   _IOW(NVMAP_IOC_MAGIC, 12, struct nvmap_cache_op_64)
+#ifdef CONFIG_COMPAT
+#define NVMAP_IOC_CACHE_32  _IOW(NVMAP_IOC_MAGIC, 12, struct nvmap_cache_op_32)
+#endif
+
+/* Returns a global ID usable to allow a remote process to create a handle
+ * reference to the same handle */
+#define NVMAP_IOC_GET_ID  _IOWR(NVMAP_IOC_MAGIC, 13, struct nvmap_create_handle)
+
+/* Returns a dma-buf fd usable to allow a remote process to create a handle
+ * reference to the same handle */
+#define NVMAP_IOC_SHARE  _IOWR(NVMAP_IOC_MAGIC, 14, struct nvmap_create_handle)
+
+/* Returns a file id that allows a remote process to create a handle
+ * reference to the same handle */
+#define NVMAP_IOC_GET_FD  _IOWR(NVMAP_IOC_MAGIC, 15, struct nvmap_create_handle)
+
+/* Create a new memory handle from file id passed */
+#define NVMAP_IOC_FROM_FD _IOWR(NVMAP_IOC_MAGIC, 16, struct nvmap_create_handle)
+
+/* Perform cache maintenance on a list of handles. */
+#define NVMAP_IOC_CACHE_LIST _IOW(NVMAP_IOC_MAGIC, 17,    \
+                  struct nvmap_cache_op_list)
+/* Perform reserve operation on a list of handles. */
+#define NVMAP_IOC_RESERVE _IOW(NVMAP_IOC_MAGIC, 18,    \
+                  struct nvmap_cache_op_list)
+
+#define NVMAP_IOC_FROM_IVC_ID _IOWR(NVMAP_IOC_MAGIC, 19, struct nvmap_create_handle)
+#define NVMAP_IOC_GET_IVC_ID _IOWR(NVMAP_IOC_MAGIC, 20, struct nvmap_create_handle)
+#define NVMAP_IOC_GET_IVM_HEAPS _IOR(NVMAP_IOC_MAGIC, 21, unsigned int)
+
+/* Create a new memory handle from VA passed */
+#define NVMAP_IOC_FROM_VA _IOWR(NVMAP_IOC_MAGIC, 22, struct nvmap_create_handle_from_va)
+
+#define NVMAP_IOC_GUP_TEST _IOWR(NVMAP_IOC_MAGIC, 23, struct nvmap_gup_test)
+
+/* Define a label for allocation tag */
+#define NVMAP_IOC_SET_TAG_LABEL    _IOW(NVMAP_IOC_MAGIC, 24, struct nvmap_set_tag_label)
+
+#define NVMAP_IOC_GET_AVAILABLE_HEAPS \
+    _IOR(NVMAP_IOC_MAGIC, 25, struct nvmap_available_heaps)
+
+#define NVMAP_IOC_GET_HEAP_SIZE \
+    _IOR(NVMAP_IOC_MAGIC, 26, struct nvmap_heap_size)
+
+#define NVMAP_IOC_PARAMETERS \
+    _IOR(NVMAP_IOC_MAGIC, 27, struct nvmap_handle_parameters)
+/* START of T124 IOCTLS */
+/* Actually allocates memory for the specified handle, with kind */
+#define NVMAP_IOC_ALLOC_KIND _IOW(NVMAP_IOC_MAGIC, 100, struct nvmap_alloc_kind_handle)
+
+/* Actually allocates memory from IVM heaps */
+#define NVMAP_IOC_ALLOC_IVM _IOW(NVMAP_IOC_MAGIC, 101, struct nvmap_alloc_ivm_handle)
+
+/* Allocate separate memory for VPR */
+#define NVMAP_IOC_VPR_FLOOR_SIZE _IOW(NVMAP_IOC_MAGIC, 102, uint32_t)
+
+/* Get heap parameters such as total and free size */
+#define NVMAP_IOC_QUERY_HEAP_PARAMS _IOR(NVMAP_IOC_MAGIC, 105, \
+        struct nvmap_query_heap_params)
+
+#define NVMAP_IOC_MAXNR (_IOC_NR(NVMAP_IOC_QUERY_HEAP_PARAMS))
+
+#endif /* AVUTIL_NVMAP_IOCTL_H */
-- 
2.45.1
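
A note on the ioctl numbering in nvmap_ioctl.h: several requests reuse the same command number and differ only in the size of their argument struct (for example NVMAP_IOC_CACHE and NVMAP_IOC_CACHE_64 both use number 12). The sketch below illustrates how the standard Linux _IOC_* accessors tell them apart. It copies two structs from the header so that it compiles on its own; the helper ex_same_command_different_layout is a hypothetical name used only for illustration and is not part of the patch.

```c
#include <stdint.h>
#include <linux/ioctl.h>

/* Local copies of definitions from nvmap_ioctl.h, repeated here only so
 * that the sketch is self-contained. */
struct nvmap_cache_op {
    unsigned long addr;    /* user pointer */
    uint32_t handle;       /* nvmap handle */
    uint32_t len;          /* bytes to flush */
    int32_t  op;           /* wb/wb_inv/inv */
};

struct nvmap_cache_op_64 {
    unsigned long addr;    /* user pointer */
    uint32_t handle;       /* nvmap handle */
    uint64_t len;          /* bytes to flush */
    int32_t  op;           /* wb/wb_inv/inv */
};

#define NVMAP_IOC_MAGIC 'N'
#define NVMAP_IOC_CACHE    _IOW(NVMAP_IOC_MAGIC, 12, struct nvmap_cache_op)
#define NVMAP_IOC_CACHE_64 _IOW(NVMAP_IOC_MAGIC, 12, struct nvmap_cache_op_64)

/* Hypothetical helper (not part of the patch): returns 1 when two request
 * codes share the same magic and command number but encode a different
 * argument size, and 0 otherwise. */
static inline int ex_same_command_different_layout(unsigned int a,
                                                   unsigned int b)
{
    return _IOC_TYPE(a) == _IOC_TYPE(b) &&
           _IOC_NR(a)   == _IOC_NR(b)   &&
           _IOC_SIZE(a) != _IOC_SIZE(b);
}
```

Because the size is encoded in the request code, the two variants are distinct values even though _IOC_NR() returns 12 for both. This is also why NVMAP_IOC_MAXNR is derived with _IOC_NR() rather than from the full request code.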


* [FFmpeg-devel] [PATCH 04/16] avutil: add hardware definitions for NVDEC, NVJPG and VIC
  2024-05-30 19:43 [FFmpeg-devel] [PATCH 00/16] NVidia Tegra hardware decoding backend averne
                   ` (2 preceding siblings ...)
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 03/16] avutil: add ioctl definitions for tegra devices averne
@ 2024-05-30 19:43 ` averne
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 05/16] avutil: add common code for nvtegra averne
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: averne @ 2024-05-30 19:43 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: averne

These files are taken with minimal modifications from nvidia's open-gpu-doc project, except for the VIC-related files, which were written following the documentation in the Tegra technical reference manual.
NVDEC and NVJPG are nvidia's fixed-function hardware engines for video decoding and JPEG coding, respectively.
VIC (Video Image Compositor) is a hardware engine for image post-processing, including scaling, deinterlacing, color mapping and basic compositing.

Signed-off-by: averne <averne381@gmail.com>
---
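A note for reviewers on the class headers: field definitions such as "31:28" are high:low bit ranges and are not usable as plain integers. A common way to consume them is to split the range with the conditional operator. The following is a minimal sketch of that idea, assuming 32-bit method words; the EX_FIELD_* macros and ex_pack_ctx_save_area are hypothetical names for illustration only and are not part of this patch, while the three NVB0B6_* definitions are copied from clb0b6.h.

```c
#include <stdint.h>

/* Copied from clb0b6.h so that the sketch is self-contained. */
#define NVB0B6_VIDEO_COMPOSITOR_CTX_SAVE_AREA_OFFSET            27:0
#define NVB0B6_VIDEO_COMPOSITOR_CTX_SAVE_AREA_CTX_VALID         31:28
#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE0_LUMA_OFFSET(b)     (0x00000400 + (b)*0x00000060)

/* Hypothetical helpers: "1 ? 31:28" parses as "1 ? 31 : 28" and yields the
 * high bit, "0 ? 31:28" yields the low bit. */
#define EX_FIELD_HI(range)   (1 ? range)
#define EX_FIELD_LO(range)   (0 ? range)

/* Right-aligned mask as wide as the field (hi - lo + 1 bits). */
#define EX_FIELD_MASK(range) \
    (0xFFFFFFFFu >> (31 - EX_FIELD_HI(range) + EX_FIELD_LO(range)))

/* Place a value into its field position, truncating it to the field width. */
#define EX_FIELD_PREP(range, val) \
    (((uint32_t)(val) & EX_FIELD_MASK(range)) << EX_FIELD_LO(range))

/* Extract a field from a 32-bit word. */
#define EX_FIELD_GET(range, word) \
    (((uint32_t)(word) >> EX_FIELD_LO(range)) & EX_FIELD_MASK(range))

/* Hypothetical example: build the argument word of the CTX_SAVE_AREA method
 * from its two fields. */
static inline uint32_t ex_pack_ctx_save_area(uint32_t offset, uint32_t valid)
{
    return EX_FIELD_PREP(NVB0B6_VIDEO_COMPOSITOR_CTX_SAVE_AREA_OFFSET, offset) |
           EX_FIELD_PREP(NVB0B6_VIDEO_COMPOSITOR_CTX_SAVE_AREA_CTX_VALID, valid);
}
```

The parameterised method macros work the same way as plain offsets: NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE0_LUMA_OFFSET(b) adds a 0x60 stride per index b, which is presumably the input slot number.
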
 libavutil/clb0b6.h    |  303 +++++++
 libavutil/clc5b0.h    |  436 ++++++++++
 libavutil/cle7d0.h    |  129 +++
 libavutil/nvdec_drv.h | 1858 +++++++++++++++++++++++++++++++++++++++++
 libavutil/nvjpg_drv.h |  189 +++++
 libavutil/vic_drv.h   |  279 +++++++
 6 files changed, 3194 insertions(+)
 create mode 100644 libavutil/clb0b6.h
 create mode 100644 libavutil/clc5b0.h
 create mode 100644 libavutil/cle7d0.h
 create mode 100644 libavutil/nvdec_drv.h
 create mode 100644 libavutil/nvjpg_drv.h
 create mode 100644 libavutil/vic_drv.h

diff --git a/libavutil/clb0b6.h b/libavutil/clb0b6.h
new file mode 100644
index 0000000000..ee81ebc9d8
--- /dev/null
+++ b/libavutil/clb0b6.h
@@ -0,0 +1,303 @@
+/*
+ * Copyright (c) 2024 averne <averne381@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#ifndef AVUTIL_CLB0B6_H
+#define AVUTIL_CLB0B6_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define NVB0B6_VIDEO_COMPOSITOR                                                 (0x0000B0B6)
+
+#define NVB0B6_VIDEO_COMPOSITOR_NOP                                             (0x00000100)
+#define NVB0B6_VIDEO_COMPOSITOR_NOP_PARAMETER                                   31:0
+#define NVB0B6_VIDEO_COMPOSITOR_PM_TRIGGER                                      (0x00000140)
+#define NVB0B6_VIDEO_COMPOSITOR_PM_TRIGGER_V                                    31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_APPLICATION_ID                              (0x00000200)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_APPLICATION_ID_ID                           31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_APPLICATION_ID_ID_COMPOSITOR                (0x00000000)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_WATCHDOG_TIMER                              (0x00000204)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_WATCHDOG_TIMER_TIMER                        31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_A                                     (0x00000240)
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_A_UPPER                               7:0
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_B                                     (0x00000244)
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_B_LOWER                               31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_C                                     (0x00000248)
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_C_PAYLOAD                             31:0
+#define NVB0B6_VIDEO_COMPOSITOR_CTX_SAVE_AREA                                   (0x0000024C)
+#define NVB0B6_VIDEO_COMPOSITOR_CTX_SAVE_AREA_OFFSET                            27:0
+#define NVB0B6_VIDEO_COMPOSITOR_CTX_SAVE_AREA_CTX_VALID                         31:28
+#define NVB0B6_VIDEO_COMPOSITOR_CTX_SWITCH                                      (0x00000250)
+#define NVB0B6_VIDEO_COMPOSITOR_CTX_SWITCH_RESTORE                              0:0
+#define NVB0B6_VIDEO_COMPOSITOR_CTX_SWITCH_RESTORE_FALSE                        (0x00000000)
+#define NVB0B6_VIDEO_COMPOSITOR_CTX_SWITCH_RESTORE_TRUE                         (0x00000001)
+#define NVB0B6_VIDEO_COMPOSITOR_CTX_SWITCH_RST_NOTIFY                           1:1
+#define NVB0B6_VIDEO_COMPOSITOR_CTX_SWITCH_RST_NOTIFY_FALSE                     (0x00000000)
+#define NVB0B6_VIDEO_COMPOSITOR_CTX_SWITCH_RST_NOTIFY_TRUE                      (0x00000001)
+#define NVB0B6_VIDEO_COMPOSITOR_CTX_SWITCH_RESERVED                             7:2
+#define NVB0B6_VIDEO_COMPOSITOR_CTX_SWITCH_ASID                                 23:8
+#define NVB0B6_VIDEO_COMPOSITOR_EXECUTE                                         (0x00000300)
+#define NVB0B6_VIDEO_COMPOSITOR_EXECUTE_NOTIFY                                  0:0
+#define NVB0B6_VIDEO_COMPOSITOR_EXECUTE_NOTIFY_DISABLE                          (0x00000000)
+#define NVB0B6_VIDEO_COMPOSITOR_EXECUTE_NOTIFY_ENABLE                           (0x00000001)
+#define NVB0B6_VIDEO_COMPOSITOR_EXECUTE_NOTIFY_ON                               1:1
+#define NVB0B6_VIDEO_COMPOSITOR_EXECUTE_NOTIFY_ON_END                           (0x00000000)
+#define NVB0B6_VIDEO_COMPOSITOR_EXECUTE_NOTIFY_ON_BEGIN                         (0x00000001)
+#define NVB0B6_VIDEO_COMPOSITOR_EXECUTE_AWAKEN                                  8:8
+#define NVB0B6_VIDEO_COMPOSITOR_EXECUTE_AWAKEN_DISABLE                          (0x00000000)
+#define NVB0B6_VIDEO_COMPOSITOR_EXECUTE_AWAKEN_ENABLE                           (0x00000001)
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D                                     (0x00000304)
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_STRUCTURE_SIZE                      0:0
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_STRUCTURE_SIZE_ONE                  (0x00000000)
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_STRUCTURE_SIZE_FOUR                 (0x00000001)
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_AWAKEN_ENABLE                       8:8
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_AWAKEN_ENABLE_FALSE                 (0x00000000)
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_AWAKEN_ENABLE_TRUE                  (0x00000001)
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_OPERATION                           17:16
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_OPERATION_RELEASE                   (0x00000000)
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_OPERATION_RESERVED0                 (0x00000001)
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_OPERATION_RESERVED1                 (0x00000002)
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_OPERATION_TRAP                      (0x00000003)
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_FLUSH_DISABLE                       21:21
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_FLUSH_DISABLE_FALSE                 (0x00000000)
+#define NVB0B6_VIDEO_COMPOSITOR_SEMAPHORE_D_FLUSH_DISABLE_TRUE                  (0x00000001)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE0_LUMA_OFFSET(b)                     (0x00000400 + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE0_LUMA_OFFSET_OFFSET                 31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE0_CHROMA_U_OFFSET(b)                 (0x00000404 + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE0_CHROMA_U_OFFSET_OFFSET             31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE0_CHROMA_V_OFFSET(b)                 (0x00000408 + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE0_CHROMA_V_OFFSET_OFFSET             31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE1_LUMA_OFFSET(b)                     (0x0000040C + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE1_LUMA_OFFSET_OFFSET                 31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE1_CHROMA_U_OFFSET(b)                 (0x00000410 + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE1_CHROMA_U_OFFSET_OFFSET             31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE1_CHROMA_V_OFFSET(b)                 (0x00000414 + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE1_CHROMA_V_OFFSET_OFFSET             31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE2_LUMA_OFFSET(b)                     (0x00000418 + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE2_LUMA_OFFSET_OFFSET                 31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE2_CHROMA_U_OFFSET(b)                 (0x0000041C + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE2_CHROMA_U_OFFSET_OFFSET             31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE2_CHROMA_V_OFFSET(b)                 (0x00000420 + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE2_CHROMA_V_OFFSET_OFFSET             31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE3_LUMA_OFFSET(b)                     (0x00000424 + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE3_LUMA_OFFSET_OFFSET                 31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE3_CHROMA_U_OFFSET(b)                 (0x00000428 + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE3_CHROMA_U_OFFSET_OFFSET             31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE3_CHROMA_V_OFFSET(b)                 (0x0000042C + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE3_CHROMA_V_OFFSET_OFFSET             31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE4_LUMA_OFFSET(b)                     (0x00000430 + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE4_LUMA_OFFSET_OFFSET                 31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE4_CHROMA_U_OFFSET(b)                 (0x00000434 + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE4_CHROMA_U_OFFSET_OFFSET             31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE4_CHROMA_V_OFFSET(b)                 (0x00000438 + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE4_CHROMA_V_OFFSET_OFFSET             31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE5_LUMA_OFFSET(b)                     (0x0000043C + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE5_LUMA_OFFSET_OFFSET                 31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE5_CHROMA_U_OFFSET(b)                 (0x00000440 + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE5_CHROMA_U_OFFSET_OFFSET             31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE5_CHROMA_V_OFFSET(b)                 (0x00000444 + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE5_CHROMA_V_OFFSET_OFFSET             31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE6_LUMA_OFFSET(b)                     (0x00000448 + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE6_LUMA_OFFSET_OFFSET                 31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE6_CHROMA_U_OFFSET(b)                 (0x0000044C + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE6_CHROMA_U_OFFSET_OFFSET             31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE6_CHROMA_V_OFFSET(b)                 (0x00000450 + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE6_CHROMA_V_OFFSET_OFFSET             31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE7_LUMA_OFFSET(b)                     (0x00000454 + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE7_LUMA_OFFSET_OFFSET                 31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE7_CHROMA_U_OFFSET(b)                 (0x00000458 + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE7_CHROMA_U_OFFSET_OFFSET             31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE7_CHROMA_V_OFFSET(b)                 (0x0000045C + (b)*0x00000060)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE7_CHROMA_V_OFFSET_OFFSET             31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_PICTURE_INDEX                               (0x00000700)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_PICTURE_INDEX_INDEX                         31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTROL_PARAMS                              (0x00000704)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTROL_PARAMS_GPTIMER_ON                   0:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTROL_PARAMS_DEBUG_MODE                   4:4
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTROL_PARAMS_FALCON_CONTROL               8:8
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTROL_PARAMS_CONFIG_STRUCT_SIZE           31:16
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CONFIG_STRUCT_OFFSET                        (0x00000708)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CONFIG_STRUCT_OFFSET_OFFSET                 31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_FILTER_STRUCT_OFFSET                        (0x0000070C)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_FILTER_STRUCT_OFFSET_OFFSET                 31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_PALETTE_OFFSET                              (0x00000710)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_PALETTE_OFFSET_OFFSET                       31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_HIST_OFFSET                                 (0x00000714)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_HIST_OFFSET_OFFSET                          31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTEXT_ID                                  (0x00000718)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTEXT_ID_FCE_UCODE                        3:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTEXT_ID_CONFIG                           7:4
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTEXT_ID_PALETTE                          11:8
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTEXT_ID_OUTPUT                           15:12
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CONTEXT_ID_HIST                             19:16
+#define NVB0B6_VIDEO_COMPOSITOR_SET_FCE_UCODE_SIZE                              (0x0000071C)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_FCE_UCODE_SIZE_FCE_SZ                       15:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_OUTPUT_SURFACE_LUMA_OFFSET                  (0x00000720)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_OUTPUT_SURFACE_LUMA_OFFSET_OFFSET           31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_OUTPUT_SURFACE_CHROMA_U_OFFSET              (0x00000724)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_OUTPUT_SURFACE_CHROMA_U_OFFSET_OFFSET       31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_OUTPUT_SURFACE_CHROMA_V_OFFSET              (0x00000728)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_OUTPUT_SURFACE_CHROMA_V_OFFSET_OFFSET       31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_FCE_UCODE_OFFSET                            (0x0000072C)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_FCE_UCODE_OFFSET_OFFSET                     31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CRC_STRUCT_OFFSET                           (0x00000730)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CRC_STRUCT_OFFSET_OFFSET                    31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CRC_MODE                                    (0x00000734)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CRC_MODE_INTF_PART_ASEL                     3:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CRC_MODE_INTF_PART_BSEL                     7:4
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CRC_MODE_INTF_PART_CSEL                     11:8
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CRC_MODE_INTF_PART_DSEL                     15:12
+#define NVB0B6_VIDEO_COMPOSITOR_SET_CRC_MODE_CRC_MODE                           16:16
+#define NVB0B6_VIDEO_COMPOSITOR_SET_STATUS_OFFSET                               (0x00000738)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_STATUS_OFFSET_OFFSET                        31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SLOT_CONTEXT_ID(b)                          (0x00000740 + (b)*0x00000004)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SLOT_CONTEXT_ID_CTX_ID_SFC0                 3:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SLOT_CONTEXT_ID_CTX_ID_SFC1                 7:4
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SLOT_CONTEXT_ID_CTX_ID_SFC2                 11:8
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SLOT_CONTEXT_ID_CTX_ID_SFC3                 15:12
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SLOT_CONTEXT_ID_CTX_ID_SFC4                 19:16
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SLOT_CONTEXT_ID_CTX_ID_SFC5                 23:20
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SLOT_CONTEXT_ID_CTX_ID_SFC6                 27:24
+#define NVB0B6_VIDEO_COMPOSITOR_SET_SLOT_CONTEXT_ID_CTX_ID_SFC7                 31:28
+#define NVB0B6_VIDEO_COMPOSITOR_SET_HISTORY_BUFFER_OFFSET(b)                    (0x00000780 + (b)*0x00000004)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_HISTORY_BUFFER_OFFSET_OFFSET                31:0
+#define NVB0B6_VIDEO_COMPOSITOR_SET_COMP_TAG_BUFFER_OFFSET(b)                   (0x000007C0 + (b)*0x00000004)
+#define NVB0B6_VIDEO_COMPOSITOR_SET_COMP_TAG_BUFFER_OFFSET_OFFSET               31:0
+#define NVB0B6_VIDEO_COMPOSITOR_PM_TRIGGER_END                                  (0x00001114)
+#define NVB0B6_VIDEO_COMPOSITOR_PM_TRIGGER_END_V                                31:0
+
+#define NVB0B6_DXVAHD_FRAME_FORMAT_PROGRESSIVE                                  0
+#define NVB0B6_DXVAHD_FRAME_FORMAT_INTERLACED_TOP_FIELD_FIRST                   1
+#define NVB0B6_DXVAHD_FRAME_FORMAT_INTERLACED_BOTTOM_FIELD_FIRST                2
+#define NVB0B6_DXVAHD_FRAME_FORMAT_TOP_FIELD                                    3
+#define NVB0B6_DXVAHD_FRAME_FORMAT_BOTTOM_FIELD                                 4
+#define NVB0B6_DXVAHD_FRAME_FORMAT_SUBPIC_PROGRESSIVE                           5
+#define NVB0B6_DXVAHD_FRAME_FORMAT_SUBPIC_INTERLACED_TOP_FIELD_FIRST            6
+#define NVB0B6_DXVAHD_FRAME_FORMAT_SUBPIC_INTERLACED_BOTTOM_FIELD_FIRST         7
+#define NVB0B6_DXVAHD_FRAME_FORMAT_SUBPIC_TOP_FIELD                             8
+#define NVB0B6_DXVAHD_FRAME_FORMAT_SUBPIC_BOTTOM_FIELD                          9
+#define NVB0B6_DXVAHD_FRAME_FORMAT_TOP_FIELD_CHROMA_BOTTOM                      10
+#define NVB0B6_DXVAHD_FRAME_FORMAT_BOTTOM_FIELD_CHROMA_TOP                      11
+#define NVB0B6_DXVAHD_FRAME_FORMAT_SUBPIC_TOP_FIELD_CHROMA_BOTTOM               12
+#define NVB0B6_DXVAHD_FRAME_FORMAT_SUBPIC_BOTTOM_FIELD_CHROMA_TOP               13
+
+#define NVB0B6_T_A8                                                             0
+#define NVB0B6_T_L8                                                             1
+#define NVB0B6_T_A4L4                                                           2
+#define NVB0B6_T_L4A4                                                           3
+#define NVB0B6_T_R8                                                             4
+#define NVB0B6_T_A8L8                                                           5
+#define NVB0B6_T_L8A8                                                           6
+#define NVB0B6_T_R8G8                                                           7
+#define NVB0B6_T_G8R8                                                           8
+#define NVB0B6_T_B5G6R5                                                         9
+#define NVB0B6_T_R5G6B5                                                         10
+#define NVB0B6_T_B6G5R5                                                         11
+#define NVB0B6_T_R5G5B6                                                         12
+#define NVB0B6_T_A1B5G5R5                                                       13
+#define NVB0B6_T_A1R5G5B5                                                       14
+#define NVB0B6_T_B5G5R5A1                                                       15
+#define NVB0B6_T_R5G5B5A1                                                       16
+#define NVB0B6_T_A5B5G5R1                                                       17
+#define NVB0B6_T_A5R1G5B5                                                       18
+#define NVB0B6_T_B5G5R1A5                                                       19
+#define NVB0B6_T_R1G5B5A5                                                       20
+#define NVB0B6_T_X1B5G5R5                                                       21
+#define NVB0B6_T_X1R5G5B5                                                       22
+#define NVB0B6_T_B5G5R5X1                                                       23
+#define NVB0B6_T_R5G5B5X1                                                       24
+#define NVB0B6_T_A4B4G4R4                                                       25
+#define NVB0B6_T_A4R4G4B4                                                       26
+#define NVB0B6_T_B4G4R4A4                                                       27
+#define NVB0B6_T_R4G4B4A4                                                       28
+#define NVB0B6_T_B8_G8_R8                                                       29
+#define NVB0B6_T_R8_G8_B8                                                       30
+#define NVB0B6_T_A8B8G8R8                                                       31
+#define NVB0B6_T_A8R8G8B8                                                       32
+#define NVB0B6_T_B8G8R8A8                                                       33
+#define NVB0B6_T_R8G8B8A8                                                       34
+#define NVB0B6_T_X8B8G8R8                                                       35
+#define NVB0B6_T_X8R8G8B8                                                       36
+#define NVB0B6_T_B8G8R8X8                                                       37
+#define NVB0B6_T_R8G8B8X8                                                       38
+#define NVB0B6_T_A2B10G10R10                                                    39
+#define NVB0B6_T_A2R10G10B10                                                    40
+#define NVB0B6_T_B10G10R10A2                                                    41
+#define NVB0B6_T_R10G10B10A2                                                    42
+#define NVB0B6_T_A4P4                                                           43
+#define NVB0B6_T_P4A4                                                           44
+#define NVB0B6_T_P8A8                                                           45
+#define NVB0B6_T_A8P8                                                           46
+#define NVB0B6_T_P8                                                             47
+#define NVB0B6_T_P1                                                             48
+#define NVB0B6_T_U8V8                                                           49
+#define NVB0B6_T_V8U8                                                           50
+#define NVB0B6_T_A8Y8U8V8                                                       51
+#define NVB0B6_T_V8U8Y8A8                                                       52
+#define NVB0B6_T_Y8_U8_V8                                                       53
+#define NVB0B6_T_Y8_V8_U8                                                       54
+#define NVB0B6_T_U8_V8_Y8                                                       55
+#define NVB0B6_T_V8_U8_Y8                                                       56
+#define NVB0B6_T_Y8_U8__Y8_V8                                                   57
+#define NVB0B6_T_Y8_V8__Y8_U8                                                   58
+#define NVB0B6_T_U8_Y8__V8_Y8                                                   59
+#define NVB0B6_T_V8_Y8__U8_Y8                                                   60
+#define NVB0B6_T_Y8___U8V8_N444                                                 61
+#define NVB0B6_T_Y8___V8U8_N444                                                 62
+#define NVB0B6_T_Y8___U8V8_N422                                                 63
+#define NVB0B6_T_Y8___V8U8_N422                                                 64
+#define NVB0B6_T_Y8___U8V8_N422R                                                65
+#define NVB0B6_T_Y8___V8U8_N422R                                                66
+#define NVB0B6_T_Y8___U8V8_N420                                                 67
+#define NVB0B6_T_Y8___V8U8_N420                                                 68
+#define NVB0B6_T_Y8___U8___V8_N444                                              69
+#define NVB0B6_T_Y8___U8___V8_N422                                              70
+#define NVB0B6_T_Y8___U8___V8_N422R                                             71
+#define NVB0B6_T_Y8___U8___V8_N420                                              72
+#define NVB0B6_T_U8                                                             73
+#define NVB0B6_T_V8                                                             74
+
+#define NVB0B6_DXVAHD_ALPHA_FILL_MODE_OPAQUE                                    0
+#define NVB0B6_DXVAHD_ALPHA_FILL_MODE_BACKGROUND                                1
+#define NVB0B6_DXVAHD_ALPHA_FILL_MODE_DESTINATION                               2
+#define NVB0B6_DXVAHD_ALPHA_FILL_MODE_SOURCE_STREAM                             3
+#define NVB0B6_DXVAHD_ALPHA_FILL_MODE_COMPOSITED                                4
+#define NVB0B6_DXVAHD_ALPHA_FILL_MODE_SOURCE_ALPHA                              5
+
+#define NVB0B6_BLK_KIND_PITCH                                                   0
+#define NVB0B6_BLK_KIND_GENERIC_16Bx2                                           1
+#define NVB0B6_BLK_KIND_BL_NAIVE                                                2
+#define NVB0B6_BLK_KIND_BL_KEPLER_XBAR_RAW                                      3
+#define NVB0B6_BLK_KIND_VP2_TILED                                               15
+
+#define NVB0B6_FILTER_LENGTH_1TAP                                               0
+#define NVB0B6_FILTER_LENGTH_2TAP                                               1
+#define NVB0B6_FILTER_LENGTH_5TAP                                               2
+#define NVB0B6_FILTER_LENGTH_10TAP                                              3
+
+#define NVB0B6_FILTER_TYPE_NORMAL                                               0
+#define NVB0B6_FILTER_TYPE_NOISE                                                1
+#define NVB0B6_FILTER_TYPE_DETAIL                                               2
+
+#ifdef __cplusplus
+}      /* extern "C" */
+#endif
+#endif /* AVUTIL_CLB0B6_H */
diff --git a/libavutil/clc5b0.h b/libavutil/clc5b0.h
new file mode 100644
index 0000000000..f7957bf46a
--- /dev/null
+++ b/libavutil/clc5b0.h
@@ -0,0 +1,436 @@
+/*******************************************************************************
+    Copyright (c) 1993-2020, NVIDIA CORPORATION. All rights reserved.
+
+    Permission is hereby granted, free of charge, to any person obtaining a
+    copy of this software and associated documentation files (the "Software"),
+    to deal in the Software without restriction, including without limitation
+    the rights to use, copy, modify, merge, publish, distribute, sublicense,
+    and/or sell copies of the Software, and to permit persons to whom the
+    Software is furnished to do so, subject to the following conditions:
+
+    The above copyright notice and this permission notice shall be included in
+    all copies or substantial portions of the Software.
+
+    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+    DEALINGS IN THE SOFTWARE.
+
+*******************************************************************************/
+
+#ifndef AVUTIL_CLC5B0_H
+#define AVUTIL_CLC5B0_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define NVC5B0_VIDEO_DECODER                                                    (0x0000C5B0)
+
+#define NVC5B0_NOP                                                              (0x00000100)
+#define NVC5B0_NOP_PARAMETER                                                    31:0
+#define NVC5B0_SET_APPLICATION_ID                                               (0x00000200)
+#define NVC5B0_SET_APPLICATION_ID_ID                                            31:0
+#define NVC5B0_SET_APPLICATION_ID_ID_MPEG12                                     (0x00000001)
+#define NVC5B0_SET_APPLICATION_ID_ID_VC1                                        (0x00000002)
+#define NVC5B0_SET_APPLICATION_ID_ID_H264                                       (0x00000003)
+#define NVC5B0_SET_APPLICATION_ID_ID_MPEG4                                      (0x00000004)
+#define NVC5B0_SET_APPLICATION_ID_ID_VP8                                        (0x00000005)
+#define NVC5B0_SET_APPLICATION_ID_ID_HEVC                                       (0x00000007)
+#define NVC5B0_SET_APPLICATION_ID_ID_VP9                                        (0x00000009)
+#define NVC5B0_SET_APPLICATION_ID_ID_HEVC_PARSER                                (0x0000000C)
+#define NVC5B0_SET_WATCHDOG_TIMER                                               (0x00000204)
+#define NVC5B0_SET_WATCHDOG_TIMER_TIMER                                         31:0
+#define NVC5B0_SEMAPHORE_A                                                      (0x00000240)
+#define NVC5B0_SEMAPHORE_A_UPPER                                                7:0
+#define NVC5B0_SEMAPHORE_B                                                      (0x00000244)
+#define NVC5B0_SEMAPHORE_B_LOWER                                                31:0
+#define NVC5B0_SEMAPHORE_C                                                      (0x00000248)
+#define NVC5B0_SEMAPHORE_C_PAYLOAD                                              31:0
+#define NVC5B0_CTX_SAVE_AREA                                                    (0x0000024C)
+#define NVC5B0_CTX_SAVE_AREA_OFFSET                                             31:0
+#define NVC5B0_CTX_SWITCH                                                       (0x00000250)
+#define NVC5B0_CTX_SWITCH_OP                                                    1:0
+#define NVC5B0_CTX_SWITCH_OP_CTX_UPDATE                                         (0x00000000)
+#define NVC5B0_CTX_SWITCH_OP_CTX_SAVE                                           (0x00000001)
+#define NVC5B0_CTX_SWITCH_OP_CTX_RESTORE                                        (0x00000002)
+#define NVC5B0_CTX_SWITCH_OP_CTX_FORCERESTORE                                   (0x00000003)
+#define NVC5B0_CTX_SWITCH_CTXID_VALID                                           2:2
+#define NVC5B0_CTX_SWITCH_CTXID_VALID_FALSE                                     (0x00000000)
+#define NVC5B0_CTX_SWITCH_CTXID_VALID_TRUE                                      (0x00000001)
+#define NVC5B0_CTX_SWITCH_RESERVED0                                             7:3
+#define NVC5B0_CTX_SWITCH_CTX_ID                                                23:8
+#define NVC5B0_CTX_SWITCH_RESERVED1                                             31:24
+#define NVC5B0_EXECUTE                                                          (0x00000300)
+#define NVC5B0_EXECUTE_NOTIFY                                                   0:0
+#define NVC5B0_EXECUTE_NOTIFY_DISABLE                                           (0x00000000)
+#define NVC5B0_EXECUTE_NOTIFY_ENABLE                                            (0x00000001)
+#define NVC5B0_EXECUTE_NOTIFY_ON                                                1:1
+#define NVC5B0_EXECUTE_NOTIFY_ON_END                                            (0x00000000)
+#define NVC5B0_EXECUTE_NOTIFY_ON_BEGIN                                          (0x00000001)
+#define NVC5B0_EXECUTE_AWAKEN                                                   8:8
+#define NVC5B0_EXECUTE_AWAKEN_DISABLE                                           (0x00000000)
+#define NVC5B0_EXECUTE_AWAKEN_ENABLE                                            (0x00000001)
+#define NVC5B0_SEMAPHORE_D                                                      (0x00000304)
+#define NVC5B0_SEMAPHORE_D_STRUCTURE_SIZE                                       0:0
+#define NVC5B0_SEMAPHORE_D_STRUCTURE_SIZE_ONE                                   (0x00000000)
+#define NVC5B0_SEMAPHORE_D_STRUCTURE_SIZE_FOUR                                  (0x00000001)
+#define NVC5B0_SEMAPHORE_D_AWAKEN_ENABLE                                        8:8
+#define NVC5B0_SEMAPHORE_D_AWAKEN_ENABLE_FALSE                                  (0x00000000)
+#define NVC5B0_SEMAPHORE_D_AWAKEN_ENABLE_TRUE                                   (0x00000001)
+#define NVC5B0_SEMAPHORE_D_OPERATION                                            17:16
+#define NVC5B0_SEMAPHORE_D_OPERATION_RELEASE                                    (0x00000000)
+#define NVC5B0_SEMAPHORE_D_OPERATION_RESERVED0                                  (0x00000001)
+#define NVC5B0_SEMAPHORE_D_OPERATION_RESERVED1                                  (0x00000002)
+#define NVC5B0_SEMAPHORE_D_OPERATION_TRAP                                       (0x00000003)
+#define NVC5B0_SEMAPHORE_D_FLUSH_DISABLE                                        21:21
+#define NVC5B0_SEMAPHORE_D_FLUSH_DISABLE_FALSE                                  (0x00000000)
+#define NVC5B0_SEMAPHORE_D_FLUSH_DISABLE_TRUE                                   (0x00000001)
+#define NVC5B0_SET_CONTROL_PARAMS                                               (0x00000400)
+#define NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE                                    3:0
+#define NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_MPEG1                              (0x00000000)
+#define NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_MPEG2                              (0x00000001)
+#define NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_VC1                                (0x00000002)
+#define NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_H264                               (0x00000003)
+#define NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_MPEG4                              (0x00000004)
+#define NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_DIVX3                              (0x00000004)
+#define NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_VP8                                (0x00000005)
+#define NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_HEVC                               (0x00000007)
+#define NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_VP9                                (0x00000009)
+#define NVC5B0_SET_CONTROL_PARAMS_GPTIMER_ON                                    4:4
+#define NVC5B0_SET_CONTROL_PARAMS_RET_ERROR                                     5:5
+#define NVC5B0_SET_CONTROL_PARAMS_ERR_CONCEAL_ON                                6:6
+#define NVC5B0_SET_CONTROL_PARAMS_ERROR_FRM_IDX                                 12:7
+#define NVC5B0_SET_CONTROL_PARAMS_MBTIMER_ON                                    13:13
+#define NVC5B0_SET_CONTROL_PARAMS_EC_INTRA_FRAME_USING_PSLC                     14:14
+#define NVC5B0_SET_CONTROL_PARAMS_ALL_INTRA_FRAME                               17:17
+#define NVC5B0_SET_CONTROL_PARAMS_RESERVED                                      31:18
+#define NVC5B0_SET_DRV_PIC_SETUP_OFFSET                                         (0x00000404)
+#define NVC5B0_SET_DRV_PIC_SETUP_OFFSET_OFFSET                                  31:0
+#define NVC5B0_SET_IN_BUF_BASE_OFFSET                                           (0x00000408)
+#define NVC5B0_SET_IN_BUF_BASE_OFFSET_OFFSET                                    31:0
+#define NVC5B0_SET_PICTURE_INDEX                                                (0x0000040C)
+#define NVC5B0_SET_PICTURE_INDEX_INDEX                                          31:0
+#define NVC5B0_SET_SLICE_OFFSETS_BUF_OFFSET                                     (0x00000410)
+#define NVC5B0_SET_SLICE_OFFSETS_BUF_OFFSET_OFFSET                              31:0
+#define NVC5B0_SET_COLOC_DATA_OFFSET                                            (0x00000414)
+#define NVC5B0_SET_COLOC_DATA_OFFSET_OFFSET                                     31:0
+#define NVC5B0_SET_HISTORY_OFFSET                                               (0x00000418)
+#define NVC5B0_SET_HISTORY_OFFSET_OFFSET                                        31:0
+#define NVC5B0_SET_DISPLAY_BUF_SIZE                                             (0x0000041C)
+#define NVC5B0_SET_DISPLAY_BUF_SIZE_SIZE                                        31:0
+#define NVC5B0_SET_HISTOGRAM_OFFSET                                             (0x00000420)
+#define NVC5B0_SET_HISTOGRAM_OFFSET_OFFSET                                      31:0
+#define NVC5B0_SET_NVDEC_STATUS_OFFSET                                          (0x00000424)
+#define NVC5B0_SET_NVDEC_STATUS_OFFSET_OFFSET                                   31:0
+#define NVC5B0_SET_DISPLAY_BUF_LUMA_OFFSET                                      (0x00000428)
+#define NVC5B0_SET_DISPLAY_BUF_LUMA_OFFSET_OFFSET                               31:0
+#define NVC5B0_SET_DISPLAY_BUF_CHROMA_OFFSET                                    (0x0000042C)
+#define NVC5B0_SET_DISPLAY_BUF_CHROMA_OFFSET_OFFSET                             31:0
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET0                                         (0x00000430)
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET0_OFFSET                                  31:0
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET1                                         (0x00000434)
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET1_OFFSET                                  31:0
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET2                                         (0x00000438)
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET2_OFFSET                                  31:0
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET3                                         (0x0000043C)
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET3_OFFSET                                  31:0
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET4                                         (0x00000440)
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET4_OFFSET                                  31:0
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET5                                         (0x00000444)
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET5_OFFSET                                  31:0
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET6                                         (0x00000448)
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET6_OFFSET                                  31:0
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET7                                         (0x0000044C)
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET7_OFFSET                                  31:0
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET8                                         (0x00000450)
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET8_OFFSET                                  31:0
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET9                                         (0x00000454)
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET9_OFFSET                                  31:0
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET10                                        (0x00000458)
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET10_OFFSET                                 31:0
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET11                                        (0x0000045C)
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET11_OFFSET                                 31:0
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET12                                        (0x00000460)
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET12_OFFSET                                 31:0
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET13                                        (0x00000464)
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET13_OFFSET                                 31:0
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET14                                        (0x00000468)
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET14_OFFSET                                 31:0
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET15                                        (0x0000046C)
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET15_OFFSET                                 31:0
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET16                                        (0x00000470)
+#define NVC5B0_SET_PICTURE_LUMA_OFFSET16_OFFSET                                 31:0
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET0                                       (0x00000474)
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET0_OFFSET                                31:0
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET1                                       (0x00000478)
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET1_OFFSET                                31:0
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET2                                       (0x0000047C)
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET2_OFFSET                                31:0
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET3                                       (0x00000480)
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET3_OFFSET                                31:0
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET4                                       (0x00000484)
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET4_OFFSET                                31:0
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET5                                       (0x00000488)
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET5_OFFSET                                31:0
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET6                                       (0x0000048C)
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET6_OFFSET                                31:0
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET7                                       (0x00000490)
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET7_OFFSET                                31:0
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET8                                       (0x00000494)
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET8_OFFSET                                31:0
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET9                                       (0x00000498)
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET9_OFFSET                                31:0
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET10                                      (0x0000049C)
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET10_OFFSET                               31:0
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET11                                      (0x000004A0)
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET11_OFFSET                               31:0
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET12                                      (0x000004A4)
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET12_OFFSET                               31:0
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET13                                      (0x000004A8)
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET13_OFFSET                               31:0
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET14                                      (0x000004AC)
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET14_OFFSET                               31:0
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET15                                      (0x000004B0)
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET15_OFFSET                               31:0
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET16                                      (0x000004B4)
+#define NVC5B0_SET_PICTURE_CHROMA_OFFSET16_OFFSET                               31:0
+#define NVC5B0_SET_PIC_SCRATCH_BUF_OFFSET                                       (0x000004B8)
+#define NVC5B0_SET_PIC_SCRATCH_BUF_OFFSET_OFFSET                                31:0
+#define NVC5B0_SET_EXTERNAL_MVBUFFER_OFFSET                                     (0x000004BC)
+#define NVC5B0_SET_EXTERNAL_MVBUFFER_OFFSET_OFFSET                              31:0
+#define NVC5B0_H264_SET_MBHIST_BUF_OFFSET                                       (0x00000500)
+#define NVC5B0_H264_SET_MBHIST_BUF_OFFSET_OFFSET                                31:0
+#define NVC5B0_VP8_SET_PROB_DATA_OFFSET                                         (0x00000540)
+#define NVC5B0_VP8_SET_PROB_DATA_OFFSET_OFFSET                                  31:0
+#define NVC5B0_VP8_SET_HEADER_PARTITION_BUF_BASE_OFFSET                         (0x00000544)
+#define NVC5B0_VP8_SET_HEADER_PARTITION_BUF_BASE_OFFSET_OFFSET                  31:0
+#define NVC5B0_HEVC_SET_SCALING_LIST_OFFSET                                     (0x00000580)
+#define NVC5B0_HEVC_SET_SCALING_LIST_OFFSET_OFFSET                              31:0
+#define NVC5B0_HEVC_SET_TILE_SIZES_OFFSET                                       (0x00000584)
+#define NVC5B0_HEVC_SET_TILE_SIZES_OFFSET_OFFSET                                31:0
+#define NVC5B0_HEVC_SET_FILTER_BUFFER_OFFSET                                    (0x00000588)
+#define NVC5B0_HEVC_SET_FILTER_BUFFER_OFFSET_OFFSET                             31:0
+#define NVC5B0_HEVC_SET_SAO_BUFFER_OFFSET                                       (0x0000058C)
+#define NVC5B0_HEVC_SET_SAO_BUFFER_OFFSET_OFFSET                                31:0
+#define NVC5B0_HEVC_SET_SLICE_INFO_BUFFER_OFFSET                                (0x00000590)
+#define NVC5B0_HEVC_SET_SLICE_INFO_BUFFER_OFFSET_OFFSET                         31:0
+#define NVC5B0_HEVC_SET_SLICE_GROUP_INDEX                                       (0x00000594)
+#define NVC5B0_HEVC_SET_SLICE_GROUP_INDEX_OFFSET                                31:0
+#define NVC5B0_VP9_SET_PROB_TAB_BUF_OFFSET                                      (0x000005C0)
+#define NVC5B0_VP9_SET_PROB_TAB_BUF_OFFSET_OFFSET                               31:0
+#define NVC5B0_VP9_SET_CTX_COUNTER_BUF_OFFSET                                   (0x000005C4)
+#define NVC5B0_VP9_SET_CTX_COUNTER_BUF_OFFSET_OFFSET                            31:0
+#define NVC5B0_VP9_SET_SEGMENT_READ_BUF_OFFSET                                  (0x000005C8)
+#define NVC5B0_VP9_SET_SEGMENT_READ_BUF_OFFSET_OFFSET                           31:0
+#define NVC5B0_VP9_SET_SEGMENT_WRITE_BUF_OFFSET                                 (0x000005CC)
+#define NVC5B0_VP9_SET_SEGMENT_WRITE_BUF_OFFSET_OFFSET                          31:0
+#define NVC5B0_VP9_SET_TILE_SIZE_BUF_OFFSET                                     (0x000005D0)
+#define NVC5B0_VP9_SET_TILE_SIZE_BUF_OFFSET_OFFSET                              31:0
+#define NVC5B0_VP9_SET_COL_MVWRITE_BUF_OFFSET                                   (0x000005D4)
+#define NVC5B0_VP9_SET_COL_MVWRITE_BUF_OFFSET_OFFSET                            31:0
+#define NVC5B0_VP9_SET_COL_MVREAD_BUF_OFFSET                                    (0x000005D8)
+#define NVC5B0_VP9_SET_COL_MVREAD_BUF_OFFSET_OFFSET                             31:0
+#define NVC5B0_VP9_SET_FILTER_BUFFER_OFFSET                                     (0x000005DC)
+#define NVC5B0_VP9_SET_FILTER_BUFFER_OFFSET_OFFSET                              31:0
+
+#define NVC5B0_ERROR_NONE                                                       (0x00000000)
+#define NVC5B0_OS_ERROR_EXECUTE_INSUFFICIENT_DATA                               (0x00000001)
+#define NVC5B0_OS_ERROR_SEMAPHORE_INSUFFICIENT_DATA                             (0x00000002)
+#define NVC5B0_OS_ERROR_INVALID_METHOD                                          (0x00000003)
+#define NVC5B0_OS_ERROR_INVALID_DMA_PAGE                                        (0x00000004)
+#define NVC5B0_OS_ERROR_UNHANDLED_INTERRUPT                                     (0x00000005)
+#define NVC5B0_OS_ERROR_EXCEPTION                                               (0x00000006)
+#define NVC5B0_OS_ERROR_INVALID_CTXSW_REQUEST                                   (0x00000007)
+#define NVC5B0_OS_ERROR_APPLICATION                                             (0x00000008)
+#define NVC5B0_OS_ERROR_SW_BREAKPT                                              (0x00000009)
+#define NVC5B0_OS_INTERRUPT_EXECUTE_AWAKEN                                      (0x00000100)
+#define NVC5B0_OS_INTERRUPT_BACKEND_SEMAPHORE_AWAKEN                            (0x00000200)
+#define NVC5B0_OS_INTERRUPT_CTX_ERROR_FBIF                                      (0x00000300)
+#define NVC5B0_OS_INTERRUPT_LIMIT_VIOLATION                                     (0x00000400)
+#define NVC5B0_OS_INTERRUPT_LIMIT_AND_FBIF_CTX_ERROR                            (0x00000500)
+#define NVC5B0_OS_INTERRUPT_HALT_ENGINE                                         (0x00000600)
+#define NVC5B0_OS_INTERRUPT_TRAP_NONSTALL                                       (0x00000700)
+#define NVC5B0_H264_VLD_ERR_SEQ_DATA_INCONSISTENT                               (0x00004001)
+#define NVC5B0_H264_VLD_ERR_PIC_DATA_INCONSISTENT                               (0x00004002)
+#define NVC5B0_H264_VLD_ERR_SLC_DATA_BUF_ADDR_OUT_OF_BOUNDS                     (0x00004100)
+#define NVC5B0_H264_VLD_ERR_BITSTREAM_ERROR                                     (0x00004101)
+#define NVC5B0_H264_VLD_ERR_CTX_DMA_ID_CTRL_IN_INVALID                          (0x000041F8)
+#define NVC5B0_H264_VLD_ERR_SLC_HDR_OUT_SIZE_NOT_MULT256                        (0x00004200)
+#define NVC5B0_H264_VLD_ERR_SLC_DATA_OUT_SIZE_NOT_MULT256                       (0x00004201)
+#define NVC5B0_H264_VLD_ERR_CTX_DMA_ID_FLOW_CTRL_INVALID                        (0x00004203)
+#define NVC5B0_H264_VLD_ERR_CTX_DMA_ID_SLC_HDR_OUT_INVALID                      (0x00004204)
+#define NVC5B0_H264_VLD_ERR_SLC_HDR_OUT_BUF_TOO_SMALL                           (0x00004205)
+#define NVC5B0_H264_VLD_ERR_SLC_HDR_OUT_BUF_ALREADY_VALID                       (0x00004206)
+#define NVC5B0_H264_VLD_ERR_SLC_DATA_OUT_BUF_TOO_SMALL                          (0x00004207)
+#define NVC5B0_H264_VLD_ERR_DATA_BUF_CNT_TOO_SMALL                              (0x00004208)
+#define NVC5B0_H264_VLD_ERR_BITSTREAM_EMPTY                                     (0x00004209)
+#define NVC5B0_H264_VLD_ERR_FRAME_WIDTH_TOO_LARGE                               (0x0000420A)
+#define NVC5B0_H264_VLD_ERR_FRAME_HEIGHT_TOO_LARGE                              (0x0000420B)
+#define NVC5B0_H264_VLD_ERR_HIST_BUF_TOO_SMALL                                  (0x00004300)
+#define NVC5B0_VC1_VLD_ERR_PIC_DATA_BUF_ADDR_OUT_OF_BOUND                       (0x00005100)
+#define NVC5B0_VC1_VLD_ERR_BITSTREAM_ERROR                                      (0x00005101)
+#define NVC5B0_VC1_VLD_ERR_PIC_HDR_OUT_SIZE_NOT_MULT256                         (0x00005200)
+#define NVC5B0_VC1_VLD_ERR_PIC_DATA_OUT_SIZE_NOT_MULT256                        (0x00005201)
+#define NVC5B0_VC1_VLD_ERR_CTX_DMA_ID_CTRL_IN_INVALID                           (0x00005202)
+#define NVC5B0_VC1_VLD_ERR_CTX_DMA_ID_FLOW_CTRL_INVALID                         (0x00005203)
+#define NVC5B0_VC1_VLD_ERR_CTX_DMA_ID_PIC_HDR_OUT_INVALID                       (0x00005204)
+#define NVC5B0_VC1_VLD_ERR_SLC_HDR_OUT_BUF_TOO_SMALL                            (0x00005205)
+#define NVC5B0_VC1_VLD_ERR_PIC_HDR_OUT_BUF_ALREADY_VALID                        (0x00005206)
+#define NVC5B0_VC1_VLD_ERR_PIC_DATA_OUT_BUF_TOO_SMALL                           (0x00005207)
+#define NVC5B0_VC1_VLD_ERR_DATA_INFO_IN_BUF_TOO_SMALL                           (0x00005208)
+#define NVC5B0_VC1_VLD_ERR_BITSTREAM_EMPTY                                      (0x00005209)
+#define NVC5B0_VC1_VLD_ERR_FRAME_WIDTH_TOO_LARGE                                (0x0000520A)
+#define NVC5B0_VC1_VLD_ERR_FRAME_HEIGHT_TOO_LARGE                               (0x0000520B)
+#define NVC5B0_VC1_VLD_ERR_PIC_DATA_OUT_BUF_FULL_TIME_OUT                       (0x00005300)
+#define NVC5B0_MPEG12_VLD_ERR_SLC_DATA_BUF_ADDR_OUT_OF_BOUNDS                   (0x00006100)
+#define NVC5B0_MPEG12_VLD_ERR_BITSTREAM_ERROR                                   (0x00006101)
+#define NVC5B0_MPEG12_VLD_ERR_SLC_DATA_OUT_SIZE_NOT_MULT256                     (0x00006200)
+#define NVC5B0_MPEG12_VLD_ERR_CTX_DMA_ID_CTRL_IN_INVALID                        (0x00006201)
+#define NVC5B0_MPEG12_VLD_ERR_CTX_DMA_ID_FLOW_CTRL_INVALID                      (0x00006202)
+#define NVC5B0_MPEG12_VLD_ERR_SLC_DATA_OUT_BUF_TOO_SMALL                        (0x00006203)
+#define NVC5B0_MPEG12_VLD_ERR_DATA_INFO_IN_BUF_TOO_SMALL                        (0x00006204)
+#define NVC5B0_MPEG12_VLD_ERR_BITSTREAM_EMPTY                                   (0x00006205)
+#define NVC5B0_MPEG12_VLD_ERR_INVALID_PIC_STRUCTURE                             (0x00006206)
+#define NVC5B0_MPEG12_VLD_ERR_INVALID_PIC_CODING_TYPE                           (0x00006207)
+#define NVC5B0_MPEG12_VLD_ERR_FRAME_WIDTH_TOO_LARGE                             (0x00006208)
+#define NVC5B0_MPEG12_VLD_ERR_FRAME_HEIGHT_TOO_LARGE                            (0x00006209)
+#define NVC5B0_MPEG12_VLD_ERR_SLC_DATA_OUT_BUF_FULL_TIME_OUT                    (0x00006300)
+#define NVC5B0_CMN_VLD_ERR_PDEC_RETURNED_ERROR                                  (0x00007101)
+#define NVC5B0_CMN_VLD_ERR_EDOB_FLUSH_TIME_OUT                                  (0x00007102)
+#define NVC5B0_CMN_VLD_ERR_EDOB_REWIND_TIME_OUT                                 (0x00007103)
+#define NVC5B0_CMN_VLD_ERR_VLD_WD_TIME_OUT                                      (0x00007104)
+#define NVC5B0_CMN_VLD_ERR_NUM_SLICES_ZERO                                      (0x00007105)
+#define NVC5B0_MPEG4_VLD_ERR_PIC_DATA_BUF_ADDR_OUT_OF_BOUND                     (0x00008100)
+#define NVC5B0_MPEG4_VLD_ERR_BITSTREAM_ERROR                                    (0x00008101)
+#define NVC5B0_MPEG4_VLD_ERR_PIC_HDR_OUT_SIZE_NOT_MULT256                       (0x00008200)
+#define NVC5B0_MPEG4_VLD_ERR_PIC_DATA_OUT_SIZE_NOT_MULT256                      (0x00008201)
+#define NVC5B0_MPEG4_VLD_ERR_CTX_DMA_ID_CTRL_IN_INVALID                         (0x00008202)
+#define NVC5B0_MPEG4_VLD_ERR_CTX_DMA_ID_FLOW_CTRL_INVALID                       (0x00008203)
+#define NVC5B0_MPEG4_VLD_ERR_CTX_DMA_ID_PIC_HDR_OUT_INVALID                     (0x00008204)
+#define NVC5B0_MPEG4_VLD_ERR_SLC_HDR_OUT_BUF_TOO_SMALL                          (0x00008205)
+#define NVC5B0_MPEG4_VLD_ERR_PIC_HDR_OUT_BUF_ALREADY_VALID                      (0x00008206)
+#define NVC5B0_MPEG4_VLD_ERR_PIC_DATA_OUT_BUF_TOO_SMALL                         (0x00008207)
+#define NVC5B0_MPEG4_VLD_ERR_DATA_INFO_IN_BUF_TOO_SMALL                         (0x00008208)
+#define NVC5B0_MPEG4_VLD_ERR_BITSTREAM_EMPTY                                    (0x00008209)
+#define NVC5B0_MPEG4_VLD_ERR_FRAME_WIDTH_TOO_LARGE                              (0x0000820A)
+#define NVC5B0_MPEG4_VLD_ERR_FRAME_HEIGHT_TOO_LARGE                             (0x0000820B)
+#define NVC5B0_MPEG4_VLD_ERR_PIC_DATA_OUT_BUF_FULL_TIME_OUT                     (0x00051E01)
+#define NVC5B0_DEC_ERROR_MPEG12_APPTIMER_EXPIRED                                (0xDEC10001)
+#define NVC5B0_DEC_ERROR_MPEG12_MVTIMER_EXPIRED                                 (0xDEC10002)
+#define NVC5B0_DEC_ERROR_MPEG12_INVALID_TOKEN                                   (0xDEC10003)
+#define NVC5B0_DEC_ERROR_MPEG12_SLICEDATA_MISSING                               (0xDEC10004)
+#define NVC5B0_DEC_ERROR_MPEG12_HWERR_INTERRUPT                                 (0xDEC10005)
+#define NVC5B0_DEC_ERROR_MPEG12_DETECTED_VLD_FAILURE                            (0xDEC10006)
+#define NVC5B0_DEC_ERROR_MPEG12_PICTURE_INIT                                    (0xDEC10100)
+#define NVC5B0_DEC_ERROR_MPEG12_STATEMACHINE_FAILURE                            (0xDEC10101)
+#define NVC5B0_DEC_ERROR_MPEG12_INVALID_CTXID_PIC                               (0xDEC10901)
+#define NVC5B0_DEC_ERROR_MPEG12_INVALID_CTXID_UCODE                             (0xDEC10902)
+#define NVC5B0_DEC_ERROR_MPEG12_INVALID_CTXID_FC                                (0xDEC10903)
+#define NVC5B0_DEC_ERROR_MPEG12_INVALID_CTXID_SLH                               (0xDEC10904)
+#define NVC5B0_DEC_ERROR_MPEG12_INVALID_UCODE_SIZE                              (0xDEC10905)
+#define NVC5B0_DEC_ERROR_MPEG12_INVALID_SLICE_COUNT                             (0xDEC10906)
+#define NVC5B0_DEC_ERROR_VC1_APPTIMER_EXPIRED                                   (0xDEC20001)
+#define NVC5B0_DEC_ERROR_VC1_MVTIMER_EXPIRED                                    (0xDEC20002)
+#define NVC5B0_DEC_ERROR_VC1_INVALID_TOKEN                                      (0xDEC20003)
+#define NVC5B0_DEC_ERROR_VC1_SLICEDATA_MISSING                                  (0xDEC20004)
+#define NVC5B0_DEC_ERROR_VC1_HWERR_INTERRUPT                                    (0xDEC20005)
+#define NVC5B0_DEC_ERROR_VC1_DETECTED_VLD_FAILURE                               (0xDEC20006)
+#define NVC5B0_DEC_ERROR_VC1_TIMEOUT_POLLING_FOR_DATA                           (0xDEC20007)
+#define NVC5B0_DEC_ERROR_VC1_PDEC_PIC_END_UNALIGNED                             (0xDEC20008)
+#define NVC5B0_DEC_ERROR_VC1_WDTIMER_EXPIRED                                    (0xDEC20009)
+#define NVC5B0_DEC_ERROR_VC1_ERRINTSTART                                        (0xDEC20010)
+#define NVC5B0_DEC_ERROR_VC1_IQT_ERRINT                                         (0xDEC20011)
+#define NVC5B0_DEC_ERROR_VC1_MC_ERRINT                                          (0xDEC20012)
+#define NVC5B0_DEC_ERROR_VC1_MC_IQT_ERRINT                                      (0xDEC20013)
+#define NVC5B0_DEC_ERROR_VC1_REC_ERRINT                                         (0xDEC20014)
+#define NVC5B0_DEC_ERROR_VC1_REC_IQT_ERRINT                                     (0xDEC20015)
+#define NVC5B0_DEC_ERROR_VC1_REC_MC_ERRINT                                      (0xDEC20016)
+#define NVC5B0_DEC_ERROR_VC1_REC_MC_IQT_ERRINT                                  (0xDEC20017)
+#define NVC5B0_DEC_ERROR_VC1_DBF_ERRINT                                         (0xDEC20018)
+#define NVC5B0_DEC_ERROR_VC1_DBF_IQT_ERRINT                                     (0xDEC20019)
+#define NVC5B0_DEC_ERROR_VC1_DBF_MC_ERRINT                                      (0xDEC2001A)
+#define NVC5B0_DEC_ERROR_VC1_DBF_MC_IQT_ERRINT                                  (0xDEC2001B)
+#define NVC5B0_DEC_ERROR_VC1_DBF_REC_ERRINT                                     (0xDEC2001C)
+#define NVC5B0_DEC_ERROR_VC1_DBF_REC_IQT_ERRINT                                 (0xDEC2001D)
+#define NVC5B0_DEC_ERROR_VC1_DBF_REC_MC_ERRINT                                  (0xDEC2001E)
+#define NVC5B0_DEC_ERROR_VC1_DBF_REC_MC_IQT_ERRINT                              (0xDEC2001F)
+#define NVC5B0_DEC_ERROR_VC1_PICTURE_INIT                                       (0xDEC20100)
+#define NVC5B0_DEC_ERROR_VC1_STATEMACHINE_FAILURE                               (0xDEC20101)
+#define NVC5B0_DEC_ERROR_VC1_INVALID_CTXID_PIC                                  (0xDEC20901)
+#define NVC5B0_DEC_ERROR_VC1_INVALID_CTXID_UCODE                                (0xDEC20902)
+#define NVC5B0_DEC_ERROR_VC1_INVALID_CTXID_FC                                   (0xDEC20903)
+#define NVC5B0_DEC_ERROR_VC1_INVAILD_CTXID_SLH                                  (0xDEC20904)
+#define NVC5B0_DEC_ERROR_VC1_INVALID_UCODE_SIZE                                 (0xDEC20905)
+#define NVC5B0_DEC_ERROR_VC1_INVALID_SLICE_COUNT                                (0xDEC20906)
+#define NVC5B0_DEC_ERROR_H264_APPTIMER_EXPIRED                                  (0xDEC30001)
+#define NVC5B0_DEC_ERROR_H264_MVTIMER_EXPIRED                                   (0xDEC30002)
+#define NVC5B0_DEC_ERROR_H264_INVALID_TOKEN                                     (0xDEC30003)
+#define NVC5B0_DEC_ERROR_H264_SLICEDATA_MISSING                                 (0xDEC30004)
+#define NVC5B0_DEC_ERROR_H264_HWERR_INTERRUPT                                   (0xDEC30005)
+#define NVC5B0_DEC_ERROR_H264_DETECTED_VLD_FAILURE                              (0xDEC30006)
+#define NVC5B0_DEC_ERROR_H264_ERRINTSTART                                       (0xDEC30010)
+#define NVC5B0_DEC_ERROR_H264_IQT_ERRINT                                        (0xDEC30011)
+#define NVC5B0_DEC_ERROR_H264_MC_ERRINT                                         (0xDEC30012)
+#define NVC5B0_DEC_ERROR_H264_MC_IQT_ERRINT                                     (0xDEC30013)
+#define NVC5B0_DEC_ERROR_H264_REC_ERRINT                                        (0xDEC30014)
+#define NVC5B0_DEC_ERROR_H264_REC_IQT_ERRINT                                    (0xDEC30015)
+#define NVC5B0_DEC_ERROR_H264_REC_MC_ERRINT                                     (0xDEC30016)
+#define NVC5B0_DEC_ERROR_H264_REC_MC_IQT_ERRINT                                 (0xDEC30017)
+#define NVC5B0_DEC_ERROR_H264_DBF_ERRINT                                        (0xDEC30018)
+#define NVC5B0_DEC_ERROR_H264_DBF_IQT_ERRINT                                    (0xDEC30019)
+#define NVC5B0_DEC_ERROR_H264_DBF_MC_ERRINT                                     (0xDEC3001A)
+#define NVC5B0_DEC_ERROR_H264_DBF_MC_IQT_ERRINT                                 (0xDEC3001B)
+#define NVC5B0_DEC_ERROR_H264_DBF_REC_ERRINT                                    (0xDEC3001C)
+#define NVC5B0_DEC_ERROR_H264_DBF_REC_IQT_ERRINT                                (0xDEC3001D)
+#define NVC5B0_DEC_ERROR_H264_DBF_REC_MC_ERRINT                                 (0xDEC3001E)
+#define NVC5B0_DEC_ERROR_H264_DBF_REC_MC_IQT_ERRINT                             (0xDEC3001F)
+#define NVC5B0_DEC_ERROR_H264_PICTURE_INIT                                      (0xDEC30100)
+#define NVC5B0_DEC_ERROR_H264_STATEMACHINE_FAILURE                              (0xDEC30101)
+#define NVC5B0_DEC_ERROR_H264_INVALID_CTXID_PIC                                 (0xDEC30901)
+#define NVC5B0_DEC_ERROR_H264_INVALID_CTXID_UCODE                               (0xDEC30902)
+#define NVC5B0_DEC_ERROR_H264_INVALID_CTXID_FC                                  (0xDEC30903)
+#define NVC5B0_DEC_ERROR_H264_INVALID_CTXID_SLH                                 (0xDEC30904)
+#define NVC5B0_DEC_ERROR_H264_INVALID_UCODE_SIZE                                (0xDEC30905)
+#define NVC5B0_DEC_ERROR_H264_INVALID_SLICE_COUNT                               (0xDEC30906)
+#define NVC5B0_DEC_ERROR_MPEG4_APPTIMER_EXPIRED                                 (0xDEC40001)
+#define NVC5B0_DEC_ERROR_MPEG4_MVTIMER_EXPIRED                                  (0xDEC40002)
+#define NVC5B0_DEC_ERROR_MPEG4_INVALID_TOKEN                                    (0xDEC40003)
+#define NVC5B0_DEC_ERROR_MPEG4_SLICEDATA_MISSING                                (0xDEC40004)
+#define NVC5B0_DEC_ERROR_MPEG4_HWERR_INTERRUPT                                  (0xDEC40005)
+#define NVC5B0_DEC_ERROR_MPEG4_DETECTED_VLD_FAILURE                             (0xDEC40006)
+#define NVC5B0_DEC_ERROR_MPEG4_TIMEOUT_POLLING_FOR_DATA                         (0xDEC40007)
+#define NVC5B0_DEC_ERROR_MPEG4_PDEC_PIC_END_UNALIGNED                           (0xDEC40008)
+#define NVC5B0_DEC_ERROR_MPEG4_WDTIMER_EXPIRED                                  (0xDEC40009)
+#define NVC5B0_DEC_ERROR_MPEG4_ERRINTSTART                                      (0xDEC40010)
+#define NVC5B0_DEC_ERROR_MPEG4_IQT_ERRINT                                       (0xDEC40011)
+#define NVC5B0_DEC_ERROR_MPEG4_MC_ERRINT                                        (0xDEC40012)
+#define NVC5B0_DEC_ERROR_MPEG4_MC_IQT_ERRINT                                    (0xDEC40013)
+#define NVC5B0_DEC_ERROR_MPEG4_REC_ERRINT                                       (0xDEC40014)
+#define NVC5B0_DEC_ERROR_MPEG4_REC_IQT_ERRINT                                   (0xDEC40015)
+#define NVC5B0_DEC_ERROR_MPEG4_REC_MC_ERRINT                                    (0xDEC40016)
+#define NVC5B0_DEC_ERROR_MPEG4_REC_MC_IQT_ERRINT                                (0xDEC40017)
+#define NVC5B0_DEC_ERROR_MPEG4_DBF_ERRINT                                       (0xDEC40018)
+#define NVC5B0_DEC_ERROR_MPEG4_DBF_IQT_ERRINT                                   (0xDEC40019)
+#define NVC5B0_DEC_ERROR_MPEG4_DBF_MC_ERRINT                                    (0xDEC4001A)
+#define NVC5B0_DEC_ERROR_MPEG4_DBF_MC_IQT_ERRINT                                (0xDEC4001B)
+#define NVC5B0_DEC_ERROR_MPEG4_DBF_REC_ERRINT                                   (0xDEC4001C)
+#define NVC5B0_DEC_ERROR_MPEG4_DBF_REC_IQT_ERRINT                               (0xDEC4001D)
+#define NVC5B0_DEC_ERROR_MPEG4_DBF_REC_MC_ERRINT                                (0xDEC4001E)
+#define NVC5B0_DEC_ERROR_MPEG4_DBF_REC_MC_IQT_ERRINT                            (0xDEC4001F)
+#define NVC5B0_DEC_ERROR_MPEG4_PICTURE_INIT                                     (0xDEC40100)
+#define NVC5B0_DEC_ERROR_MPEG4_STATEMACHINE_FAILURE                             (0xDEC40101)
+#define NVC5B0_DEC_ERROR_MPEG4_INVALID_CTXID_PIC                                (0xDEC40901)
+#define NVC5B0_DEC_ERROR_MPEG4_INVALID_CTXID_UCODE                              (0xDEC40902)
+#define NVC5B0_DEC_ERROR_MPEG4_INVALID_CTXID_FC                                 (0xDEC40903)
+#define NVC5B0_DEC_ERROR_MPEG4_INVALID_CTXID_SLH                                (0xDEC40904)
+#define NVC5B0_DEC_ERROR_MPEG4_INVALID_UCODE_SIZE                               (0xDEC40905)
+#define NVC5B0_DEC_ERROR_MPEG4_INVALID_SLICE_COUNT                              (0xDEC40906)
+
+#ifdef __cplusplus
+}      /* extern "C" */
+#endif
+#endif /* AVUTIL_CLC5B0_H */
diff --git a/libavutil/cle7d0.h b/libavutil/cle7d0.h
new file mode 100644
index 0000000000..f17e67036f
--- /dev/null
+++ b/libavutil/cle7d0.h
@@ -0,0 +1,129 @@
+/*******************************************************************************
+    Copyright (c) 1993-2020, NVIDIA CORPORATION. All rights reserved.
+
+    Permission is hereby granted, free of charge, to any person obtaining a
+    copy of this software and associated documentation files (the "Software"),
+    to deal in the Software without restriction, including without limitation
+    the rights to use, copy, modify, merge, publish, distribute, sublicense,
+    and/or sell copies of the Software, and to permit persons to whom the
+    Software is furnished to do so, subject to the following conditions:
+
+    The above copyright notice and this permission notice shall be included in
+    all copies or substantial portions of the Software.
+
+    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+    DEALINGS IN THE SOFTWARE.
+
+*******************************************************************************/
+
+#ifndef AVUTIL_CLE7D0_H
+#define AVUTIL_CLE7D0_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define NVE7D0_VIDEO_NVJPG                                                      (0x0000E7D0)
+
+#define NVE7D0_NOP                                                              (0x00000100)
+#define NVE7D0_NOP_PARAMETER                                                    31:0
+#define NVE7D0_SET_APPLICATION_ID                                               (0x00000200)
+#define NVE7D0_SET_APPLICATION_ID_ID                                            31:0
+#define NVE7D0_SET_APPLICATION_ID_ID_NVJPG_DECODER                              (0x00000001)
+#define NVE7D0_SET_APPLICATION_ID_ID_NVJPG_ENCODER                              (0x00000002)
+#define NVE7D0_SET_WATCHDOG_TIMER                                               (0x00000204)
+#define NVE7D0_SET_WATCHDOG_TIMER_TIMER                                         31:0
+#define NVE7D0_SEMAPHORE_A                                                      (0x00000240)
+#define NVE7D0_SEMAPHORE_A_UPPER                                                7:0
+#define NVE7D0_SEMAPHORE_B                                                      (0x00000244)
+#define NVE7D0_SEMAPHORE_B_LOWER                                                31:0
+#define NVE7D0_SEMAPHORE_C                                                      (0x00000248)
+#define NVE7D0_SEMAPHORE_C_PAYLOAD                                              31:0
+#define NVE7D0_CTX_SAVE_AREA                                                    (0x0000024C)
+#define NVE7D0_CTX_SAVE_AREA_OFFSET                                             27:0
+#define NVE7D0_CTX_SAVE_AREA_CTX_VALID                                          31:28
+#define NVE7D0_CTX_SWITCH                                                       (0x00000250)
+#define NVE7D0_CTX_SWITCH_RESTORE                                               0:0
+#define NVE7D0_CTX_SWITCH_RESTORE_FALSE                                         (0x00000000)
+#define NVE7D0_CTX_SWITCH_RESTORE_TRUE                                          (0x00000001)
+#define NVE7D0_CTX_SWITCH_RST_NOTIFY                                            1:1
+#define NVE7D0_CTX_SWITCH_RST_NOTIFY_FALSE                                      (0x00000000)
+#define NVE7D0_CTX_SWITCH_RST_NOTIFY_TRUE                                       (0x00000001)
+#define NVE7D0_CTX_SWITCH_RESERVED                                              7:2
+#define NVE7D0_CTX_SWITCH_ASID                                                  23:8
+#define NVE7D0_EXECUTE                                                          (0x00000300)
+#define NVE7D0_EXECUTE_NOTIFY                                                   0:0
+#define NVE7D0_EXECUTE_NOTIFY_DISABLE                                           (0x00000000)
+#define NVE7D0_EXECUTE_NOTIFY_ENABLE                                            (0x00000001)
+#define NVE7D0_EXECUTE_NOTIFY_ON                                                1:1
+#define NVE7D0_EXECUTE_NOTIFY_ON_END                                            (0x00000000)
+#define NVE7D0_EXECUTE_NOTIFY_ON_BEGIN                                          (0x00000001)
+#define NVE7D0_EXECUTE_AWAKEN                                                   8:8
+#define NVE7D0_EXECUTE_AWAKEN_DISABLE                                           (0x00000000)
+#define NVE7D0_EXECUTE_AWAKEN_ENABLE                                            (0x00000001)
+#define NVE7D0_SEMAPHORE_D                                                      (0x00000304)
+#define NVE7D0_SEMAPHORE_D_STRUCTURE_SIZE                                       0:0
+#define NVE7D0_SEMAPHORE_D_STRUCTURE_SIZE_ONE                                   (0x00000000)
+#define NVE7D0_SEMAPHORE_D_STRUCTURE_SIZE_FOUR                                  (0x00000001)
+#define NVE7D0_SEMAPHORE_D_AWAKEN_ENABLE                                        8:8
+#define NVE7D0_SEMAPHORE_D_AWAKEN_ENABLE_FALSE                                  (0x00000000)
+#define NVE7D0_SEMAPHORE_D_AWAKEN_ENABLE_TRUE                                   (0x00000001)
+#define NVE7D0_SEMAPHORE_D_OPERATION                                            17:16
+#define NVE7D0_SEMAPHORE_D_OPERATION_RELEASE                                    (0x00000000)
+#define NVE7D0_SEMAPHORE_D_OPERATION_RESERVED0                                  (0x00000001)
+#define NVE7D0_SEMAPHORE_D_OPERATION_RESERVED1                                  (0x00000002)
+#define NVE7D0_SEMAPHORE_D_OPERATION_TRAP                                       (0x00000003)
+#define NVE7D0_SEMAPHORE_D_FLUSH_DISABLE                                        21:21
+#define NVE7D0_SEMAPHORE_D_FLUSH_DISABLE_FALSE                                  (0x00000000)
+#define NVE7D0_SEMAPHORE_D_FLUSH_DISABLE_TRUE                                   (0x00000001)
+#define NVE7D0_SET_CONTROL_PARAMS                                               (0x00000700)
+#define NVE7D0_SET_CONTROL_PARAMS_GPTIMER_ON                                    0:0
+#define NVE7D0_SET_CONTROL_PARAMS_DUMP_CYCLE_COUNT                              1:1
+#define NVE7D0_SET_CONTROL_PARAMS_DEBUG_MODE                                    2:2
+#define NVE7D0_SET_PICTURE_INDEX                                                (0x00000704)
+#define NVE7D0_SET_PICTURE_INDEX_INDEX                                          31:0
+#define NVE7D0_SET_IN_DRV_PIC_SETUP                                             (0x00000708)
+#define NVE7D0_SET_IN_DRV_PIC_SETUP_OFFSET                                      31:0
+#define NVE7D0_SET_OUT_STATUS                                                   (0x0000070C)
+#define NVE7D0_SET_OUT_STATUS_OFFSET                                            31:0
+#define NVE7D0_SET_BITSTREAM                                                    (0x00000710)
+#define NVE7D0_SET_BITSTREAM_OFFSET                                             31:0
+#define NVE7D0_SET_CUR_PIC                                                      (0x00000714)
+#define NVE7D0_SET_CUR_PIC_OFFSET                                               31:0
+#define NVE7D0_SET_CUR_PIC_CHROMA_U                                             (0x00000718)
+#define NVE7D0_SET_CUR_PIC_CHROMA_U_OFFSET                                      31:0
+#define NVE7D0_SET_CUR_PIC_CHROMA_V                                             (0x0000071C)
+#define NVE7D0_SET_CUR_PIC_CHROMA_V_OFFSET                                      31:0
+
+#define NVE7D0_ERROR_NONE                                                       (0x00000000)
+#define NVE7D0_OS_ERROR_EXECUTE_INSUFFICIENT_DATA                               (0x00000001)
+#define NVE7D0_OS_ERROR_SEMAPHORE_INSUFFICIENT_DATA                             (0x00000002)
+#define NVE7D0_OS_ERROR_INVALID_METHOD                                          (0x00000003)
+#define NVE7D0_OS_ERROR_INVALID_DMA_PAGE                                        (0x00000004)
+#define NVE7D0_OS_ERROR_UNHANDLED_INTERRUPT                                     (0x00000005)
+#define NVE7D0_OS_ERROR_EXCEPTION                                               (0x00000006)
+#define NVE7D0_OS_ERROR_INVALID_CTXSW_REQUEST                                   (0x00000007)
+#define NVE7D0_OS_ERROR_APPLICATION                                             (0x00000008)
+#define NVE7D0_OS_INTERRUPT_EXECUTE_AWAKEN                                      (0x00000100)
+#define NVE7D0_OS_INTERRUPT_BACKEND_SEMAPHORE_AWAKEN                            (0x00000200)
+#define NVE7D0_OS_INTERRUPT_CTX_ERROR_FBIF                                      (0x00000300)
+#define NVE7D0_OS_INTERRUPT_LIMIT_VIOLATION                                     (0x00000400)
+#define NVE7D0_OS_INTERRUPT_LIMIT_AND_FBIF_CTX_ERROR                            (0x00000500)
+#define NVE7D0_OS_INTERRUPT_HALT_ENGINE                                         (0x00000600)
+#define NVE7D0_OS_INTERRUPT_TRAP_NONSTALL                                       (0x00000700)
+#define NVE7D0_OS_INTERRUPT_CTX_SAVE_DONE                                       (0x00000800)
+#define NVE7D0_OS_INTERRUPT_CTX_RESTORE_DONE                                    (0x00000900)
+#define NVE7D0_ERROR_JPGAPPTIMER_EXPIRED                                        (0x30000001)
+#define NVE7D0_ERROR_JPGINVALID_INPUT                                           (0x30000002)
+#define NVE7D0_ERROR_JPGHWERR_INTERRUPT                                         (0x30000003)
+#define NVE7D0_ERROR_JPGBAD_MAGIC                                               (0x30000004)
+
+#ifdef __cplusplus
+}      /* extern "C" */
+#endif
+#endif /* AVUTIL_CLE7D0_H */
diff --git a/libavutil/nvdec_drv.h b/libavutil/nvdec_drv.h
new file mode 100644
index 0000000000..7803cd16b3
--- /dev/null
+++ b/libavutil/nvdec_drv.h
@@ -0,0 +1,1858 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 1993-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-License-Identifier: MIT
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef AVUTIL_NVDEC_DRV_H
+#define AVUTIL_NVDEC_DRV_H
+
+// TODO: Many fields can be converted to bitfields to save memory BW
+// TODO: Revisit reserved fields for proper alignment and memory savings
+
+///////////////////////////////////////////////////////////////////////////////
+// NVDEC (MSDEC 5) is a single-engine solution that is separated into VLD, MV,
+//                IQT, MCFETCH, MC, MCC, REC, DBF, DFBFDMA, HIST, etc. units.
+//                The class (driver to HW) is mainly separated into a VLD parser
+//                part and a decoder part, to be consistent with the original
+//                design. Sequence-level info is usually set in the VLD part.
+//                Later codecs such as VP8 are not named this way.
+// MSVLD: Multi-Standard VLD parser.
+//
+#define ALIGN_UP(v, n)          (((v) + ((n)-1)) &~ ((n)-1))
+#define NVDEC_ALIGN(value)      ALIGN_UP(value,256) // Align to 256 bytes
+#define NVDEC_MAX_MPEG2_SLICE   65536 // at 4096*4096, macroblock count = 65536, 1 macroblock per slice
+
+#define NVDEC_CODEC_MPEG1   0
+#define NVDEC_CODEC_MPEG2   1
+#define NVDEC_CODEC_VC1     2
+#define NVDEC_CODEC_H264    3
+#define NVDEC_CODEC_MPEG4   4
+#define NVDEC_CODEC_DIVX    NVDEC_CODEC_MPEG4
+#define NVDEC_CODEC_VP8     5
+#define NVDEC_CODEC_HEVC    7
+#define NVDEC_CODEC_VP9     9
+#define NVDEC_CODEC_HEVC_PARSER 12
+#define NVDEC_CODEC_AV1         10
+
+// AES encryption
+enum
+{
+    AES128_NONE = 0x0,
+    AES128_CTR = 0x1,
+    AES128_CBC,
+    AES128_ECB,
+    AES128_OFB,
+    AES128_CTR_LSB16B,
+    AES128_CLR_AS_ENCRYPT,
+    AES128_RESERVED = 0x7
+};
+
+enum
+{
+    AES128_CTS_DISABLE = 0x0,
+    AES128_CTS_ENABLE = 0x1
+};
+
+enum
+{
+    AES128_PADDING_NONE = 0x0,
+    AES128_PADDING_CARRY_OVER,
+    AES128_PADDING_RFC2630,
+    AES128_PADDING_RESERVED = 0x7
+};
+
+typedef enum
+{
+    ENCR_MODE_CTR64         = 0,
+    ENCR_MODE_CBC           = 1,
+    ENCR_MODE_ECB           = 2,
+    ENCR_MODE_ECB_PARTIAL   = 3,
+    ENCR_MODE_CBC_PARTIAL   = 4,
+    ENCR_MODE_CLEAR_INTO_VPR = 5,     // used for clear stream decoding into VPR.
+    ENCR_MODE_FORCE_INTO_VPR = 6,    //  used to force decode output into VPR.
+} ENCR_MODE;
+
+// drm_mode configuration
+//
+// Bit 0:2  AES encryption mode
+// Bit 3    CTS (CipherTextStealing) enable/disable
+// Bit 4:6  Padding type
+// Bit 7:7  Unwrap key enable/disable
+
+#define AES_MODE_MASK           0x7
+#define AES_CTS_MASK            0x1
+#define AES_PADDING_TYPE_MASK   0x7
+#define AES_UNWRAP_KEY_MASK     0x1
+
+#define AES_MODE_SHIFT          0
+#define AES_CTS_SHIFT           3
+#define AES_PADDING_TYPE_SHIFT  4
+#define AES_UNWRAP_KEY_SHIFT    7
+
+#define AES_SET_FLAG(M, C, P)   ((((M) & AES_MODE_MASK) << AES_MODE_SHIFT) | \
+                                 (((C) & AES_CTS_MASK) << AES_CTS_SHIFT) | \
+                                 (((P) & AES_PADDING_TYPE_MASK) << AES_PADDING_TYPE_SHIFT))
+
+#define AES_GET_FLAG(V, F)      (((V) & ((AES_##F##_MASK) << (AES_##F##_SHIFT))) >> (AES_##F##_SHIFT))
+
+#define DRM_MODE_MASK           0x7f        // Bits 0:6  (0:2 -> AES_MODE, 3 -> AES_CTS, 4:6 -> AES_PADDING_TYPE)
+#define AES_GET_DRM_MODE(V)     ((V) & DRM_MODE_MASK)
+
+enum { DRM_MS_PIFF_CTR  =   AES_SET_FLAG(AES128_CTR, AES128_CTS_DISABLE, AES128_PADDING_CARRY_OVER) };
+enum { DRM_MS_PIFF_CBC  =   AES_SET_FLAG(AES128_CBC, AES128_CTS_DISABLE, AES128_PADDING_NONE) };
+enum { DRM_MARLIN_CTR   =   AES_SET_FLAG(AES128_CTR, AES128_CTS_DISABLE, AES128_PADDING_NONE) };
+enum { DRM_MARLIN_CBC   =   AES_SET_FLAG(AES128_CBC, AES128_CTS_DISABLE, AES128_PADDING_RFC2630) };
+enum { DRM_WIDEVINE     =   AES_SET_FLAG(AES128_CBC, AES128_CTS_ENABLE,  AES128_PADDING_NONE) };
+enum { DRM_WIDEVINE_CTR =   AES_SET_FLAG(AES128_CTR, AES128_CTS_DISABLE, AES128_PADDING_CARRY_OVER) };
+enum { DRM_ULTRA_VIOLET =   AES_SET_FLAG(AES128_CTR_LSB16B, AES128_CTS_DISABLE, AES128_PADDING_NONE) };
+enum { DRM_NONE         =   AES_SET_FLAG(AES128_NONE, AES128_CTS_DISABLE, AES128_PADDING_NONE) };
+enum { DRM_CLR_AS_ENCRYPT = AES_SET_FLAG(AES128_CLR_AS_ENCRYPT, AES128_CTS_DISABLE, AES128_PADDING_NONE)};
+
+// SSM entry structure
+typedef struct _nvdec_ssm_s {
+    unsigned int bytes_of_protected_data;//bytes of protected data, follows bytes_of_clear_data. Note: When padding is enabled, it does not include the padding_bytes (1~15), which can be derived by "(16-(bytes_of_protected_data&0xF))&0xF"
+    unsigned int bytes_of_clear_data:16; //bytes of clear data, located before bytes_of_protected_data
+    unsigned int skip_byte_blk      : 4; //valid when (entry_type==0 && mode==1)
+    unsigned int crypt_byte_blk     : 4; //valid when (entry_type==0 && mode==1)
+    unsigned int skip               : 1; //whether this SSM entry should be skipped or not
+    unsigned int last               : 1; //whether this SSM entry is the last one for the whole decoding frame
+    unsigned int pad                : 1; //valid when (entry_type==0 && mode==0 && AES_PADDING_TYPE==AES128_PADDING_RFC2630), 0 for pad_end, 1 for pad_begin
+    unsigned int mode               : 1; //0 for normal mode, 1 for pattern mode
+    unsigned int entry_type         : 1; //0 for DATA, 1 for IV
+    unsigned int reserved           : 3;
+} nvdec_ssm_s; /* SubSampleMap, 8bytes */
+
+// PASS2 OTF extension structure for SSM support; does not exist in nvdec_mpeg4_pic_s (as MPEG4 OTF SW-DRM is not supported yet)
+typedef struct _nvdec_pass2_otf_ext_s {
+    unsigned int ssm_entry_num      :16; //specifies how many SSM entries (each in unit of 8 bytes) exist in the SET_SUB_SAMPLE_MAP_OFFSET surface
+    unsigned int ssm_iv_num         :16; //specifies how many SSM IVs (each in unit of 16 bytes) exist in the SET_SUB_SAMPLE_MAP_IV_OFFSET surface
+    unsigned int real_stream_length;     //the real stream length, i.e. the bitstream length EMD/VLD will get after whole-frame SSM processing: the sum of "clear+protected" bytes in SSM entries, minus "non_slice_data/skip" bytes.
+    unsigned int non_slice_data     :16; //specifies how many leading bytes to skip; counts only "clear+protected" bytes ("padding" bytes excluded)
+    unsigned int drm_mode           : 7;
+    unsigned int reserved           : 9;
+} nvdec_pass2_otf_ext_s; /* 12bytes */
+
+
+//NVDEC5.0 low latency decoding (partial stream kickoff without context switch), method will reuse HevcSetSliceInfoBufferOffset.
+typedef struct _nvdec_substream_entry_s {
+    unsigned int substream_start_offset;                    //substream byte start offset to bitstream base address
+    unsigned int substream_length;                          //substream length in bytes
+    unsigned int substream_first_tile_idx           : 8;    //the first tile index(raster scan in frame) of this substream,max is 255
+    unsigned int substream_last_tile_idx            : 8;    //the last tile index(raster scan in frame) of this substream, max is 255
+    unsigned int last_substream_entry_in_frame      : 1;    //this entry is the last substream entry of this frame
+    unsigned int reserved                           : 15;
+} nvdec_substream_entry_s;/*low latency without context switch substream entry map,12bytes*/
+
+
+// GIP
+
+/* tile border coefficients of filter */
+#define GIP_ASIC_VERT_FILTER_RAM_SIZE       16  /* bytes per pixel */
+
+/* BSD control data of current picture at tile border
+ * 11  * 128 bits per 4x4 tile = 128/(8*4) bytes per row */
+#define GIP_ASIC_BSD_CTRL_RAM_SIZE          4  /* bytes per row */
+
+/* 8 dc + 8 to boundary + 6*16 + 2*6*64 + 2*64 -> 63 * 16 bytes */
+#define GIP_ASIC_SCALING_LIST_SIZE          (16*64)
+
+/* tile border coefficients of filter */
+#define GIP_ASIC_VERT_SAO_RAM_SIZE          16  /* bytes per pixel */
+
+/* max number of tiles times width and height (2 bytes each),
+ * rounding up to next 16 bytes boundary + one extra 16 byte
+ * chunk (HW guys wanted to have this) */
+#define GIP_ASIC_TILE_SIZE                  ((20*22*2*2+16+15) & ~0xF)
+
+/* Segment map uses 32 bytes / CTB */
+#define GIP_ASIC_VP9_CTB_SEG_SIZE           32
+
+// HEVC Filter FG buffer
+#define HEVC_DBLK_TOP_SIZE_IN_SB16          ALIGN_UP(632, 128) // ctb16 + 444
+#define HEVC_DBLK_TOP_BUF_SIZE(w)           NVDEC_ALIGN( (ALIGN_UP(w,16)/16 + 2) * HEVC_DBLK_TOP_SIZE_IN_SB16) // 8K: 1285*256
+
+#define HEVC_DBLK_LEFT_SIZE_IN_SB16         ALIGN_UP(506, 128) // ctb16 + 444
+#define HEVC_DBLK_LEFT_BUF_SIZE(h)          NVDEC_ALIGN( (ALIGN_UP(h,16)/16 + 2) * HEVC_DBLK_LEFT_SIZE_IN_SB16) // 8K: 1028*256
+
+#define HEVC_SAO_LEFT_SIZE_IN_SB16          ALIGN_UP(713, 128) // ctb16 + 444
+#define HEVC_SAO_LEFT_BUF_SIZE(h)           NVDEC_ALIGN( (ALIGN_UP(h,16)/16 + 2) * HEVC_SAO_LEFT_SIZE_IN_SB16) // 8K: 1542*256
+
+// VP9 Filter FG buffer
+#define VP9_DBLK_TOP_SIZE_IN_SB64           ALIGN_UP(2000, 128) // 420
+#define VP9_DBLK_TOP_BUF_SIZE(w)            NVDEC_ALIGN( (ALIGN_UP(w,64)/64 + 2) * VP9_DBLK_TOP_SIZE_IN_SB64) // 8K: 1040*256
+
+#define VP9_DBLK_LEFT_SIZE_IN_SB64          ALIGN_UP(1600, 128) // 420
+#define VP9_DBLK_LEFT_BUF_SIZE(h)           NVDEC_ALIGN( (ALIGN_UP(h,64)/64 + 2) * VP9_DBLK_LEFT_SIZE_IN_SB64) // 8K: 845*256
+
+// VP9 Hint Dump Buffer
+#define VP9_HINT_DUMP_SIZE_IN_SB64          ((64*64)/(4*4)*8)           // 8 bytes per CU, 256 CUs(2048 bytes) per SB64
+#define VP9_HINT_DUMP_SIZE(w, h)            NVDEC_ALIGN(VP9_HINT_DUMP_SIZE_IN_SB64*(((w)+63)/64)*(((h)+63)/64))
+
+// used for ecdma debug
+typedef struct _nvdec_ecdma_config_s
+{
+    unsigned int            ecdma_enable;                               // enable/disable  ecdma
+    unsigned short          ecdma_blk_x_src;                            // src start position x , it's 64x aligned
+    unsigned short          ecdma_blk_y_src;                            // src start position y , it's 8x aligned
+    unsigned short          ecdma_blk_x_dst;                            // dst start position x , it's 64x aligned
+    unsigned short          ecdma_blk_y_dst;                            // dst start position y , it's 8x aligned
+    unsigned short          ref_pic_idx;                                // ref(src) picture index , used to derive source picture base address
+    unsigned short          boundary0_top;                              // src inside tile/partition region top boundary
+    unsigned short          boundary0_bottom;                           // src inside tile/partition region bottom boundary
+    unsigned short          boundary1_left;                             // src inside tile/partition region left boundary
+    unsigned short          boundary1_right;                            // src inside tile/partition region right boundary
+    unsigned char           blk_copy_flag;                              // blk_copy enable flag.
+                                                                        // if it's 1 ,ctb_size ==3,ecdma_blk_x_src == boundary1_left and ecdma_blk_y_src == boundary0_top ;
+                                                                        // if it's 0 ,ecdma_blk_x_src == ecdma_blk_x_dst and ecdma_blk_y_src == ecdma_blk_y_dst;
+    unsigned char           ctb_size;                                   // ctb_size .0:64x64,1:32x32,2:16x16,3:8x8
+} nvdec_ecdma_config_s;
+
+typedef struct _nvdec_status_hevc_s
+{
+    unsigned int frame_status_intra_cnt;    //Intra block counter, in unit of 8x8 block, IPCM block included
+    unsigned int frame_status_inter_cnt;    //Inter block counter, in unit of 8x8 block, SKIP block included
+    unsigned int frame_status_skip_cnt;     //Skip block counter, in unit of 4x4 block, blocks having NO/ZERO texture/coeff data
+    unsigned int frame_status_fwd_mvx_cnt;  //ABS sum of forward  MVx, one 14bit MVx(integer) per 4x4 block
+    unsigned int frame_status_fwd_mvy_cnt;  //ABS sum of forward  MVy, one 14bit MVy(integer) per 4x4 block
+    unsigned int frame_status_bwd_mvx_cnt;  //ABS sum of backward MVx, one 14bit MVx(integer) per 4x4 block
+    unsigned int frame_status_bwd_mvy_cnt;  //ABS sum of backward MVy, one 14bit MVy(integer) per 4x4 block
+    unsigned int error_ctb_pos;             //[15:0] error ctb   position in Y direction, [31:16] error ctb   position in X direction
+    unsigned int error_slice_pos;           //[15:0] error slice position in Y direction, [31:16] error slice position in X direction
+} nvdec_status_hevc_s;
+
+typedef struct _nvdec_status_vp9_s
+{
+    unsigned int frame_status_intra_cnt;    //Intra block counter, in unit of 8x8 block, IPCM block included
+    unsigned int frame_status_inter_cnt;    //Inter block counter, in unit of 8x8 block, SKIP block included
+    unsigned int frame_status_skip_cnt;     //Skip block counter, in unit of 4x4 block, blocks having NO/ZERO texture/coeff data
+    unsigned int frame_status_fwd_mvx_cnt;  //ABS sum of forward  MVx, one 14bit MVx(integer) per 4x4 block
+    unsigned int frame_status_fwd_mvy_cnt;  //ABS sum of forward  MVy, one 14bit MVy(integer) per 4x4 block
+    unsigned int frame_status_bwd_mvx_cnt;  //ABS sum of backward MVx, one 14bit MVx(integer) per 4x4 block
+    unsigned int frame_status_bwd_mvy_cnt;  //ABS sum of backward MVy, one 14bit MVy(integer) per 4x4 block
+    unsigned int error_ctb_pos;             //[15:0] error ctb   position in Y direction, [31:16] error ctb   position in X direction
+    unsigned int error_slice_pos;           //[15:0] error slice position in Y direction, [31:16] error slice position in X direction
+} nvdec_status_vp9_s;
+
+typedef struct _nvdec_status_s
+{
+    unsigned int    mbs_correctly_decoded;          // total number of correctly decoded macroblocks
+    unsigned int    mbs_in_error;                   // number of error macroblocks.
+    unsigned int    cycle_count;                    // total cycles taken to execute; read from the PERF_DECODE_FRAME_V register
+    unsigned int    error_status;                   // report error if any
+    union
+    {
+        nvdec_status_hevc_s hevc;
+        nvdec_status_vp9_s vp9;
+    };
+    unsigned int    slice_header_error_code;        // report error in slice header
+
+} nvdec_status_s;
+
+// per 16x16 block, used in hevc/vp9 surface of SetExternalMVBufferOffset when error_external_mv_en = 1
+typedef struct _external_mv_s
+{
+    int             mvx     : 14;   //integer pixel precision
+    int             mvy     : 14;   //integer pixel precision
+    unsigned int    refidx  :  4;
+} external_mv_s;
+
+// HEVC
+typedef struct _nvdec_hevc_main10_444_ext_s
+{
+    unsigned int transformSkipRotationEnableFlag : 1;    //sps extension for transform_skip_rotation_enabled_flag
+    unsigned int transformSkipContextEnableFlag : 1;     //sps extension for transform_skip_context_enabled_flag
+    unsigned int intraBlockCopyEnableFlag :1;            //sps intraBlockCopyEnableFlag, always 0 before spec define it
+    unsigned int implicitRdpcmEnableFlag : 1;            //sps implicit_rdpcm_enabled_flag
+    unsigned int explicitRdpcmEnableFlag : 1;            //sps explicit_rdpcm_enabled_flag
+    unsigned int extendedPrecisionProcessingFlag : 1;    //sps extended_precision_processing_flag,always 0 in current profile
+    unsigned int intraSmoothingDisabledFlag : 1;         //sps intra_smoothing_disabled_flag
+    unsigned int highPrecisionOffsetsEnableFlag :1;      //sps high_precision_offsets_enabled_flag
+    unsigned int fastRiceAdaptationEnableFlag: 1;        //sps fast_rice_adaptation_enabled_flag
+    unsigned int cabacBypassAlignmentEnableFlag : 1;     //sps cabac_bypass_alignment_enabled_flag, always 0 in current profile
+    unsigned int sps_444_extension_reserved : 22;        //sps reserve for future extension
+
+    unsigned int log2MaxTransformSkipSize : 4 ;          //pps extension log2_max_transform_skip_block_size_minus2, 0...5
+    unsigned int crossComponentPredictionEnableFlag: 1;  //pps cross_component_prediction_enabled_flag
+    unsigned int chromaQpAdjustmentEnableFlag:1;         //pps chroma_qp_adjustment_enabled_flag
+    unsigned int diffCuChromaQpAdjustmentDepth:2;        //pps diff_cu_chroma_qp_adjustment_depth, 0...3
+    unsigned int chromaQpAdjustmentTableSize:3;          //pps chroma_qp_adjustment_table_size_minus1+1, 1...6
+    unsigned int log2SaoOffsetScaleLuma:3;               //pps log2_sao_offset_scale_luma, max(0,bitdepth-10),maxBitdepth 16 for future.
+    unsigned int log2SaoOffsetScaleChroma: 3;            //pps log2_sao_offset_scale_chroma
+    unsigned int pps_444_extension_reserved : 15;        //pps reserved
+    char         cb_qp_adjustment[6];                    //[-12,+12]
+    char         cr_qp_adjustment[6];                    //[-12,+12]
+    unsigned int   HevcFltAboveOffset;  // filter above offset respect to filter buffer, 256 bytes unit
+    unsigned int   HevcSaoAboveOffset;  // sao    above offset respect to filter buffer, 256 bytes unit
+} nvdec_hevc_main10_444_ext_s;
+
+typedef struct _nvdec_hevc_pic_v1_s
+{
+    // New fields
+    //hevc main10 444 extensions
+    nvdec_hevc_main10_444_ext_s hevc_main10_444_ext;
+
+    //HEVC: number of bytes to skip from the beginning of the stream (setting for secure mode).
+    //This differs from sw_hdr_skip_length, which skips the part in the middle of
+    //the slice header that is parsed by the driver
+    unsigned int   sw_skip_start_length : 14;
+    unsigned int   external_ref_mem_dis :  1;
+    unsigned int   error_recovery_start_pos :  2;       //0: from start of frame, 1: from start of slice segment, 2: from error detected ctb, 3: reserved
+    unsigned int   error_external_mv_en :  1;
+    unsigned int   reserved0            : 14;
+    // Reserved bits padding
+} nvdec_hevc_pic_v1_s;
+
+//No versioning in structure: NVDEC2 (T210 and GM206)
+//version v1 : NVDEC3 (T186 and GP100)
+//version v2 : NVDEC3.1 (GP10x)
+
+typedef struct _nvdec_hevc_pic_v2_s
+{
+    // mv-hevc field
+    unsigned  int  mv_hevc_enable                     :1;
+    unsigned  int  nuh_layer_id                       :6;
+    unsigned  int  default_ref_layers_active_flag     :1;
+    unsigned  int  NumDirectRefLayers                 :6;
+    unsigned  int  max_one_active_ref_layer_flag      :1;
+    unsigned  int  NumActiveRefLayerPics              :6;
+    unsigned  int  poc_lsb_not_present_flag           :1;
+    unsigned  int  reserved0                          :10;
+} nvdec_hevc_pic_v2_s;
+
+typedef struct _nvdec_hevc_pic_v3_s
+{
+    // slice level decoding
+    unsigned  int  slice_decoding_enable:1;//1: enable slice level decoding
+    unsigned  int  slice_ec_enable:1;      //1: enable slice error concealment. When slice_ec_enable=1,slice_decoding_enable must be 1;
+    unsigned  int  slice_ec_mv_type:2;     //0: zero mv; 1: co-located mv; 2: external mv;
+    unsigned  int  err_detected_sw:1;      //1: indicate sw/driver has detected error already in frame kick mode
+    unsigned  int  slice_ec_slice_type:2;  //0: B slice; 1: P slice ; others: reserved
+    unsigned  int  slice_strm_recfg_en:1;  //enable slice bitstream re-configure or not ;
+    unsigned  int  reserved:24;
+    unsigned  int  HevcSliceEdgeOffset;// slice edge buffer offset with respect to filter buffer, in units of 256 bytes
+}nvdec_hevc_pic_v3_s;
+
+typedef struct _nvdec_hevc_pic_s
+{
+    //The key/IV addr must be 128-bit aligned
+    unsigned int   wrapped_session_key[4];                      //session keys
+    unsigned int   wrapped_content_key[4];                      //content keys
+    unsigned int   initialization_vector[4];                    //Ctrl64 initial vector
+    // hevc_bitstream_data_info
+    unsigned int   stream_len;                                  // stream length in one frame
+    unsigned int   enable_encryption;                           // flag to enable/disable encryption
+    unsigned int   key_increment   : 6;                           // added to content key after unwrapping
+    unsigned int   encryption_mode : 4;
+    unsigned int   key_slot_index  : 4;
+    unsigned int   ssm_en          : 1;
+    unsigned int   enable_histogram  : 1;                       // histogram stats output enable
+    unsigned int   enable_substream_decoding: 1;            //frame substream kickoff without context switch
+    unsigned int   reserved0       :15;
+
+    // Driver may or may not use based upon need.
+    // If 0 then default value of 1<<27 = 298ms @ 450MHz will be used in ucode.
+    // Driver can send this value based upon resolution using the formula:
+    // gptimer_timeout_value = 3 * (cycles required for one frame)
+    unsigned int gptimer_timeout_value;
+
+    // general
+    unsigned char tileformat                 : 2 ;   // 0: TBL; 1: KBL; 2: Tile16x16
+    unsigned char gob_height                 : 3 ;   // Set GOB height, 0: GOB_2, 1: GOB_4, 2: GOB_8, 3: GOB_16, 4: GOB_32 (NVDEC3 onwards)
+    unsigned char reserverd_surface_format   : 3 ;
+    unsigned char sw_start_code_e;                             // 0: stream doesn't contain start codes,1: stream contains start codes
+    unsigned char disp_output_mode;                            // 0: Rec.709 8 bit, 1: Rec.709 10 bit, 2: Rec.709 10 bits -> 8 bit, 3: Rec.2020 10 bit -> 8 bit
+    unsigned char reserved1;
+    unsigned int  framestride[2];                              // frame buffer stride for luma and chroma
+    unsigned int  colMvBuffersize;                             // collocated MV buffer size of one picture ,256 bytes unit
+    unsigned int  HevcSaoBufferOffset;                         // sao buffer offset respect to filter buffer ,256 bytes unit .
+    unsigned int  HevcBsdCtrlOffset;                           // bsd buffer offset respect to filter buffer ,256 bytes unit .
+    // sps
+    unsigned short pic_width_in_luma_samples;                      // :15, 48(?)..16384, multiple of 8 (48 is smallest width supported by NVDEC for CTU size 16x16)
+    unsigned short pic_height_in_luma_samples;                     // :15, 8..16384, multiple of 8
+    unsigned int chroma_format_idc                            : 4; // always 1 (=4:2:0)
+    unsigned int bit_depth_luma                               : 4; // 8..12
+    unsigned int bit_depth_chroma                             : 4;
+    unsigned int log2_min_luma_coding_block_size              : 4; // 3..6
+    unsigned int log2_max_luma_coding_block_size              : 4; // 3..6
+    unsigned int log2_min_transform_block_size                : 4; // 2..5
+    unsigned int log2_max_transform_block_size                : 4; // 2..5
+    unsigned int reserved2                                    : 4;
+
+    unsigned int max_transform_hierarchy_depth_inter          : 3; // 0..4
+    unsigned int max_transform_hierarchy_depth_intra          : 3; // 0..4
+    unsigned int scalingListEnable                            : 1; //
+    unsigned int amp_enable_flag                              : 1; //
+    unsigned int sample_adaptive_offset_enabled_flag          : 1; //
+    unsigned int pcm_enabled_flag                             : 1; //
+    unsigned int pcm_sample_bit_depth_luma                    : 4; //
+    unsigned int pcm_sample_bit_depth_chroma                  : 4;
+    unsigned int log2_min_pcm_luma_coding_block_size          : 4; //
+    unsigned int log2_max_pcm_luma_coding_block_size          : 4; //
+    unsigned int pcm_loop_filter_disabled_flag                : 1; //
+    unsigned int sps_temporal_mvp_enabled_flag                : 1; //
+    unsigned int strong_intra_smoothing_enabled_flag          : 1; //
+    unsigned int reserved3                                    : 3;
+    // pps
+    unsigned int dependent_slice_segments_enabled_flag        : 1; //
+    unsigned int output_flag_present_flag                     : 1; //
+    unsigned int num_extra_slice_header_bits                  : 3; //  0..7 (normally 0)
+    unsigned int sign_data_hiding_enabled_flag                : 1; //
+    unsigned int cabac_init_present_flag                      : 1; //
+    unsigned int num_ref_idx_l0_default_active                : 4; //  1..15
+    unsigned int num_ref_idx_l1_default_active                : 4; //  1..15
+    unsigned int init_qp                                      : 7; //  0..127, support higher bitdepth
+    unsigned int constrained_intra_pred_flag                  : 1; //
+    unsigned int transform_skip_enabled_flag                  : 1; //
+    unsigned int cu_qp_delta_enabled_flag                     : 1; //
+    unsigned int diff_cu_qp_delta_depth                       : 2; //  0..3
+    unsigned int reserved4                                    : 5; //
+
+    char         pps_cb_qp_offset                             ; //  -12..12
+    char         pps_cr_qp_offset                             ; //  -12..12
+    char         pps_beta_offset                              ; //  -12..12
+    char         pps_tc_offset                                ; //  -12..12
+    unsigned int pps_slice_chroma_qp_offsets_present_flag     : 1; //
+    unsigned int weighted_pred_flag                           : 1; //
+    unsigned int weighted_bipred_flag                         : 1; //
+    unsigned int transquant_bypass_enabled_flag               : 1; //
+    unsigned int tiles_enabled_flag                           : 1; // (redundant: = num_tile_columns_minus1!=0 || num_tile_rows_minus1!=0)
+    unsigned int entropy_coding_sync_enabled_flag             : 1; //
+    unsigned int num_tile_columns                             : 5; // 0..20
+    unsigned int num_tile_rows                                : 5; // 0..22
+    unsigned int loop_filter_across_tiles_enabled_flag        : 1; //
+    unsigned int loop_filter_across_slices_enabled_flag       : 1; //
+    unsigned int deblocking_filter_control_present_flag       : 1; //
+    unsigned int deblocking_filter_override_enabled_flag      : 1; //
+    unsigned int pps_deblocking_filter_disabled_flag          : 1; //
+    unsigned int lists_modification_present_flag              : 1; //
+    unsigned int log2_parallel_merge_level                    : 3; //  2..4
+    unsigned int slice_segment_header_extension_present_flag  : 1; // (normally 0)
+    unsigned int reserved5                                    : 6;
+
+    // reference picture related
+    unsigned char  num_ref_frames;
+    unsigned char  reserved6;
+    unsigned short longtermflag;                              // long term flag for refpiclist.bit 15 for picidx 0, bit 14 for picidx 1,...
+    unsigned char  initreflistidxl0[16];                           // :5, [refPicidx] 0..15
+    unsigned char  initreflistidxl1[16];                           // :5, [refPicidx] 0..15
+    short          RefDiffPicOrderCnts[16];                     // poc diff between current and reference pictures .[-128,127]
+    // misc
+    unsigned char  IDR_picture_flag;                            // idr flag for current picture
+    unsigned char  RAP_picture_flag;                            // rap flag for current picture
+    unsigned char  curr_pic_idx;                                // current  picture store buffer index,used to derive the store address of frame buffer and MV
+    unsigned char  pattern_id;                                  // used for dithering to select between 2 tables
+    unsigned short sw_hdr_skip_length;                          // reference picture initial related syntax elements(SE) bits in slice header.
+                                                                // those SEs are decoded only once, in the driver; the related bits are flushed in HW
+    unsigned short reserved7;
+
+    // used for ecdma debug
+    nvdec_ecdma_config_s  ecdma_cfg;
+
+    //DXVA on windows
+    unsigned int   separate_colour_plane_flag : 1;
+    unsigned int   log2_max_pic_order_cnt_lsb_minus4 : 4;    //0~12
+    unsigned int   num_short_term_ref_pic_sets : 7 ;  //0~64
+    unsigned int   num_long_term_ref_pics_sps :  6;  //0~32
+    unsigned int   bBitParsingDisable : 1 ; //disable parsing
+    unsigned int   num_delta_pocs_of_rps_idx : 8;
+    unsigned int   long_term_ref_pics_present_flag : 1;
+    unsigned int   reserved_dxva : 4;
+    //the number of bits for short_term_ref_pic_set()in slice header,dxva API
+    unsigned int   num_bits_short_term_ref_pics_in_slice;
+
+    // New additions
+    nvdec_hevc_pic_v1_s v1;
+    nvdec_hevc_pic_v2_s v2;
+    nvdec_hevc_pic_v3_s v3;
+    nvdec_pass2_otf_ext_s ssm;
+
+} nvdec_hevc_pic_s;
+
+//hevc slice info class
+typedef struct _hevc_slice_info_s {
+    unsigned int   first_flag    :1;//first slice(s) of frame,must be valid for slice EC
+    unsigned int   err_flag      :1;//error slice(s) .optional info for EC
+    unsigned int   last_flag     :1;//last slice segment(s) of frame,this bit must be valid when slice_strm_recfg_en==1 or slice_ec==1
+    unsigned int   conceal_partial_slice :1; // indicates that partial slice error concealment should be done for the packet loss case
+    unsigned int   available     :1; // indicate the slice bitstream is available.
+    unsigned int   reserved0     :7;
+    unsigned int   ctb_count     :20;// ctbs counter inside slice(s) .must be valid for slice EC
+    unsigned int   bs_offset; //slice(s) bitstream offset in bitstream buffer (in byte unit)
+    unsigned int   bs_length; //slice(s) bitstream length. It is sum of aligned size and skip size and valid slice bitstream size.
+    unsigned short start_ctbx; //slice start ctbx ,it's optional,HW can output it in previous slice decoding.
+                                //but this is one checkpoint for errors
+    unsigned short start_ctby; //slice start ctby
+ } hevc_slice_info_s;
+
+
+//hevc slice ctx class
+//slice pos and next slice address
+typedef struct  _slice_edge_ctb_pos_ctx_s {
+    unsigned int    next_slice_pos_ctbxy;         //2d address in raster scan
+    unsigned int    next_slice_segment_addr;      //1d address in  tile scan
+}slice_edge_ctb_pos_ctx_s;
+
+//  next slice's first ctb located tile related information
+typedef struct  _slice_edge_tile_ctx_s {
+    unsigned int    tileInfo1;// Misc tile info includes tile width and tile height and tile col and tile row
+    unsigned int    tileInfo2;// Misc tile info includes tile start ctbx and start ctby and tile index
+    unsigned int    tileInfo3;// Misc tile info includes  ctb pos inside tile
+} slice_edge_tile_ctx_s;
+
+//frame level stats
+typedef struct  _slice_edge_stats_ctx_s {
+    unsigned int   frame_status_intra_cnt;// frame stats for intra block count
+    unsigned int   frame_status_inter_cnt;// frame stats for inter block count
+    unsigned int   frame_status_skip_cnt;// frame stats for skip block count
+    unsigned int   frame_status_fwd_mvx_cnt;// frame stats for sum of  abs fwd mvx
+    unsigned int   frame_status_fwd_mvy_cnt;// frame stats for sum of  abs fwd mvy
+    unsigned int   frame_status_bwd_mvx_cnt;// frame stats for sum of  abs bwd mvx
+    unsigned int   frame_status_bwd_mvy_cnt;// frame stats for sum of  abs bwd mvy
+    unsigned int   frame_status_mv_cnt_ext;// extension bits of  sum of abs mv to keep full precision.
+}slice_edge_stats_ctx_s;
+
+//ctx of vpc_edge unit for tile left
+typedef struct  _slice_vpc_edge_ctx_s {
+    unsigned int   reserved;
+}slice_vpc_edge_ctx_s;
+
+//ctx of vpc_main unit
+typedef struct  _slice_vpc_main_ctx_s {
+    unsigned int   reserved;
+} slice_vpc_main_ctx_s;
+
+//hevc slice edge ctx class
+typedef struct  _slice_edge_ctx_s {
+    //ctb pos
+    slice_edge_ctb_pos_ctx_s  slice_ctb_pos_ctx;
+    // stats
+    slice_edge_stats_ctx_s slice_stats_ctx;
+    // tile info
+    slice_edge_tile_ctx_s    slice_tile_ctx;
+    //vpc_edge
+    slice_vpc_edge_ctx_s  slice_vpc_edge_ctx;
+    //vpc_main
+    slice_vpc_main_ctx_s  slice_vpc_main_ctx;
+} slice_edge_ctx_s;
+
+typedef struct _nvdec_hevc_scaling_list_s {
+    unsigned char    ScalingListDCCoeff16x16[6];
+    unsigned char    ScalingListDCCoeff32x32[2];
+    unsigned char    reserved0[8];
+
+    unsigned char    ScalingList4x4[6][16];
+    unsigned char    ScalingList8x8[6][64];
+    unsigned char    ScalingList16x16[6][64];
+    unsigned char    ScalingList32x32[2][64];
+} nvdec_hevc_scaling_list_s;
+
+
+//vp9
+
+typedef struct _nvdec_vp9_pic_v1_s
+{
+    // New fields
+    // new_var : xx; // for variables with expanded bitlength, comment on why the new bit length is required
+    // Reserved bits for padding and/or non-HW specific functionality
+    unsigned int   Vp9FltAboveOffset;  // filter above offset respect to filter buffer, 256 bytes unit
+    unsigned int   external_ref_mem_dis :  1;
+    unsigned int   bit_depth            :  4;
+    unsigned int   error_recovery_start_pos :  2;       //0: from start of frame, 1: from start of slice segment, 2: from error detected ctb, 3: reserved
+    unsigned int   error_external_mv_en :  1;
+    unsigned int   Reserved0            : 24;
+} nvdec_vp9_pic_v1_s;
+
+enum VP9_FRAME_SFC_ID
+{
+    VP9_LAST_FRAME_SFC = 0,
+    VP9_GOLDEN_FRAME_SFC,
+    VP9_ALTREF_FRAME_SFC,
+    VP9_CURR_FRAME_SFC
+};
+
+typedef struct _nvdec_vp9_pic_s
+{
+    // vp9_bitstream_data_info
+    //Key and IV address must be 128-bit aligned
+    unsigned int   wrapped_session_key[4];                      //session keys
+    unsigned int   wrapped_content_key[4];                      //content keys
+    unsigned int   initialization_vector[4];                    //Ctrl64 initial vector
+    unsigned int   stream_len;                                  // stream length in one frame
+    unsigned int   enable_encryption;                           // flag to enable/disable encryption
+    unsigned int   key_increment      : 6;                      // added to content key after unwrapping
+    unsigned int   encryption_mode    : 4;
+    unsigned int   sw_hdr_skip_length :14;                      //vp9 skip bytes setting for secure
+    unsigned int   key_slot_index     : 4;
+    unsigned int   ssm_en             : 1;
+    unsigned int   enable_histogram   : 1;                      // histogram stats output enable
+    unsigned int   reserved0          : 2;
+
+    // Driver may or may not use based upon need.
+    // If 0 then default value of 1<<27 = 298ms @ 450MHz will be used in ucode.
+    // Driver can send this value based upon resolution using the formula:
+    // gptimer_timeout_value = 3 * (cycles required for one frame)
+    unsigned int gptimer_timeout_value;
+
+    //general
+    unsigned char  tileformat                 : 2 ;   // 0: TBL; 1: KBL; 2: Tile16x16
+    unsigned char  gob_height                 : 3 ;   // Set GOB height, 0: GOB_2, 1: GOB_4, 2: GOB_8, 3: GOB_16, 4: GOB_32 (NVDEC3 onwards)
+    unsigned char  reserverd_surface_format   : 3 ;
+    unsigned char  reserved1[3];
+    unsigned int   Vp9BsdCtrlOffset;                           // bsd buffer offset respect to filter buffer ,256 bytes unit .
+
+
+    //ref_last dimensions
+    unsigned short  ref0_width;    //ref_last coded width
+    unsigned short  ref0_height;   //ref_last coded height
+    unsigned short  ref0_stride[2];    //ref_last stride
+
+    //ref_golden dimensions
+    unsigned short  ref1_width;    //ref_golden coded width
+    unsigned short  ref1_height;   //ref_golden coded height
+    unsigned short  ref1_stride[2];    //ref_golden stride
+
+    //ref_alt dimensions
+    unsigned short  ref2_width;    //ref_alt coded width
+    unsigned short  ref2_height;   //ref_alt coded height
+    unsigned short  ref2_stride[2];    //ref_alt stride
+
+
+    /* Current frame dimensions */
+    unsigned short  width;    //pic width
+    unsigned short  height;   //pic height
+    unsigned short  framestride[2];   // frame buffer stride for luma and chroma
+
+    unsigned char   keyFrame  :1;
+    unsigned char   prevIsKeyFrame:1;
+    unsigned char   resolutionChange:1;
+    unsigned char   errorResilient:1;
+    unsigned char   prevShowFrame:1;
+    unsigned char   intraOnly:1;
+    unsigned char   reserved2 : 2;
+
+    /* DCT coefficient partitions */
+    //unsigned int    offsetToDctParts;
+
+    unsigned char   reserved3[3];
+    //unsigned char   activeRefIdx[3];//3 bits
+    //unsigned char   refreshFrameFlags;
+    //unsigned char   refreshEntropyProbs;
+    //unsigned char   frameParallelDecoding;
+    //unsigned char   resetFrameContext;
+
+    unsigned char   refFrameSignBias[4];
+    char            loopFilterLevel;//6 bits
+    char            loopFilterSharpness;//3 bits
+
+    /* Quantization parameters */
+    unsigned char   qpYAc;
+    char            qpYDc;
+    char            qpChAc;
+    char            qpChDc;
+
+    /* From here down, frame-to-frame persisting stuff */
+
+    char            lossless;
+    char            transform_mode;
+    char            allow_high_precision_mv;
+    char            mcomp_filter_type;
+    char            comp_pred_mode;
+    char            comp_fixed_ref;
+    char            comp_var_ref[2];
+    char            log2_tile_columns;
+    char            log2_tile_rows;
+
+    /* Segment and macroblock specific values */
+    unsigned char   segmentEnabled;
+    unsigned char   segmentMapUpdate;
+    unsigned char   segmentMapTemporalUpdate;
+    unsigned char   segmentFeatureMode; /* ABS data or delta data */
+    unsigned char   segmentFeatureEnable[8][4];
+    short           segmentFeatureData[8][4];
+    char            modeRefLfEnabled;
+    char            mbRefLfDelta[4];
+    char            mbModeLfDelta[2];
+    char            reserved5;            // for alignment
+
+    // New additions
+    nvdec_vp9_pic_v1_s v1;
+    nvdec_pass2_otf_ext_s ssm;
+
+} nvdec_vp9_pic_s;
+
+#define NVDEC_VP9HWPAD(x, y) unsigned char x[y]
+
+typedef struct {
+    /* last bytes of address 41 */
+    unsigned char joints[3];
+    unsigned char sign[2];
+    /* address 42 */
+    unsigned char class0[2][1];
+    unsigned char fp[2][3];
+    unsigned char class0_hp[2];
+    unsigned char hp[2];
+    unsigned char classes[2][10];
+    /* address 43 */
+    unsigned char class0_fp[2][2][3];
+    unsigned char bits[2][10];
+
+} nvdec_nmv_context;
+
+typedef struct {
+    unsigned int joints[4];
+    unsigned int sign[2][2];
+    unsigned int classes[2][11];
+    unsigned int class0[2][2];
+    unsigned int bits[2][10][2];
+    unsigned int class0_fp[2][2][4];
+    unsigned int fp[2][4];
+    unsigned int class0_hp[2][2];
+    unsigned int hp[2][2];
+
+} nvdec_nmv_context_counts;
+
+/* Adaptive entropy contexts, padding elements are added to have
+ * 256 bit aligned tables for HW access.
+ * Compile with TRACE_PROB_TABLES to print bases for each table. */
+typedef struct nvdec_vp9AdaptiveEntropyProbs_s
+{
+    /* address 32 */
+    unsigned char inter_mode_prob[7][4];
+    unsigned char intra_inter_prob[4];
+
+    /* address 33 */
+    unsigned char uv_mode_prob[10][8];
+    unsigned char tx8x8_prob[2][1];
+    unsigned char tx16x16_prob[2][2];
+    unsigned char tx32x32_prob[2][3];
+    unsigned char sb_ymode_probB[4][1];
+    unsigned char sb_ymode_prob[4][8];
+
+    /* address 37 */
+    unsigned char partition_prob[2][16][4];
+
+    /* address 41 */
+    unsigned char uv_mode_probB[10][1];
+    unsigned char switchable_interp_prob[4][2];
+    unsigned char comp_inter_prob[5];
+    unsigned char mbskip_probs[3];
+    NVDEC_VP9HWPAD(pad1, 1);
+
+    nvdec_nmv_context nmvc;
+
+    /* address 44 */
+    unsigned char single_ref_prob[5][2];
+    unsigned char comp_ref_prob[5];
+    NVDEC_VP9HWPAD(pad2, 17);
+
+    /* address 45 */
+    unsigned char probCoeffs[2][2][6][6][4];
+    unsigned char probCoeffs8x8[2][2][6][6][4];
+    unsigned char probCoeffs16x16[2][2][6][6][4];
+    unsigned char probCoeffs32x32[2][2][6][6][4];
+
+} nvdec_vp9AdaptiveEntropyProbs_t;
+
+/* Entropy contexts */
+typedef struct nvdec_vp9EntropyProbs_s
+{
+    /* Default keyframe probs */
+    /* Table formatted for 256b memory: probs 0 to 7 for all tables, followed by
+     * probs 8 to N for all tables.
+     * Compile with TRACE_PROB_TABLES to print bases for each table. */
+
+    unsigned char kf_bmode_prob[10][10][8];
+
+    /* Address 25 */
+    unsigned char kf_bmode_probB[10][10][1];
+    unsigned char ref_pred_probs[3];
+    unsigned char mb_segment_tree_probs[7];
+    unsigned char segment_pred_probs[3];
+    unsigned char ref_scores[4];
+    unsigned char prob_comppred[2];
+    NVDEC_VP9HWPAD(pad1, 9);
+
+    /* Address 29 */
+    unsigned char kf_uv_mode_prob[10][8];
+    unsigned char kf_uv_mode_probB[10][1];
+    NVDEC_VP9HWPAD(pad2, 6);
+
+    nvdec_vp9AdaptiveEntropyProbs_t a;    /* Probs with backward adaptation */
+
+} nvdec_vp9EntropyProbs_t;
+
+/* Counters for adaptive entropy contexts */
+typedef struct nvdec_vp9EntropyCounts_s
+{
+    unsigned int inter_mode_counts[7][3][2];
+    unsigned int sb_ymode_counts[4][10];
+    unsigned int uv_mode_counts[10][10];
+    unsigned int partition_counts[16][4];
+    unsigned int switchable_interp_counts[4][3];
+    unsigned int intra_inter_count[4][2];
+    unsigned int comp_inter_count[5][2];
+    unsigned int single_ref_count[5][2][2];
+    unsigned int comp_ref_count[5][2];
+    unsigned int tx32x32_count[2][4];
+    unsigned int tx16x16_count[2][3];
+    unsigned int tx8x8_count[2][2];
+    unsigned int mbskip_count[3][2];
+
+    nvdec_nmv_context_counts nmvcount;
+
+    unsigned int countCoeffs[2][2][6][6][4];
+    unsigned int countCoeffs8x8[2][2][6][6][4];
+    unsigned int countCoeffs16x16[2][2][6][6][4];
+    unsigned int countCoeffs32x32[2][2][6][6][4];
+
+    unsigned int countEobs[4][2][2][6][6];
+
+} nvdec_vp9EntropyCounts_t;
+
+// Legacy codecs encryption parameters
+typedef struct _nvdec_pass2_otf_s {
+    unsigned int   wrapped_session_key[4];  // session keys
+    unsigned int   wrapped_content_key[4];  // content keys
+    unsigned int   initialization_vector[4];// Ctrl64 initial vector
+    unsigned int   enable_encryption : 1;   // flag to enable/disable encryption
+    unsigned int   key_increment     : 6;   // added to content key after unwrapping
+    unsigned int   encryption_mode   : 4;
+    unsigned int   key_slot_index    : 4;
+    unsigned int   ssm_en            : 1;
+    unsigned int   reserved1         :16;   // reserved
+} nvdec_pass2_otf_s; // 0x34 bytes
+
+typedef struct _nvdec_display_param_s
+{
+    unsigned int enableTFOutput    : 1; // 1: enable dbfdma to output the display surface; if disabled, the tf configuration below is ignored.
+    //remap for VC1
+    unsigned int VC1MapYFlag       : 1;
+    unsigned int MapYValue         : 3;
+    unsigned int VC1MapUVFlag      : 1;
+    unsigned int MapUVValue        : 3;
+    //tf
+    unsigned int OutStride         : 8;
+    unsigned int TilingFormat      : 3;
+    unsigned int OutputStructure   : 1; //(0=frame, 1=field)
+    unsigned int reserved0         :11;
+    int OutputTop[2];                   // in units of 256
+    int OutputBottom[2];                // in units of 256
+    //histogram
+    unsigned int enableHistogram   : 1; // enable histogram info collection.
+    unsigned int HistogramStartX   :12; // start X of Histogram window
+    unsigned int HistogramStartY   :12; // start Y of Histogram window
+    unsigned int reserved1         : 7;
+    unsigned int HistogramEndX     :12; // end X of Histogram window
+    unsigned int HistogramEndY     :12; // end Y of Histogram window
+    unsigned int reserved2         : 8;
+} nvdec_display_param_s;  // size 0x1c bytes
+
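The "size 0x1c bytes" note on nvdec_display_param_s can be checked at compile time. Below is a minimal sketch, assuming a compiler that packs consecutive unsigned int bit-fields into 32-bit units (as GCC and Clang do on common targets). The struct body is copied from the patch with the tag omitted, and display_param_size() is a hypothetical helper that exists only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct
{
    unsigned int enableTFOutput    : 1;
    unsigned int VC1MapYFlag       : 1;
    unsigned int MapYValue         : 3;
    unsigned int VC1MapUVFlag      : 1;
    unsigned int MapUVValue        : 3;
    unsigned int OutStride         : 8;
    unsigned int TilingFormat      : 3;
    unsigned int OutputStructure   : 1;
    unsigned int reserved0         :11;  /* 32 bits so far: one word */
    int OutputTop[2];                    /* 8 bytes */
    int OutputBottom[2];                 /* 8 bytes */
    unsigned int enableHistogram   : 1;
    unsigned int HistogramStartX   :12;
    unsigned int HistogramStartY   :12;
    unsigned int reserved1         : 7;  /* second 32-bit word of flags */
    unsigned int HistogramEndX     :12;
    unsigned int HistogramEndY     :12;
    unsigned int reserved2         : 8;  /* third 32-bit word of flags */
} nvdec_display_param_s;

/* Bit-field layout is implementation-defined, so this guards the assumption
 * that the struct occupies 4 + 8 + 8 + 4 + 4 = 28 (0x1c) bytes. */
_Static_assert(sizeof(nvdec_display_param_s) == 0x1c,
               "nvdec_display_param_s must be 0x1c bytes");

/* Hypothetical helper, only here so the size can also be checked at run time. */
static size_t display_param_size(void)
{
    return sizeof(nvdec_display_param_s);
}
```

A similar assertion could guard the other picture structures, since the hardware presumably consumes them as raw memory.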
+// H.264
+typedef struct _nvdec_dpb_entry_s  // 16 bytes
+{
+    unsigned int index          : 7;    // uncompressed frame buffer index
+    unsigned int col_idx        : 5;    // index of associated co-located motion data buffer
+    unsigned int state          : 2;    // bit0(state)=1: top field used for reference, bit1(state)=1: bottom field used for reference
+    unsigned int is_long_term   : 1;    // 0=short-term, 1=long-term
+    unsigned int not_existing   : 1;    // 1=marked as non-existing
+    unsigned int is_field       : 1;    // set if unpaired field or complementary field pair
+    unsigned int top_field_marking : 4;
+    unsigned int bottom_field_marking : 4;
+    unsigned int output_memory_layout : 1;  // Set according to picture level output NV12/NV24 setting.
+    unsigned int reserved       : 6;
+    unsigned int FieldOrderCnt[2];      // : 2*32 [top/bottom]
+    int FrameIdx;                       // : 16   short-term: FrameNum (16 bits), long-term: LongTermFrameIdx (4 bits)
+} nvdec_dpb_entry_s;
+
+typedef struct _nvdec_h264_pic_s
+{
+    nvdec_pass2_otf_s encryption_params;
+    unsigned char eos[16];
+    unsigned char explicitEOSPresentFlag;
+    unsigned char hint_dump_en; //enable COLOMV surface dump for all frames, which includes hints of "MV/REFIDX/QP/CBP/MBPART/MBTYPE", nvbug: 200212874
+    unsigned char reserved0[2];
+    unsigned int stream_len;
+    unsigned int slice_count;
+    unsigned int mbhist_buffer_size;     // to pass buffer size of MBHIST_BUFFER
+
+    // Driver may or may not use based upon need.
+    // If 0 then default value of 1<<27 = 298ms @ 450MHz will be used in ucode.
+    // Driver can send this value based upon resolution using the formula:
+    // gptimer_timeout_value = 3 * (cycles required for one frame)
+    unsigned int gptimer_timeout_value;
+
+    // Fields from msvld_h264_seq_s
+    int log2_max_pic_order_cnt_lsb_minus4;
+    int delta_pic_order_always_zero_flag;
+    int frame_mbs_only_flag;
+    int PicWidthInMbs;
+    int FrameHeightInMbs;
+
+    unsigned int tileFormat                 : 2 ;   // 0: TBL; 1: KBL; 2: Tile16x16
+    unsigned int gob_height                 : 3 ;   // Set GOB height, 0: GOB_2, 1: GOB_4, 2: GOB_8, 3: GOB_16, 4: GOB_32 (NVDEC3 onwards)
+    unsigned int reserverd_surface_format   : 27;
+
+    // Fields from msvld_h264_pic_s
+    int entropy_coding_mode_flag;
+    int pic_order_present_flag;
+    int num_ref_idx_l0_active_minus1;
+    int num_ref_idx_l1_active_minus1;
+    int deblocking_filter_control_present_flag;
+    int redundant_pic_cnt_present_flag;
+    int transform_8x8_mode_flag;
+
+    // Fields from mspdec_h264_picture_setup_s
+    unsigned int pitch_luma;                    // Luma pitch
+    unsigned int pitch_chroma;                  // chroma pitch
+
+    unsigned int luma_top_offset;               // offset of luma top field in units of 256
+    unsigned int luma_bot_offset;               // offset of luma bottom field in units of 256
+    unsigned int luma_frame_offset;             // offset of luma frame in units of 256
+    unsigned int chroma_top_offset;             // offset of chroma top field in units of 256
+    unsigned int chroma_bot_offset;             // offset of chroma bottom field in units of 256
+    unsigned int chroma_frame_offset;           // offset of chroma frame in units of 256
+    unsigned int HistBufferSize;                // in units of 256
+
+    unsigned int MbaffFrameFlag           : 1;  //
+    unsigned int direct_8x8_inference_flag: 1;  //
+    unsigned int weighted_pred_flag       : 1;  //
+    unsigned int constrained_intra_pred_flag:1; //
+    unsigned int ref_pic_flag             : 1;  // reference picture (nal_ref_idc != 0)
+    unsigned int field_pic_flag           : 1;  //
+    unsigned int bottom_field_flag        : 1;  //
+    unsigned int second_field             : 1;  // second field of complementary reference field
+    unsigned int log2_max_frame_num_minus4: 4;  //  (0..12)
+    unsigned int chroma_format_idc        : 2;  //
+    unsigned int pic_order_cnt_type       : 2;  //  (0..2)
+    int pic_init_qp_minus26               : 6;  // : 6 (-26..+25)
+    int chroma_qp_index_offset            : 5;  // : 5 (-12..+12)
+    int second_chroma_qp_index_offset     : 5;  // : 5 (-12..+12)
+
+    unsigned int weighted_bipred_idc      : 2;  // : 2 (0..2)
+    unsigned int CurrPicIdx               : 7;  // : 7  uncompressed frame buffer index
+    unsigned int CurrColIdx               : 5;  // : 5  index of associated co-located motion data buffer
+    unsigned int frame_num                : 16; //
+    unsigned int frame_surfaces           : 1;  // frame surfaces flag
+    unsigned int output_memory_layout     : 1;  // 0: NV12; 1:NV24. Field pair must use the same setting.
+
+    int CurrFieldOrderCnt[2];                   // : 32 [Top_Bottom], [0]=TopFieldOrderCnt, [1]=BottomFieldOrderCnt
+    nvdec_dpb_entry_s dpb[16];
+    unsigned char WeightScale[6][4][4];         // : 6*4*4*8 in raster scan order (not zig-zag order)
+    unsigned char WeightScale8x8[2][8][8];      // : 2*8*8*8 in raster scan order (not zig-zag order)
+
+    // mvc setup info, must be zero if not mvc
+    unsigned char num_inter_view_refs_lX[2];         // number of inter-view references
+    char reserved1[14];                               // reserved for alignment
+    signed char inter_view_refidx_lX[2][16];         // DPB indices (must also be marked as long-term)
+
+    // lossless decode (at the time of writing this manual, the x264 and JM encoders differ in Intra_8x8 reference sample filtering)
+    unsigned int lossless_ipred8x8_filter_enable        : 1;       // = 0, skips Intra_8x8 reference sample filtering, for vertical and horizontal predictions (x264 encoded streams); = 1, filter Intra_8x8 reference samples (JM encoded streams)
+    unsigned int qpprime_y_zero_transform_bypass_flag   : 1;       // determines the transform bypass mode
+    unsigned int reserved2                              : 30;      // kept for alignment; may be used for other parameters
+
+    nvdec_display_param_s displayPara;
+    nvdec_pass2_otf_ext_s ssm;
+
+} nvdec_h264_pic_s;
+
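The gptimer_timeout_value comments give a default (1<<27 cycles, about 298ms at 450MHz) and a formula (3 times the cycles required for one frame). A minimal sketch of that arithmetic follows. The helper names are hypothetical, the per-frame cycle count is an input the driver would have to estimate itself, and saturating at UINT32_MAX is a choice made here rather than something the header specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Default used by the ucode when gptimer_timeout_value is 0 (per the header comment). */
#define GPTIMER_DEFAULT_CYCLES (1u << 27)

/* Hypothetical helper: gptimer_timeout_value = 3 * (cycles required for one frame).
 * Saturates instead of wrapping if the product does not fit in 32 bits. */
static uint32_t gptimer_timeout_from_cycles(uint32_t cycles_per_frame)
{
    uint64_t timeout = 3ull * cycles_per_frame;
    return timeout > UINT32_MAX ? UINT32_MAX : (uint32_t)timeout;
}

/* Hypothetical helper: convert a cycle count to whole milliseconds at a given
 * clock rate. clock_hz must be non-zero. */
static uint32_t gptimer_cycles_to_ms(uint32_t cycles, uint32_t clock_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000u) / clock_hz);
}
```

With these helpers, gptimer_cycles_to_ms(GPTIMER_DEFAULT_CYCLES, 450000000) evaluates to 298, matching the figure quoted in the comments.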
+// VC-1 Scratch buffer
+typedef enum _vc1_fcm_e
+{
+    FCM_PROGRESSIVE = 0,
+    FCM_FRAME_INTERLACE = 2,
+    FCM_FIELD_INTERLACE = 3
+} vc1_fcm_e;
+
+typedef enum _syntax_vc1_ptype_e
+{
+    PTYPE_I       = 0,
+    PTYPE_P       = 1,
+    PTYPE_B       = 2,
+    PTYPE_BI      = 3, // PTYPE_BI is not used to configure register NV_CNVDEC_VLD_PIC_INFO_COMMON: field NV_CNVDEC_VLD_PIC_INFO_COMMON_PIC_CODING_VC1 is only 2 bits wide, so I and BI pictures are configured with the same value. Please refer to the manual.
+    PTYPE_SKIPPED = 4
+} syntax_vc1_ptype_e;
+
+// 7.1.1.32, Table 46 etc.
+enum vc1_mvmode_e
+{
+    MVMODE_MIXEDMV                = 0,
+    MVMODE_1MV                    = 1,
+    MVMODE_1MV_HALFPEL            = 2,
+    MVMODE_1MV_HALFPEL_BILINEAR   = 3,
+    MVMODE_INTENSITY_COMPENSATION = 4
+};
+
+// 9.1.1.42, Table 105
+typedef enum _vc1_fptype_e
+{
+    FPTYPE_I_I = 0,
+    FPTYPE_I_P,
+    FPTYPE_P_I,
+    FPTYPE_P_P,
+    FPTYPE_B_B,
+    FPTYPE_B_BI,
+    FPTYPE_BI_B,
+    FPTYPE_BI_BI
+} vc1_fptype_e;
+
+// Table 43 (7.1.1.31.2)
+typedef enum _vc1_dqprofile_e
+{
+    DQPROFILE_ALL_FOUR_EDGES_  = 0,
+    DQPROFILE_DOUBLE_EDGE_     = 1,
+    DQPROFILE_SINGLE_EDGE_     = 2,
+    DQPROFILE_ALL_MACROBLOCKS_ = 3
+} vc1_dqprofile_e;
+
+typedef struct _nvdec_vc1_pic_s
+{
+    nvdec_pass2_otf_s encryption_params;
+    unsigned char eos[16];                    // to pass end of stream data separately if not present in bitstream surface
+    unsigned char prefixStartCode[4];         // used for dxva to pass prefix start code.
+    unsigned int  bitstream_offset;           // offset in words from start of bitstream surface if there is gap.
+    unsigned char explicitEOSPresentFlag;     // to indicate that eos[] is used for passing end of stream data.
+    unsigned char reserved0[3];
+    unsigned int stream_len;
+    unsigned int slice_count;
+    unsigned int scratch_pic_buffer_size;
+
+    // Driver may or may not use based upon need.
+    // If 0 then default value of 1<<27 = 298ms @ 450MHz will be used in ucode.
+    // Driver can send this value based upon resolution using the formula:
+    // gptimer_timeout_value = 3 * (cycles required for one frame)
+    unsigned int gptimer_timeout_value;
+
+    // Fields from vc1_seq_s
+    unsigned short FrameWidth;     // actual frame width
+    unsigned short FrameHeight;    // actual frame height
+
+    unsigned char profile;        // 1 = SIMPLE or MAIN, 2 = ADVANCED
+    unsigned char postprocflag;
+    unsigned char pulldown;
+    unsigned char interlace;
+
+    unsigned char tfcntrflag;
+    unsigned char finterpflag;
+    unsigned char psf;
+    unsigned char tileFormat                 : 2 ;   // 0: TBL; 1: KBL; 2: Tile16x16
+    unsigned char gob_height                 : 3 ;   // Set GOB height, 0: GOB_2, 1: GOB_4, 2: GOB_8, 3: GOB_16, 4: GOB_32 (NVDEC3 onwards)
+    unsigned char reserverd_surface_format   : 3 ;
+
+    // simple,main
+    unsigned char multires;
+    unsigned char syncmarker;
+    unsigned char rangered;
+    unsigned char maxbframes;
+
+    // Fields from vc1_entrypoint_s
+    unsigned char dquant;
+    unsigned char panscan_flag;
+    unsigned char refdist_flag;
+    unsigned char quantizer;
+
+    unsigned char extended_mv;
+    unsigned char extended_dmv;
+    unsigned char overlap;
+    unsigned char vstransform;
+
+    // Fields from vc1_scratch_s
+    char refdist;
+    char reserved1[3];               // for alignment
+
+    // Fields from vld_vc1_pic_s
+    vc1_fcm_e fcm;
+    syntax_vc1_ptype_e ptype;
+    int tfcntr;
+    int rptfrm;
+    int tff;
+    int rndctrl;
+    int pqindex;
+    int halfqp;
+    int pquantizer;
+    int postproc;
+    int condover;
+    int transacfrm;
+    int transacfrm2;
+    int transdctab;
+    int pqdiff;
+    int abspq;
+    int dquantfrm;
+    vc1_dqprofile_e dqprofile;
+    int dqsbedge;
+    int dqdbedge;
+    int dqbilevel;
+    int mvrange;
+    enum vc1_mvmode_e mvmode;
+    enum vc1_mvmode_e mvmode2;
+    int lumscale;
+    int lumshift;
+    int mvtab;
+    int cbptab;
+    int ttmbf;
+    int ttfrm;
+    int bfraction;
+    vc1_fptype_e fptype;
+    int numref;
+    int reffield;
+    int dmvrange;
+    int intcompfield;
+    int lumscale1;  //  type was char in ucode
+    int lumshift1;  //  type was char in ucode
+    int lumscale2;  //  type was char in ucode
+    int lumshift2;  //  type was char in ucode
+    int mbmodetab;
+    int imvtab;
+    int icbptab;
+    int fourmvbptab;
+    int fourmvswitch;
+    int intcomp;
+    int twomvbptab;
+    // simple,main
+    int rangeredfrm;
+
+    // Fields from pdec_vc1_pic_s
+    unsigned int   HistBufferSize;                  // in units of 256
+    // frame buffers
+    unsigned int   FrameStride[2];                  // [y_c]
+    unsigned int   luma_top_offset;                 // offset of luma top field in units of 256
+    unsigned int   luma_bot_offset;                 // offset of luma bottom field in units of 256
+    unsigned int   luma_frame_offset;               // offset of luma frame in units of 256
+    unsigned int   chroma_top_offset;               // offset of chroma top field in units of 256
+    unsigned int   chroma_bot_offset;               // offset of chroma bottom field in units of 256
+    unsigned int   chroma_frame_offset;             // offset of chroma frame in units of 256
+
+    unsigned short CodedWidth;                      // entrypoint specific
+    unsigned short CodedHeight;                     // entrypoint specific
+
+    unsigned char  loopfilter;                      // entrypoint specific
+    unsigned char  fastuvmc;                        // entrypoint specific
+    unsigned char  output_memory_layout;            // picture specific
+    unsigned char  ref_memory_layout[2];            // picture specific 0: fwd, 1: bwd
+    unsigned char  reserved3[3];                    // for alignment
+
+    nvdec_display_param_s displayPara;
+    nvdec_pass2_otf_ext_s ssm;
+
+} nvdec_vc1_pic_s;
+
+// MPEG-2
+typedef struct _nvdec_mpeg2_pic_s
+{
+    nvdec_pass2_otf_s encryption_params;
+    unsigned char eos[16];
+    unsigned char explicitEOSPresentFlag;
+    unsigned char reserved0[3];
+    unsigned int stream_len;
+    unsigned int slice_count;
+
+    // Driver may or may not use based upon need.
+    // If 0 then default value of 1<<27 = 298ms @ 450MHz will be used in ucode.
+    // Driver can send this value based upon resolution using the formula:
+    // gptimer_timeout_value = 3 * (cycles required for one frame)
+    unsigned int gptimer_timeout_value;
+
+    // Fields from vld_mpeg2_seq_pic_info_s
+    short FrameWidth;                   // actual frame width
+    short FrameHeight;                  // actual frame height
+    unsigned char picture_structure;    // 0 => Reserved, 1 => Top field, 2 => Bottom field, 3 => Frame picture. Table 6-14.
+    unsigned char picture_coding_type;  // 0 => Forbidden, 1 => I, 2 => P, 3 => B, 4 => D (for MPEG-2). Table 6-12.
+    unsigned char intra_dc_precision;   // 0 => 8 bits, 1=> 9 bits, 2 => 10 bits, 3 => 11 bits. Table 6-13.
+    char frame_pred_frame_dct;          // as in section 6.3.10
+    char concealment_motion_vectors;    // as in section 6.3.10
+    char intra_vlc_format;              // as in section 6.3.10
+    unsigned char tileFormat                 : 2 ;   // 0: TBL; 1: KBL; 2: Tile16x16
+    unsigned char gob_height                 : 3 ;   // Set GOB height, 0: GOB_2, 1: GOB_4, 2: GOB_8, 3: GOB_16, 4: GOB_32 (NVDEC3 onwards)
+    unsigned char reserverd_surface_format   : 3 ;
+
+    char reserved1;                     // always 0
+    char f_code[4];                  // as in section 6.3.10
+
+    // Fields from pdec_mpeg2_picture_setup_s
+    unsigned short PicWidthInMbs;
+    unsigned short  FrameHeightInMbs;
+    unsigned int pitch_luma;
+    unsigned int pitch_chroma;
+    unsigned int luma_top_offset;
+    unsigned int luma_bot_offset;
+    unsigned int luma_frame_offset;
+    unsigned int chroma_top_offset;
+    unsigned int chroma_bot_offset;
+    unsigned int chroma_frame_offset;
+    unsigned int HistBufferSize;
+    unsigned short output_memory_layout;
+    unsigned short alternate_scan;
+    unsigned short secondfield;
+    /******************************/
+    // Got rid of the union kept for compatibility with NVDEC1.
+    // Removed field mpeg2, and kept rounding type.
+    // NVDEC1 ucode is not using the mpeg2 field, instead using codec type from the methods.
+    // Rounding type should only be set for Divx3.11.
+    unsigned short rounding_type;
+    /******************************/
+    unsigned int MbInfoSizeInBytes;
+    unsigned int q_scale_type;
+    unsigned int top_field_first;
+    unsigned int full_pel_fwd_vector;
+    unsigned int full_pel_bwd_vector;
+    unsigned char quant_mat_8x8intra[64];
+    unsigned char quant_mat_8x8nonintra[64];
+    unsigned int ref_memory_layout[2]; //0:for fwd; 1:for bwd
+
+    nvdec_display_param_s displayPara;
+    nvdec_pass2_otf_ext_s ssm;
+
+} nvdec_mpeg2_pic_s;
+
+// MPEG-4
+typedef struct _nvdec_mpeg4_pic_s
+{
+    nvdec_pass2_otf_s encryption_params;
+    unsigned char eos[16];
+    unsigned char explicitEOSPresentFlag;
+    unsigned char reserved2[3];     // for alignment
+    unsigned int stream_len;
+    unsigned int slice_count;
+    unsigned int scratch_pic_buffer_size;
+
+    // Driver may or may not use based upon need.
+    // If 0 then default value of 1<<27 = 298ms @ 450MHz will be used in ucode.
+    // Driver can send this value based upon resolution using the formula:
+    // gptimer_timeout_value = 3 * (cycles required for one frame)
+    unsigned int gptimer_timeout_value;
+
+    // Fields from vld_mpeg4_seq_s
+    short FrameWidth;                     // :13 video_object_layer_width
+    short FrameHeight;                    // :13 video_object_layer_height
+    char  vop_time_increment_bitcount;    // : 5 1..16
+    char  resync_marker_disable;          // : 1
+    unsigned char tileFormat                 : 2 ;   // 0: TBL; 1: KBL; 2: Tile16x16
+    unsigned char gob_height                 : 3 ;   // Set GOB height, 0: GOB_2, 1: GOB_4, 2: GOB_8, 3: GOB_16, 4: GOB_32 (NVDEC3 onwards)
+    unsigned char reserverd_surface_format   : 3 ;
+    char  reserved3;                      // for alignment
+
+    // Fields from pdec_mpeg4_picture_setup_s
+    int width;                              // : 13
+    int height;                             // : 13
+
+    unsigned int FrameStride[2];            // [y_c]
+    unsigned int luma_top_offset;           // offset of luma top field in units of 256
+    unsigned int luma_bot_offset;           // offset of luma bottom field in units of 256
+    unsigned int luma_frame_offset;         // offset of luma frame in units of 256
+    unsigned int chroma_top_offset;         // offset of chroma top field in units of 256
+    unsigned int chroma_bot_offset;         // offset of chroma bottom field in units of 256
+    unsigned int chroma_frame_offset;       // offset of chroma frame in units of 256
+
+    unsigned int HistBufferSize;            // in units of 256, History buffer size
+
+    int trd[2];                             // : 16, temporal reference frame distance (only needed for B-VOPs)
+    int trb[2];                             // : 16, temporal reference B-VOP distance from fwd reference frame (only needed for B-VOPs)
+
+    int divx_flags;                         // : 16 (bit 0: DivX interlaced chroma rounding, bit 1: Divx 4 boundary padding, bit 2: Divx IDCT)
+
+    short vop_fcode_forward;                // : 1...7
+    short vop_fcode_backward;               // : 1...7
+
+    unsigned char interlaced;               // : 1
+    unsigned char quant_type;               // : 1
+    unsigned char quarter_sample;           // : 1
+    unsigned char short_video_header;       // : 1
+
+    unsigned char curr_output_memory_layout; // : 1 0:NV12; 1:NV24
+    unsigned char ptype;                    // picture type: 0 for PTYPE_I, 1 for PTYPE_P, 2 for PTYPE_B, 3 for PTYPE_BI, 4 for PTYPE_SKIPPED
+    unsigned char rnd;                      // : 1, rounding mode
+    unsigned char alternate_vertical_scan_flag; // : 1
+
+    unsigned char top_field_flag;           // : 1
+    unsigned char reserved0[3];             // alignment purpose
+
+    unsigned char intra_quant_mat[64];      // : 64*8
+    unsigned char nonintra_quant_mat[64];   // : 64*8
+    unsigned char ref_memory_layout[2];    //0:for fwd; 1:for bwd
+    unsigned char reserved1[34];            // 256 byte alignment till now
+
+    nvdec_display_param_s displayPara;
+
+} nvdec_mpeg4_pic_s;
+
+// VP8
+enum VP8_FRAME_TYPE
+{
+    VP8_KEYFRAME = 0,
+    VP8_INTERFRAME = 1
+};
+
+enum VP8_FRAME_SFC_ID
+{
+    VP8_GOLDEN_FRAME_SFC = 0,
+    VP8_ALTREF_FRAME_SFC,
+    VP8_LAST_FRAME_SFC,
+    VP8_CURR_FRAME_SFC
+};
+
+typedef struct _nvdec_vp8_pic_s
+{
+    nvdec_pass2_otf_s encryption_params;
+
+    // Driver may or may not use based upon need.
+    // If 0 then default value of 1<<27 = 298ms @ 450MHz will be used in ucode.
+    // Driver can send this value based upon resolution using the formula:
+    // gptimer_timeout_value = 3 * (cycles required for one frame)
+    unsigned int gptimer_timeout_value;
+
+    unsigned short FrameWidth;     // actual frame width
+    unsigned short FrameHeight;    // actual frame height
+
+    unsigned char keyFrame;        // 1: key frame; 0: not
+    unsigned char version;
+    unsigned char tileFormat                 : 2 ;   // 0: TBL; 1: KBL; 2: Tile16x16
+    unsigned char gob_height                 : 3 ;   // Set GOB height, 0: GOB_2, 1: GOB_4, 2: GOB_8, 3: GOB_16, 4: GOB_32 (NVDEC3 onwards)
+    unsigned char reserverd_surface_format   : 3 ;
+    unsigned char errorConcealOn;  // 1: error conceal on; 0: off
+
+    unsigned int  firstPartSize;   // the size of first partition(frame header and mb header partition)
+
+    // ctx
+    unsigned int   HistBufferSize;                  // in units of 256
+    unsigned int   VLDBufferSize;                   // in units of 1
+    // current frame buffers
+    unsigned int   FrameStride[2];                  // [y_c]
+    unsigned int   luma_top_offset;                 // offset of luma top field in units of 256
+    unsigned int   luma_bot_offset;                 // offset of luma bottom field in units of 256
+    unsigned int   luma_frame_offset;               // offset of luma frame in units of 256
+    unsigned int   chroma_top_offset;               // offset of chroma top field in units of 256
+    unsigned int   chroma_bot_offset;               // offset of chroma bottom field in units of 256
+    unsigned int   chroma_frame_offset;             // offset of chroma frame in units of 256
+
+    nvdec_display_param_s displayPara;
+
+    // decoded picture buffer related
+    char current_output_memory_layout;
+    char output_memory_layout[3];  // output NV12/NV24 setting. item 0:golden; 1: altref; 2: last
+
+    unsigned char segmentation_feature_data_update;
+    unsigned char reserved1[3];
+
+    // ucode return result
+    unsigned int resultValue;      // ucode returns the picture header info; includes copy_buffer_to_golden etc.
+    unsigned int partition_offset[8];            // byte offset to each token partition (used for encrypted streams only)
+
+    nvdec_pass2_otf_ext_s ssm;
+
+} nvdec_vp8_pic_s; // size is 0xc0
+
+// PASS1
+
+//Sample means the entire frame is encrypted with a single IV, and subsample means a given frame may be encrypted in multiple chunks with different IVs.
+#define NUM_SUBSAMPLES      32
+
+typedef struct _bytes_of_data_s
+{
+    unsigned int    clear_bytes;                    // clear bytes per subsample
+    unsigned int    encypted_bytes;                 // encrypted bytes per subsample
+
+} bytes_of_data_s;
+
+typedef struct _nvdec_pass1_input_data_s
+{
+    bytes_of_data_s sample_size[NUM_SUBSAMPLES];    // clear/encrypted bytes per subsample
+    unsigned int    initialization_vector[NUM_SUBSAMPLES][4];   // Ctrl64 initial vector per subsample
+    unsigned char   IvValid[NUM_SUBSAMPLES];        // each element will tell whether IV is valid for that subsample or not.
+    unsigned int    stream_len;                     // encrypted bitstream size.
+    unsigned int    clearBufferSize;                // allocated size of the clear buffer
+    unsigned int    reencryptBufferSize;            // allocated size of the reencrypted buffer
+    unsigned int    vp8coeffPartitonBufferSize;     // allocated buffer for vp8 coeff partition buffer
+    unsigned int    PrevWidth;                        // required for VP9
+    unsigned int    num_nals        :16;            // number of subsamples in a frame
+    unsigned int    drm_mode        : 8;            // DRM mode
+    unsigned int    key_sel         : 4;            // key select from keyslot
+    unsigned int    codec           : 4;            // codecs selection
+    unsigned int    TotalSizeOfClearData;           // Used with Pattern based encryption
+    unsigned int    SliceHdrOffset;                 // This is used with pattern mode encryption where data before slice hdr comes in clear.
+    unsigned int    EncryptBlkCnt   :16;
+    unsigned int    SkipBlkCnt      :16;
+} nvdec_pass1_input_data_s;
+
+#define VP8_MAX_TOKEN_PARTITIONS     8
+#define VP9_MAX_FRAMES_IN_SUPERFRAME 8
+
+typedef struct _nvdec_pass1_output_data_s
+{
+    unsigned int    clear_header_size;              // h264/vc1/mpeg2/vp8, decrypted pps/sps/part of slice header info, 128 bits aligned
+    unsigned int    reencrypt_data_size;            // h264/vc1/mpeg2, slice level data, vp8 mb header info, 128 bits aligned
+    unsigned int    clear_token_data_size;          // vp8, clear token data saved in VPR, 128 bits aligned
+    unsigned int    key_increment   : 6;            // added to content key after unwrapping
+    unsigned int    encryption_mode : 4;            // encryption mode
+    unsigned int    bReEncrypted    : 1;            // set to 0 if no re-encryption is done.
+    unsigned int    bvp9SuperFrame  : 1;            // set to 1 for vp9 superframe
+    unsigned int    vp9NumFramesMinus1    : 3;      // set equal to numFrames-1 for vp9superframe. Max 8 frames are possible in vp9 superframe.
+    unsigned int    reserved1       :17;            // reserved, 32 bit alignment
+    unsigned int    wrapped_session_key[4];         // session keys
+    unsigned int    wrapped_content_key[4];         // content keys
+    unsigned int    initialization_vector[4];       // Ctrl64 initial vector
+    union {
+        unsigned int    partition_size[VP8_MAX_TOKEN_PARTITIONS];            // size of each token partition (used for encrypted streams of VP8)
+        unsigned int    vp9_frame_sizes[VP9_MAX_FRAMES_IN_SUPERFRAME];       // frame size information for all frames in vp9 superframe.
+    };
+    unsigned int    vp9_clear_hdr_size[VP9_MAX_FRAMES_IN_SUPERFRAME];          // clear header size for each frame in vp9 superframe.
+} nvdec_pass1_output_data_s;
+
+
+/*****************************************************
+            AV1
+*****************************************************/
+typedef struct _scale_factors_reference_s{
+  short             x_scale_fp;                                // horizontal fixed point scale factor
+  short             y_scale_fp;                                // vertical fixed point scale factor
+}scale_factors_reference_s;
+
+typedef struct _frame_info_t{
+    unsigned short  width;                                     // in pixels; AV1 supports arbitrary resolutions
+    unsigned short  height;
+    unsigned short  stride[2];                                 // luma and chroma stride in 16Bytes
+    unsigned int    frame_buffer_idx;                          // TBD :clean associate the reference frame and frame buffer id to lookup base_addr
+} frame_info_t;
+
+typedef struct _ref_frame_struct_s{
+    frame_info_t    info;
+    scale_factors_reference_s sf;                              // scalefactor for reference frame and current frame size, driver can calculate it
+    unsigned char   sign_bias                    : 1;          // calculated based on frame_offset and current frame offset
+    unsigned char   wmtype                       : 2;          // global motion parameters : identity,translation,rotzoom,affine
+    unsigned char   reserved_rf                  : 5;
+    short           frame_off;                                 // relative offset to current frame
+    short           roffset;                                   // relative offset from current frame
+} ref_frame_struct_s;
+
+typedef struct _av1_fgs_cfg_t{
+    //from AV1 spec 5.9.30 Film Grain Params syntax
+    unsigned short apply_grain                   : 1;
+    unsigned short overlap_flag                  : 1;
+    unsigned short clip_to_restricted_range      : 1;
+    unsigned short chroma_scaling_from_luma      : 1;
+    unsigned short num_y_points_b                : 1;          // flag indicates num_y_points>0
+    unsigned short num_cb_points_b               : 1;          // flag indicates num_cb_points>0
+    unsigned short num_cr_points_b               : 1;          // flag indicates num_cr_points>0
+    unsigned short scaling_shift                 : 4;
+    unsigned short reserved_fgs                  : 5;
+    unsigned short sw_random_seed;
+    short          cb_offset;
+    short          cr_offset;
+    char           cb_mult;
+    char           cb_luma_mult;
+    char           cr_mult;
+    char           cr_luma_mult;
+} av1_fgs_cfg_t;
+
+
+typedef struct _nvdec_av1_pic_s
+{
+    nvdec_pass2_otf_s encryption_params;
+
+    nvdec_pass2_otf_ext_s ssm;
+
+    av1_fgs_cfg_t fgs_cfg;
+
+    // Driver may or may not use based upon need.
+    // If 0 then default value of 1<<27 = 298ms @ 450MHz will be used in ucode.
+    // Driver can send this value based upon resolution using the formula:
+    // gptimer_timeout_value = 3 * (cycles required for one frame)
+    unsigned int    gptimer_timeout_value;
+
+    unsigned int    stream_len;                                // stream length.
+    unsigned int    reserved12;                                // skip bytes length to real frame data.
+
+    //sequence header
+    unsigned int    use_128x128_superblock       : 1;          // superblock 128x128 or 64x64, 0:64x64, 1: 128x128
+    unsigned int    chroma_format                : 2;          // 1:420, others:reserved for future
+    unsigned int    bit_depth                    : 4;          // bitdepth
+    unsigned int    enable_filter_intra          : 1;          // tool enable in seq level, 0 : disable 1: frame header control
+    unsigned int    enable_intra_edge_filter     : 1;
+    unsigned int    enable_interintra_compound   : 1;
+    unsigned int    enable_masked_compound       : 1;
+    unsigned int    enable_dual_filter           : 1;          // enable or disable vertical and horiz filter selection
+    unsigned int    reserved10                   : 1;          // 0 - disable order hint, and related tools
+    unsigned int    reserved0                    : 3;
+    unsigned int    enable_jnt_comp              : 1;          // 0 - disable joint compound modes
+    unsigned int    reserved1                    : 1;
+    unsigned int    enable_cdef                  : 1;
+    unsigned int    reserved11                   : 1;
+    unsigned int    enable_fgs                   : 1;
+    unsigned int    enable_substream_decoding    : 1;          //enable frame substream kickoff mode without context switch
+    unsigned int    reserved2                    : 10;         // reserved bits
+
+    //frame header
+    unsigned int    frame_type                   : 2;          // 0:Key frame, 1:Inter frame, 2:intra only, 3:s-frame
+    unsigned int    show_frame                   : 1;          // show frame flag
+    unsigned int    reserved13                   : 1;
+    unsigned int    disable_cdf_update           : 1;          // disable CDF update during symbol decoding
+    unsigned int    allow_screen_content_tools   : 1;          // screen content tool enable
+    unsigned int    cur_frame_force_integer_mv   : 1;          // AMVR enable
+    unsigned int    scale_denom_minus9           : 3;          // The denominator minus9  of the superres scale
+    unsigned int    allow_intrabc                : 1;          // IBC enable
+    unsigned int    allow_high_precision_mv      : 1;          // 1/8 precision mv enable
+    unsigned int    interp_filter                : 3;          // interpolation filter : EIGHTTAP_REGULAR,....
+    unsigned int    switchable_motion_mode       : 1;          // 0: simple motion mode, 1: SIMPLE, OBMC, LOCAL  WARP
+    unsigned int    use_ref_frame_mvs            : 1;          // 1: current frame can use the previous frame mv information, MFMV
+    unsigned int    refresh_frame_context        : 1;          // backward update flag
+    unsigned int    delta_q_present_flag         : 1;          // quantizer index delta values are present in the block level
+    unsigned int    delta_q_res                  : 2;          // left shift will apply to decoded quantizer index delta values
+    unsigned int    delta_lf_present_flag        : 1;          // specified whether loop filter delta values are present in the block level
+    unsigned int    delta_lf_res                 : 2;          // specifies the left shift will apply to decoded loop filter values
+    unsigned int    delta_lf_multi               : 1;          // separate loop filter deltas for Hy,Vy,U,V edges
+    unsigned int    reserved3                    : 1;
+    unsigned int    coded_lossless               : 1;          // 1 means all segments use lossless coding. Frame is fully lossless, CDEF/DBF will disable
+    unsigned int    tile_enabled                 : 1;          // tile enable
+    unsigned int    reserved4                    : 2;
+    unsigned int    superres_is_scaled           : 1;          // frame level flag for using_superres
+    unsigned int    reserved_fh                  : 1;
+
+    unsigned int    tile_cols                    : 8;          // horizontal tile numbers in frame, max is 64
+    unsigned int    tile_rows                    : 8;          // vertical tile numbers in frame, max is 64
+    unsigned int    context_update_tile_id       : 16;         // which tile cdf will be selected as the backward update CDF, MAXTILEROW=64, MAXTILECOL=64, 12bits
+
+    unsigned int    cdef_damping_minus_3         : 2;          // controls the amount of damping in the deringing filter
+    unsigned int    cdef_bits                    : 2;          // the number of bits needed to specify which CDEF filter to apply
+    unsigned int    frame_tx_mode                : 3;          // 0:ONLY4x4,3:LARGEST,4:SELECT
+    unsigned int    frame_reference_mode         : 2;          // single,compound,select
+    unsigned int    skip_mode_flag               : 1;          // skip mode
+    unsigned int    skip_ref0                    : 4;
+    unsigned int    skip_ref1                    : 4;
+    unsigned int    allow_warp                   : 1;          // sequence level & frame level warp enable
+    unsigned int    reduced_tx_set_used          : 1;          // whether the frame is restricted to a reduced subset of the full set of transform types
+    unsigned int    ref_scaling_enable           : 1;
+    unsigned int    reserved5                    : 1;
+    unsigned int    reserved6                    : 10;         // reserved bits
+    unsigned short  superres_upscaled_width;                   // upscale width, frame_size_with_refs() syntax,restoration will use it
+    unsigned short  superres_luma_step;
+    unsigned short  superres_chroma_step;
+    unsigned short  superres_init_luma_subpel_x;
+    unsigned short  superres_init_chroma_subpel_x;
+
+    /*frame header qp information*/
+    unsigned char   base_qindex;                               // the maximum qp is 255
+    char            y_dc_delta_q;
+    char            u_dc_delta_q;
+    char            v_dc_delta_q;
+    char            u_ac_delta_q;
+    char            v_ac_delta_q;
+    unsigned char   qm_y;                                      // 4bit: 0-15
+    unsigned char   qm_u;
+    unsigned char   qm_v;
+
+    /*cdef, need to update in the new spec*/
+    unsigned int    cdef_y_pri_strength;                       // 4bit for one, max is 8
+    unsigned int    cdef_uv_pri_strength;                      // 4bit for one, max is 8
+    unsigned int    cdef_y_sec_strength          : 16;         // 2bit for one, max is 8
+    unsigned int    cdef_uv_sec_strength         : 16;         // 2bit for one, max is 8
+
+    /*segmentation*/
+    unsigned char   segment_enabled;
+    unsigned char   segment_update_map;
+    unsigned char   reserved7;
+    unsigned char   segment_temporal_update;
+    short           segment_feature_data[8][8];
+    unsigned char   last_active_segid;                         // The highest numbered segment id that has some enabled feature.
+    unsigned char   segid_preskip;                             // Whether the segment id will be read before the skip syntax element.
+                                                               // 1: the segment id will be read first.
+                                                               // 0: the skip syntax element will be read first.
+    unsigned char   prevsegid_flag;                            // 1: previous segment id is available
+    unsigned char   segment_quant_sign           : 8;          // sign bit for segment alternative QP
+
+    /*loopfilter*/
+    unsigned char   filter_level[2];
+    unsigned char   filter_level_u;
+    unsigned char   filter_level_v;
+    unsigned char   lf_sharpness_level;
+    char            lf_ref_deltas[8];                          // 0 = Intra, Last, Last2+Last3, GF, BRF, ARF2, ARF
+    char            lf_mode_deltas[2];                         // 0 = ZERO_MV, MV
+
+    /*restoration*/
+    unsigned char   lr_type ;                                  // restoration type.  Y:bit[1:0];U:bit[3:2],V:bit[5:4]
+    unsigned char   lr_unit_size;                              // restoration unit size 0:32x32, 1:64x64, 2:128x128,3:256x256;  Y:bit[1:0];U:bit[3:2],V:bit[5:4]
+
+    //general
+    frame_info_t    current_frame;
+    ref_frame_struct_s ref_frame[7];                           // Last, Last2, Last3, Golden, BWDREF, ALTREF2, ALTREF
+
+    unsigned int    use_temporal0_mvs            : 1;
+    unsigned int    use_temporal1_mvs            : 1;
+    unsigned int    use_temporal2_mvs            : 1;
+    unsigned int    mf1_type                     : 3;
+    unsigned int    mf2_type                     : 3;
+    unsigned int    mf3_type                     : 3;
+    unsigned int    reserved_mfmv                : 20;
+
+    short           mfmv_offset[3][7];                         // 3: mf0~2, 7: Last, Last2, Last3, Golden, BWDREF, ALTREF2, ALTREF
+    char            mfmv_side[3][7];                           // flag for reverse offset greater than 0
+                                                               // MFMV relative offset from the ref frame(reference to reference relative offset)
+
+    unsigned char   tileformat                   : 2;          // 0: TBL; 1: KBL;
+    unsigned char   gob_height                   : 3;          // Set GOB height, 0: GOB_2, 1: GOB_4, 2: GOB_8, 3: GOB_16, 4: GOB_32 (NVDEC3 onwards)
+    unsigned char   errorConcealOn               : 1;          // this field is not used, use ctrl_param.error_conceal_on to enable error concealment in ucode,
+                                                               // always set NV_CNVDEC_GIP_ERR_CONCEAL_CTRL_ON = 1 to enable error detect in hw
+    unsigned char   reserver8                    : 2;          // reserve
+
+    unsigned char   stream_error_detection       : 1;
+    unsigned char   mv_error_detection           : 1;
+    unsigned char   coeff_error_detection        : 1;
+    unsigned char   reserved_eh                  : 5;
+
+    // Filt neighbor buffer offset
+    unsigned int    Av1FltTopOffset;                           // filter top buffer offset respect to filter buffer, 256 bytes unit
+    unsigned int    Av1FltVertOffset;                          // filter vertical buffer offset respect to filter buffer, 256 bytes unit
+    unsigned int    Av1CdefVertOffset;                         // cdef vertical buffer offset respect to filter buffer, 256 bytes unit
+    unsigned int    Av1LrVertOffset;                           // lr vertical buffer offset respect to filter buffer, 256 bytes unit
+    unsigned int    Av1HusVertOffset;                          // hus vertical buffer offset respect to filter buffer, 256 bytes unit
+    unsigned int    Av1FgsVertOffset;                          // fgs vertical buffer offset respect to filter buffer, 256 bytes unit
+
+    unsigned int    enable_histogram             : 1;
+    unsigned int    sw_skip_start_length         : 14;         //skip start length
+    unsigned int    reserved_stat                : 17;
+
+} nvdec_av1_pic_s;
+
+//////////////////////////////////////////////////////////////////////
+// AV1 Buffer structure
+//////////////////////////////////////////////////////////////////////
+typedef struct _AV1FilmGrainMemory
+ {
+    unsigned char   scaling_lut_y[256];
+    unsigned char   scaling_lut_cb[256];
+    unsigned char   scaling_lut_cr[256];
+    short           cropped_luma_grain_block[4096];
+    short           cropped_cb_grain_block[1024];
+    short           cropped_cr_grain_block[1024];
+} AV1FilmGrainMemory;
+
+typedef struct _AV1TileInfo_OLD
+{
+    unsigned char   width_in_sb;
+    unsigned char   height_in_sb;
+    unsigned char   tile_start_b0;
+    unsigned char   tile_start_b1;
+    unsigned char   tile_start_b2;
+    unsigned char   tile_start_b3;
+    unsigned char   tile_end_b0;
+    unsigned char   tile_end_b1;
+    unsigned char   tile_end_b2;
+    unsigned char   tile_end_b3;
+    unsigned char   padding[6];
+} AV1TileInfo_OLD;
+
+typedef struct _AV1TileInfo
+{
+    unsigned char   width_in_sb;
+    unsigned char   padding_w;
+    unsigned char   height_in_sb;
+    unsigned char   padding_h;
+} AV1TileInfo;
+
+typedef struct _AV1TileStreamInfo
+{
+    unsigned int    tile_start;
+    unsigned int    tile_end;
+    unsigned char   padding[8];
+} AV1TileStreamInfo;
+
+
+// AV1 TileSize buffer
+#define AV1_MAX_TILES                       256
+#define AV1_TILEINFO_BUF_SIZE_OLD           NVDEC_ALIGN(AV1_MAX_TILES * sizeof(AV1TileInfo_OLD))
+#define AV1_TILEINFO_BUF_SIZE               NVDEC_ALIGN(AV1_MAX_TILES * sizeof(AV1TileInfo))
+
+// AV1 TileStreamInfo buffer
+#define AV1_TILESTREAMINFO_BUF_SIZE         NVDEC_ALIGN(AV1_MAX_TILES * sizeof(AV1TileStreamInfo))
+
+// AV1 SubStreamEntry buffer
+#define MAX_SUBSTREAM_ENTRY_SIZE            32
+#define AV1_SUBSTREAM_ENTRY_BUF_SIZE        NVDEC_ALIGN(MAX_SUBSTREAM_ENTRY_SIZE * sizeof(nvdec_substream_entry_s))
+
+// AV1 FilmGrain Parameter buffer
+#define AV1_FGS_BUF_SIZE                    NVDEC_ALIGN(sizeof(AV1FilmGrainMemory))
+
+// AV1 Temporal MV buffer
+#define AV1_TEMPORAL_MV_SIZE_IN_64x64       256            // 4Bytes for 8x8
+#define AV1_TEMPORAL_MV_BUF_SIZE(w, h)      ALIGN_UP( ALIGN_UP(w,128) * ALIGN_UP(h,128) / (64*64) * AV1_TEMPORAL_MV_SIZE_IN_64x64, 4096)
+
+// AV1 SegmentID buffer
+#define AV1_SEGMENT_ID_SIZE_IN_64x64        128            // (3bits + 1 pad_bits) for 4x4
+#define AV1_SEGMENT_ID_BUF_SIZE(w, h)       ALIGN_UP( ALIGN_UP(w,128) * ALIGN_UP(h,128) / (64*64) * AV1_SEGMENT_ID_SIZE_IN_64x64, 4096)
+
+// AV1 Global Motion buffer
+#define AV1_GLOBAL_MOTION_BUF_SIZE          NVDEC_ALIGN(7*32)
+
+// AV1 Intra Top buffer
+#define AV1_INTRA_TOP_BUF_SIZE              NVDEC_ALIGN(8*8192)
+
+// AV1 Histogram buffer
+#define AV1_HISTOGRAM_BUF_SIZE              NVDEC_ALIGN(1024)
+
+// AV1 Filter FG buffer
+#define AV1_DBLK_TOP_SIZE_IN_SB64           ALIGN_UP(1920, 128)
+#define AV1_DBLK_TOP_BUF_SIZE(w)            NVDEC_ALIGN( (ALIGN_UP(w,64)/64 + 2) * AV1_DBLK_TOP_SIZE_IN_SB64)
+
+#define AV1_DBLK_LEFT_SIZE_IN_SB64          ALIGN_UP(1536, 128)
+#define AV1_DBLK_LEFT_BUF_SIZE(h)           NVDEC_ALIGN( (ALIGN_UP(h,64)/64 + 2) * AV1_DBLK_LEFT_SIZE_IN_SB64)
+
+#define AV1_CDEF_LEFT_SIZE_IN_SB64          ALIGN_UP(1792, 128)
+#define AV1_CDEF_LEFT_BUF_SIZE(h)           NVDEC_ALIGN( (ALIGN_UP(h,64)/64 + 2) * AV1_CDEF_LEFT_SIZE_IN_SB64)
+
+#define AV1_HUS_LEFT_SIZE_IN_SB64           ALIGN_UP(12544, 128)
+#define AV1_ASIC_HUS_LEFT_BUFFER_SIZE(h)    NVDEC_ALIGN( (ALIGN_UP(h,64)/64 + 2) * AV1_HUS_LEFT_SIZE_IN_SB64)
+#define AV1_HUS_LEFT_BUF_SIZE(h)            (2*AV1_ASIC_HUS_LEFT_BUFFER_SIZE(h))   // Ping-Pong buffers
+
+#define AV1_LR_LEFT_SIZE_IN_SB64            ALIGN_UP(1920, 128)
+#define AV1_LR_LEFT_BUF_SIZE(h)             NVDEC_ALIGN( (ALIGN_UP(h,64)/64 + 2) * AV1_LR_LEFT_SIZE_IN_SB64)
+
+#define AV1_FGS_LEFT_SIZE_IN_SB64           ALIGN_UP(320, 128)
+#define AV1_FGS_LEFT_BUF_SIZE(h)            NVDEC_ALIGN( (ALIGN_UP(h,64)/64 + 2) * AV1_FGS_LEFT_SIZE_IN_SB64)
+
+// AV1 Hint Dump Buffer
+#define AV1_HINT_DUMP_SIZE_IN_SB64          ((64*64)/(4*4)*8)           // 8 bytes per CU, 256 CUs(2048 bytes) per SB64
+#define AV1_HINT_DUMP_SIZE_IN_SB128         ((128*128)/(4*4)*8)         // 8 bytes per CU,1024 CUs(8192 bytes) per SB128
+#define AV1_HINT_DUMP_SIZE(w, h)            NVDEC_ALIGN(AV1_HINT_DUMP_SIZE_IN_SB128*(((w)+127)/128)*(((h)+127)/128))  // always use SB128 for allocation
+
+
+/*******************************************************************
+                New  H264
+********************************************************************/
+typedef struct _nvdec_new_h264_pic_s
+{
+    nvdec_pass2_otf_s encryption_params;
+    unsigned char eos[16];
+    unsigned char explicitEOSPresentFlag;
+    unsigned char hint_dump_en; //enable COLOMV surface dump for all frames, which includes hints of "MV/REFIDX/QP/CBP/MBPART/MBTYPE", nvbug: 200212874
+    unsigned char reserved0[2];
+    unsigned int stream_len;
+    unsigned int slice_count;
+    unsigned int mbhist_buffer_size;     // to pass buffer size of MBHIST_BUFFER
+
+    // Driver may or may not use based upon need.
+    // If 0 then default value of 1<<27 = 298ms @ 450MHz will be used in ucode.
+    // Driver can send this value based upon resolution using the formula:
+    // gptimer_timeout_value = 3 * (cycles required for one frame)
+    unsigned int gptimer_timeout_value;
+
+    // Fields from msvld_h264_seq_s
+    int log2_max_pic_order_cnt_lsb_minus4;
+    int delta_pic_order_always_zero_flag;
+    int frame_mbs_only_flag;
+    int PicWidthInMbs;
+    int FrameHeightInMbs;
+
+    unsigned int tileFormat                 : 2 ;   // 0: TBL; 1: KBL; 2: Tile16x16
+    unsigned int gob_height                 : 3 ;   // Set GOB height, 0: GOB_2, 1: GOB_4, 2: GOB_8, 3: GOB_16, 4: GOB_32 (NVDEC3 onwards)
+    unsigned int reserverd_surface_format   : 27;
+
+    // Fields from msvld_h264_pic_s
+    int entropy_coding_mode_flag;
+    int pic_order_present_flag;
+    int num_ref_idx_l0_active_minus1;
+    int num_ref_idx_l1_active_minus1;
+    int deblocking_filter_control_present_flag;
+    int redundant_pic_cnt_present_flag;
+    int transform_8x8_mode_flag;
+
+    // Fields from mspdec_h264_picture_setup_s
+    unsigned int pitch_luma;                    // Luma pitch
+    unsigned int pitch_chroma;                  // chroma pitch
+
+    unsigned int luma_top_offset;               // offset of luma top field in units of 256
+    unsigned int luma_bot_offset;               // offset of luma bottom field in units of 256
+    unsigned int luma_frame_offset;             // offset of luma frame in units of 256
+    unsigned int chroma_top_offset;             // offset of chroma top field in units of 256
+    unsigned int chroma_bot_offset;             // offset of chroma bottom field in units of 256
+    unsigned int chroma_frame_offset;           // offset of chroma frame in units of 256
+    unsigned int HistBufferSize;                // in units of 256
+
+    unsigned int MbaffFrameFlag           : 1;  //
+    unsigned int direct_8x8_inference_flag: 1;  //
+    unsigned int weighted_pred_flag       : 1;  //
+    unsigned int constrained_intra_pred_flag:1; //
+    unsigned int ref_pic_flag             : 1;  // reference picture (nal_ref_idc != 0)
+    unsigned int field_pic_flag           : 1;  //
+    unsigned int bottom_field_flag        : 1;  //
+    unsigned int second_field             : 1;  // second field of complementary reference field
+    unsigned int log2_max_frame_num_minus4: 4;  //  (0..12)
+    unsigned int chroma_format_idc        : 2;  //
+    unsigned int pic_order_cnt_type       : 2;  //  (0..2)
+    int pic_init_qp_minus26               : 6;  // : 6 (-26..+25)
+    int chroma_qp_index_offset            : 5;  // : 5 (-12..+12)
+    int second_chroma_qp_index_offset     : 5;  // : 5 (-12..+12)
+
+    unsigned int weighted_bipred_idc      : 2;  // : 2 (0..2)
+    unsigned int CurrPicIdx               : 7;  // : 7  uncompressed frame buffer index
+    unsigned int CurrColIdx               : 5;  // : 5  index of associated co-located motion data buffer
+    unsigned int frame_num                : 16; //
+    unsigned int frame_surfaces           : 1;  // frame surfaces flag
+    unsigned int output_memory_layout     : 1;  // 0: NV12; 1:NV24. Field pair must use the same setting.
+
+    int CurrFieldOrderCnt[2];                   // : 32 [Top_Bottom], [0]=TopFieldOrderCnt, [1]=BottomFieldOrderCnt
+    nvdec_dpb_entry_s dpb[16];
+    unsigned char WeightScale[6][4][4];         // : 6*4*4*8 in raster scan order (not zig-zag order)
+    unsigned char WeightScale8x8[2][8][8];      // : 2*8*8*8 in raster scan order (not zig-zag order)
+
+    // mvc setup info, must be zero if not mvc
+    unsigned char num_inter_view_refs_lX[2];         // number of inter-view references
+    char reserved1[14];                               // reserved for alignment
+    signed char inter_view_refidx_lX[2][16];         // DPB indices (must also be marked as long-term)
+
+    // lossless decode (at the time of writing this manual, x264 and JM encoders differ in Intra_8x8 reference sample filtering)
+    unsigned int lossless_ipred8x8_filter_enable        : 1;       // = 0, skips Intra_8x8 reference sample filtering, for vertical and horizontal predictions (x264 encoded streams); = 1, filter Intra_8x8 reference samples (JM encoded streams)
+    unsigned int qpprime_y_zero_transform_bypass_flag   : 1;       // determines the transform bypass mode
+    unsigned int reserved2                              : 30;      // kept for alignment; may be used for other parameters
+
+    nvdec_display_param_s displayPara;
+    nvdec_pass2_otf_ext_s ssm;
+
+} nvdec_new_h264_pic_s;
+
+// golden crc struct dumped into surface
+// for each part, if golden crc compare is enabled, one interface is selected to do crc calculation in vmod.
+// vmod's crc is compared with cmod's golden crc (4*32 bits), and compare result is written into surface.
+typedef struct
+{
+    // input
+    unsigned int    dbg_crc_enable_partb    : 1;    // Enable flag for enable/disable interface crc calculation in NVDEC HW's part b
+    unsigned int    dbg_crc_enable_partc    : 1;    // Enable flag for enable/disable interface crc calculation in NVDEC HW's part c
+    unsigned int    dbg_crc_enable_partd    : 1;    // Enable flag for enable/disable interface crc calculation in NVDEC HW's part d
+    unsigned int    dbg_crc_enable_parte    : 1;    // Enable flag for enable/disable interface crc calculation in NVDEC HW's part e
+    unsigned int    dbg_crc_intf_partb      : 6;    // For partb to select which interface to compare crc. see DBG_CRC_PARTE_INTF_SEL for detailed control value for each interface
+    unsigned int    dbg_crc_intf_partc      : 6;    // For partc to select which interface to compare crc. see DBG_CRC_PARTE_INTF_SEL for detailed control value for each interface
+    unsigned int    dbg_crc_intf_partd      : 6;    // For partd to select which interface to compare crc. see DBG_CRC_PARTE_INTF_SEL for detailed control value for each interface
+    unsigned int    dbg_crc_intf_parte      : 6;    // For parte to select which interface to compare crc. see DBG_CRC_PARTE_INTF_SEL for detailed control value for each interface
+    unsigned int    reserved0               : 4;
+
+    unsigned int    dbg_crc_partb_golden[4];        // Golden crc values for part b
+    unsigned int    dbg_crc_partc_golden[4];        // Golden crc values for part c
+    unsigned int    dbg_crc_partd_golden[4];        // Golden crc values for part d
+    unsigned int    dbg_crc_parte_golden[4];        // Golden crc values for part e
+
+    // output
+    unsigned int    dbg_crc_comp_partb      : 4;    // Compare result for part b
+    unsigned int    dbg_crc_comp_partc      : 4;    // Compare result for part c
+    unsigned int    dbg_crc_comp_partd      : 4;    // Compare result for part d
+    unsigned int    dbg_crc_comp_parte      : 4;    // Compare result for part e
+    unsigned int    reserved1               : 16;
+
+    unsigned char   reserved2[56];
+}nvdec_crc_s;                                       // 128 Bytes
+
+#endif /* AVUTIL_DRV_NVDEC_H */
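
Note on the AV1 buffer-size macros and the nvdec_crc_s layout above: the
following is a minimal standalone sketch, not part of the patch, that mirrors
AV1_TEMPORAL_MV_BUF_SIZE, AV1_SEGMENT_ID_BUF_SIZE and the "128 Bytes" claim on
nvdec_crc_s. SKETCH_ALIGN_UP and every sketch_* name are hypothetical; the real
ALIGN_UP is defined earlier in nvdec_drv.h, outside this excerpt, and is
assumed here to round up to a power-of-two multiple. The struct size check also
assumes the usual GCC/Clang bitfield packing.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

/* Assumption: ALIGN_UP(v, n) rounds v up to a multiple of n (n a power of two). */
#define SKETCH_ALIGN_UP(v, n) (((size_t)(v) + ((size_t)(n) - 1)) & ~((size_t)(n) - 1))

/* Mirrors AV1_TEMPORAL_MV_BUF_SIZE(w, h): 256 bytes per 64x64 block, with the
 * frame padded to 128 in both dimensions and the total rounded up to 4096. */
static size_t sketch_av1_temporal_mv_buf_size(size_t w, size_t h)
{
    size_t blocks = SKETCH_ALIGN_UP(w, 128) * SKETCH_ALIGN_UP(h, 128) / (64 * 64);
    return SKETCH_ALIGN_UP(blocks * 256, 4096);
}

/* Mirrors AV1_SEGMENT_ID_BUF_SIZE(w, h): 128 bytes per 64x64 block. */
static size_t sketch_av1_segment_id_buf_size(size_t w, size_t h)
{
    size_t blocks = SKETCH_ALIGN_UP(w, 128) * SKETCH_ALIGN_UP(h, 128) / (64 * 64);
    return SKETCH_ALIGN_UP(blocks * 128, 4096);
}

/* Field-for-field copy of nvdec_crc_s, renamed so it cannot clash with the header. */
typedef struct {
    /* input: one 32-bit word of enable flags and interface selectors */
    unsigned int dbg_crc_enable_partb : 1;
    unsigned int dbg_crc_enable_partc : 1;
    unsigned int dbg_crc_enable_partd : 1;
    unsigned int dbg_crc_enable_parte : 1;
    unsigned int dbg_crc_intf_partb   : 6;
    unsigned int dbg_crc_intf_partc   : 6;
    unsigned int dbg_crc_intf_partd   : 6;
    unsigned int dbg_crc_intf_parte   : 6;
    unsigned int reserved0            : 4;

    unsigned int dbg_crc_partb_golden[4];
    unsigned int dbg_crc_partc_golden[4];
    unsigned int dbg_crc_partd_golden[4];
    unsigned int dbg_crc_parte_golden[4];

    /* output: one 32-bit word of compare results */
    unsigned int dbg_crc_comp_partb   : 4;
    unsigned int dbg_crc_comp_partc   : 4;
    unsigned int dbg_crc_comp_partd   : 4;
    unsigned int dbg_crc_comp_parte   : 4;
    unsigned int reserved1            : 16;

    unsigned char reserved2[56];
} sketch_nvdec_crc_s;

/* 4 + 4*16 + 4 + 56 = 128 bytes, matching the comment on nvdec_crc_s. */
static_assert(sizeof(sketch_nvdec_crc_s) == 128,
              "CRC surface layout is expected to be 128 bytes");
```

For a 1920x1080 frame, the padded size is 1920x1152, i.e. 540 blocks of 64x64,
so the two helpers evaluate to 139264 and 69632 bytes respectively. A similar
static_assert placed next to the real structs could catch accidental layout
changes at compile time, if that is considered worthwhile for vendored headers.
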
diff --git a/libavutil/nvjpg_drv.h b/libavutil/nvjpg_drv.h
new file mode 100644
index 0000000000..cd8e976952
--- /dev/null
+++ b/libavutil/nvjpg_drv.h
@@ -0,0 +1,189 @@
+/*******************************************************************************
+    Copyright (c) 2016-2020, NVIDIA CORPORATION. All rights reserved.
+
+    Permission is hereby granted, free of charge, to any person obtaining a
+    copy of this software and associated documentation files (the "Software"),
+    to deal in the Software without restriction, including without limitation
+    the rights to use, copy, modify, merge, publish, distribute, sublicense,
+    and/or sell copies of the Software, and to permit persons to whom the
+    Software is furnished to do so, subject to the following conditions:
+
+    The above copyright notice and this permission notice shall be included in
+    all copies or substantial portions of the Software.
+
+    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+    DEALINGS IN THE SOFTWARE.
+
+*******************************************************************************/
+
+#ifndef AVUTIL_NVJPG_DRV_H
+#define AVUTIL_NVJPG_DRV_H
+
+#include <stdint.h>
+
+typedef uint8_t  NvU8;
+typedef uint16_t NvU16;
+typedef uint32_t NvU32;
+typedef uint64_t NvU64;
+typedef  int8_t  NvS8;
+typedef  int16_t NvS16;
+typedef  int32_t NvS32;
+typedef  int64_t NvS64;
+typedef _Bool    NvBool;
+
+//
+// CLASS NV_E7D0_NVJPG
+//
+// NVJPG combines a JPEG decoder and encoder; it supports the baseline sequential profile.
+// On the encoder side, it supports: a. 420 pitch linear format, b. programmable Huffman/quant tables, ... etc.
+// On the decoder side, it supports: a. 400/420/422/444 decoding, b. YUV2RGB, c. Power2Scale: 1/2, 1/4, 1/8, d. ChromaSubSample ... etc.
+// ===================
+
+
+// Huffman table:
+// The Huffman table is organized in symbol value order; each table item includes 2 fields: code word length and code word value
+#define DCVALUEITEM 12
+#define ACVALUEITEM 256 // in fact, only 162 items are used in baseline sequential profile.
+typedef struct
+{
+    unsigned short length;             // 4 bit, code word length
+    unsigned short value;              // 16 bit, code word value
+}huffman_symbol_s;
+
+
+typedef struct
+{
+    // surface related
+    unsigned int   bitstream_start_off;// start offset position in bitstream buffer where data should be written (byte offset)
+    unsigned int   bitstream_buf_size; // size in bytes of the buffer allocated for bitstream slice/mb data
+    unsigned int   luma_stride;        // 64 bytes align;
+    unsigned int   chroma_stride;      // 64 bytes align;
+    unsigned int   inputType    : 4;   // 0: YUV; 1: RGB, 2: BGR, 3:RGBA, 4: BGRA, 5: ABGR, 6: ARGB
+    unsigned int   chromaFormat : 2;   // chroma format: 0: 444; 1: 422H; 2:422V; 3:420
+    unsigned int   tilingMode   : 2;   // 0: linear; 1: GPU_blkLinear; 2: Tegra_blkLinear
+    unsigned int   gobHeight    : 3;   // used for blkLinear, 0: 2; 1: 4; ... 4: 32
+    unsigned int   yuvMemoryMode: 3;   // 0-semi planar nv12; 1-semi planar nv21; 2-plane(yuy2); 3-planar
+    unsigned int   reserved_0   : 18;
+    // control para
+    unsigned short imageWidth;         // real image width, up to 16K
+    unsigned short imageHeight;        // real image height, up to 16K
+    unsigned short jpegWidth;          // image width align to 8 or 16 pixel
+    unsigned short jpegHeight;         // image height align to 8 or 16 pixel
+    unsigned int   totalMcu;
+    unsigned int   widthMcu;
+    unsigned int   heightMcu;
+    unsigned int   restartInterval;    // restart interval, 0 means disable the restart feature
+
+    // rate control related
+    unsigned int   rateControl   : 2;  // RC: 0:disable; 1:block-base; others: reserve
+    unsigned int   rcTargetYBits : 11; // target luma bits per block, [0 ~ (1<<11)-1]
+    unsigned int   rcTargetCBits : 11; // target chroma bits per block, [0 ~ (1<<11)-1]
+    unsigned int   reserved_1    : 8;
+    unsigned int   preQuant      : 1;  // pre quant truncation enabled flag
+    unsigned int   rcThreshIdx   : 8;  // pre_quant threshold index [1 ~ 63]
+    unsigned int   rcThreshMag   : 21; // threshold magnitude
+    // mjpeg-typeB
+    unsigned int   isMjpgTypeB   : 1;  // flag indicating mjpg type B format, which tells HW not to insert stuff bytes.
+    unsigned int   reserved_2    : 1;
+    // huffman tables
+    huffman_symbol_s hfDcLuma[DCVALUEITEM];   //dc luma huffman table, arranged in increasing symbol order so the encoder can index it directly
+    huffman_symbol_s hfAcLuma[ACVALUEITEM];   //ac luma huffman table, arranged in increasing symbol order so the encoder can index it directly
+    huffman_symbol_s hfDcChroma[DCVALUEITEM]; //dc chroma huffman table, arranged in increasing symbol order so the encoder can index it directly
+    huffman_symbol_s hfAcChroma[ACVALUEITEM]; //ac chroma huffman table, arranged in increasing symbol order so the encoder can index it directly
+    // quantization tables
+    unsigned short  quantLumaFactor[64];        //luma quantize factor table, arranged in horizontal scan order, (1<<15)/quantLuma
+    unsigned short  quantChromaFactor[64];      //chroma quantize factor table, arranged in horizontal scan order, (1<<15)/quantChroma
+
+    unsigned char  reserve[0x6c];
+}nvjpg_enc_drv_pic_setup_s;
+
+typedef struct
+{
+    unsigned int bitstream_size; //exact size of the residual part of the bitstream for the current image
+    unsigned int mcu_x;          //encoded mcu_x
+    unsigned int mcu_y;          //encoded mcu_y
+    unsigned int cycle_count;
+    unsigned int error_status;   //report error if any
+    unsigned char reserved1[12];
+}nvjpg_enc_status;
+
+struct ctrl_param_s
+{
+    union
+    {
+        struct
+        {
+            unsigned int gptimer_on         :1;
+            unsigned int dump_cycle         :1;
+            unsigned int debug_mode         :1;
+            unsigned int reserved           :29;
+        }bits;
+        unsigned int data;
+    };
+};
+
+
+//NVJPG Decoder class interface
+typedef struct
+{
+    int codeNum[16]; //the number of huffman codes with length i
+    unsigned char minCodeIdx[16]; //the index of the min huffman code with length i
+    int minCode[16];  //the min huffman code with length i
+    unsigned char symbol[162]; // symbols to be coded
+    unsigned char reserved[2]; // alignment
+}huffman_tab_s;
+
+typedef struct
+{
+    unsigned char hblock;
+    unsigned char vblock;
+    unsigned char quant;
+    unsigned char ac;
+    unsigned char dc;
+    unsigned char reserved[3]; //alignment
+} block_parameter_s;
+
+typedef struct
+{
+    huffman_tab_s  huffTab[2][4];
+    block_parameter_s blkPar[4];
+    unsigned char quant[4][64]; //quant table
+    int restart_interval;
+    int frame_width;
+    int frame_height;
+    int mcu_width;
+    int mcu_height;
+    int comp;
+    int bitstream_offset;
+    int bitstream_size;
+    int stream_chroma_mode;  //0-monochrome; 1-yuv420; 2-yuv422H; 3-yuv422V; 4-yuv444;
+    int output_chroma_mode;  //0-monochrome; 1-yuv420; 2-yuv422H; 3-yuv422V; 4-yuv444;
+    int output_pixel_format; //0-yuv; 1-RGB; 2-BGR; 3-RGBA; 4-BGRA; 5-ABGR; 6-ARGB
+    int output_stride_luma;   //64-byte aligned
+    int output_stride_chroma; //64-byte aligned
+    int alpha_value;
+    int yuv2rgb_param[6]; //K0, K1, K2, K3, K4, C
+    int tile_mode; //0-pitch linear; 1-gpu block linear; 2-tegra block linear
+    int block_linear_height;
+    int memory_mode; //0-semi planar nv12; 1-semi planar nv21; 2-plane(yuy2); 3-planar
+    int power2_downscale; //0-no scale; 1- 1/2; 2- 1/4; 3- 1/8
+    int motion_jpeg_type; //0-type A; 1-type B
+    int start_mcu_x;      //set start mcu x for error robustness
+    int start_mcu_y;      //set start mcu y for error robustness
+}nvjpg_dec_drv_pic_setup_s;
+
+typedef struct
+{
+    unsigned int bytes_offset; //bytes consumed by HW
+    unsigned int mcu_x;        //decoded mcu_x
+    unsigned int mcu_y;        //decoded mcu_y
+    unsigned int cycle_count;
+    unsigned int error_status; //report error if any
+    unsigned char reserved1[12];
+}nvjpg_dec_status;
+#endif /* AVUTIL_NVJPG_DRV_H */
diff --git a/libavutil/vic_drv.h b/libavutil/vic_drv.h
new file mode 100644
index 0000000000..32ebe1a17d
--- /dev/null
+++ b/libavutil/vic_drv.h
@@ -0,0 +1,279 @@
+/*
+ * Copyright (c) 2024 averne <averne381@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#ifndef AVUTIL_VIC_DRV_H
+#define AVUTIL_VIC_DRV_H
+
+#include <stdint.h>
+
+typedef uint8_t  NvU8;
+typedef uint16_t NvU16;
+typedef uint32_t NvU32;
+typedef uint64_t NvU64;
+typedef  int8_t  NvS8;
+typedef  int16_t NvS16;
+typedef  int32_t NvS32;
+typedef  int64_t NvS64;
+typedef _Bool    NvBool;
+
+typedef struct VicPipeConfig {
+    NvU32 DownsampleHoriz       : 11;
+    NvU32 reserved0             :  5;
+    NvU32 DownsampleVert        : 11;
+    NvU32 reserved1             :  5;
+    NvU32 reserved2             : 32;
+    NvU32 reserved3             : 32;
+    NvU32 reserved4             : 32;
+} VicPipeConfig;
+
+typedef struct VicOutputConfig {
+    NvU64 AlphaFillMode         :  3;
+    NvU64 AlphaFillSlot         :  3;
+    NvU64 BackgroundAlpha       : 10;
+    NvU64 BackgroundR           : 10;
+    NvU64 BackgroundG           : 10;
+    NvU64 BackgroundB           : 10;
+    NvU64 RegammaMode           :  2;
+    NvU64 OutputFlipX           :  1;
+    NvU64 OutputFlipY           :  1;
+    NvU64 OutputTranspose       :  1;
+    NvU64 reserved1             :  1;
+    NvU64 reserved2             : 12;
+    NvU32 TargetRectLeft        : 14;
+    NvU32 reserved3             :  2;
+    NvU32 TargetRectRight       : 14;
+    NvU32 reserved4             :  2;
+    NvU32 TargetRectTop         : 14;
+    NvU32 reserved5             :  2;
+    NvU32 TargetRectBottom      : 14;
+    NvU32 reserved6             :  2;
+} VicOutputConfig;
+
+typedef struct VicOutputSurfaceConfig {
+    NvU32 OutPixelFormat        :  7;
+    NvU32 OutChromaLocHoriz     :  2;
+    NvU32 OutChromaLocVert      :  2;
+    NvU32 OutBlkKind            :  4;
+    NvU32 OutBlkHeight          :  4;
+    NvU32 reserved0             :  3;
+    NvU32 reserved1             : 10;
+    NvU32 OutSurfaceWidth       : 14;
+    NvU32 OutSurfaceHeight      : 14;
+    NvU32 reserved2             :  4;
+    NvU32 OutLumaWidth          : 14;
+    NvU32 OutLumaHeight         : 14;
+    NvU32 reserved3             :  4;
+    NvU32 OutChromaWidth        : 14;
+    NvU32 OutChromaHeight       : 14;
+    NvU32 reserved4             :  4;
+} VicOutputSurfaceConfig;
+
+typedef struct VicMatrixStruct {
+    NvU64 matrix_coeff00        : 20;
+    NvU64 matrix_coeff10        : 20;
+    NvU64 matrix_coeff20        : 20;
+    NvU64 matrix_r_shift        :  4;
+    NvU64 matrix_coeff01        : 20;
+    NvU64 matrix_coeff11        : 20;
+    NvU64 matrix_coeff21        : 20;
+    NvU64 reserved0             :  3;
+    NvU64 matrix_enable         :  1;
+    NvU64 matrix_coeff02        : 20;
+    NvU64 matrix_coeff12        : 20;
+    NvU64 matrix_coeff22        : 20;
+    NvU64 reserved1             :  4;
+    NvU64 matrix_coeff03        : 20;
+    NvU64 matrix_coeff13        : 20;
+    NvU64 matrix_coeff23        : 20;
+    NvU64 reserved2             :  4;
+} VicMatrixStruct;
+
+typedef struct VicClearRectStruct {
+    NvU32 ClearRect0Left        : 14;
+    NvU32 reserved0             :  2;
+    NvU32 ClearRect0Right       : 14;
+    NvU32 reserved1             :  2;
+    NvU32 ClearRect0Top         : 14;
+    NvU32 reserved2             :  2;
+    NvU32 ClearRect0Bottom      : 14;
+    NvU32 reserved3             :  2;
+    NvU32 ClearRect1Left        : 14;
+    NvU32 reserved4             :  2;
+    NvU32 ClearRect1Right       : 14;
+    NvU32 reserved5             :  2;
+    NvU32 ClearRect1Top         : 14;
+    NvU32 reserved6             :  2;
+    NvU32 ClearRect1Bottom      : 14;
+    NvU32 reserved7             :  2;
+} VicClearRectStruct;
+
+typedef struct VicSlotStructSlotConfig {
+    NvU64 SlotEnable            :  1;
+    NvU64 DeNoise               :  1;
+    NvU64 AdvancedDenoise       :  1;
+    NvU64 CadenceDetect         :  1;
+    NvU64 MotionMap             :  1;
+    NvU64 MMapCombine           :  1;
+    NvU64 IsEven                :  1;
+    NvU64 ChromaEven            :  1;
+    NvU64 CurrentFieldEnable    :  1;
+    NvU64 PrevFieldEnable       :  1;
+    NvU64 NextFieldEnable       :  1;
+    NvU64 NextNrFieldEnable     :  1;
+    NvU64 CurMotionFieldEnable  :  1;
+    NvU64 PrevMotionFieldEnable :  1;
+    NvU64 PpMotionFieldEnable   :  1;
+    NvU64 CombMotionFieldEnable :  1;
+    NvU64 FrameFormat           :  4;
+    NvU64 FilterLengthY         :  2;
+    NvU64 FilterLengthX         :  2;
+    NvU64 Panoramic             : 12;
+    NvU64 reserved1             : 22;
+    NvU64 DetailFltClamp        :  6;
+    NvU64 FilterNoise           : 10;
+    NvU64 FilterDetail          : 10;
+    NvU64 ChromaNoise           : 10;
+    NvU64 ChromaDetail          : 10;
+    NvU64 DeinterlaceMode       :  4;
+    NvU64 MotionAccumWeight     :  3;
+    NvU64 NoiseIir              : 11;
+    NvU64 LightLevel            :  4;
+    NvU64 reserved4             :  2;
+    NvU32 SoftClampLow          : 10;
+    NvU32 SoftClampHigh         : 10;
+    NvU32 reserved5             :  3;
+    NvU32 reserved6             :  9;
+    NvU32 PlanarAlpha           : 10;
+    NvU32 ConstantAlpha         :  1;
+    NvU32 StereoInterleave      :  3;
+    NvU32 ClipEnabled           :  1;
+    NvU32 ClearRectMask         :  8;
+    NvU32 DegammaMode           :  2;
+    NvU32 reserved7             :  1;
+    NvU32 DecompressEnable      :  1;
+    NvU32 reserved9             :  5;
+    NvU64 DecompressCtbCount    :  8;
+    NvU64 DecompressZbcColor    : 32;
+    NvU64 reserved12            : 24;
+    NvU32 SourceRectLeft        : 30;
+    NvU32 reserved14            :  2;
+    NvU32 SourceRectRight       : 30;
+    NvU32 reserved15            :  2;
+    NvU32 SourceRectTop         : 30;
+    NvU32 reserved16            :  2;
+    NvU32 SourceRectBottom      : 30;
+    NvU32 reserved17            :  2;
+    NvU32 DestRectLeft          : 14;
+    NvU32 reserved18            :  2;
+    NvU32 DestRectRight         : 14;
+    NvU32 reserved19            :  2;
+    NvU32 DestRectTop           : 14;
+    NvU32 reserved20            :  2;
+    NvU32 DestRectBottom        : 14;
+    NvU32 reserved21            :  2;
+    NvU32 reserved22            : 32;
+    NvU32 reserved23            : 32;
+} VicSlotStructSlotConfig;
+
+typedef struct VicSlotStructSlotSurfaceConfig {
+    NvU32 SlotPixelFormat       :  7;
+    NvU32 SlotChromaLocHoriz    :  2;
+    NvU32 SlotChromaLocVert     :  2;
+    NvU32 SlotBlkKind           :  4;
+    NvU32 SlotBlkHeight         :  4;
+    NvU32 SlotCacheWidth        :  3;
+    NvU32 reserved0             : 10;
+    NvU32 SlotSurfaceWidth      : 14;
+    NvU32 SlotSurfaceHeight     : 14;
+    NvU32 reserved1             :  4;
+    NvU32 SlotLumaWidth         : 14;
+    NvU32 SlotLumaHeight        : 14;
+    NvU32 reserved2             :  4;
+    NvU32 SlotChromaWidth       : 14;
+    NvU32 SlotChromaHeight      : 14;
+    NvU32 reserved3             :  4;
+} VicSlotStructSlotSurfaceConfig;
+
+typedef struct VicSlotStructLumaKeyStruct {
+    NvU64 luma_coeff0           : 20;
+    NvU64 luma_coeff1           : 20;
+    NvU64 luma_coeff2           : 20;
+    NvU64 luma_r_shift          :  4;
+    NvU64 luma_coeff3           : 20;
+    NvU64 LumaKeyLower          : 10;
+    NvU64 LumaKeyUpper          : 10;
+    NvU64 LumaKeyEnabled        :  1;
+    NvU64 reserved0             :  2;
+    NvU64 reserved1             : 21;
+} VicSlotStructLumaKeyStruct;
+
+typedef struct VicSlotStructBlendingSlotStruct {
+    NvU32 AlphaK1               : 10;
+    NvU32 reserved0             :  6;
+    NvU32 AlphaK2               : 10;
+    NvU32 reserved1             :  6;
+    NvU32 SrcFactCMatchSelect   :  3;
+    NvU32 reserved2             :  1;
+    NvU32 DstFactCMatchSelect   :  3;
+    NvU32 reserved3             :  1;
+    NvU32 SrcFactAMatchSelect   :  3;
+    NvU32 reserved4             :  1;
+    NvU32 DstFactAMatchSelect   :  3;
+    NvU32 reserved5             :  1;
+    NvU32 reserved6             :  4;
+    NvU32 reserved7             :  4;
+    NvU32 reserved8             :  4;
+    NvU32 reserved9             :  4;
+    NvU32 reserved10            :  2;
+    NvU32 OverrideR             : 10;
+    NvU32 OverrideG             : 10;
+    NvU32 OverrideB             : 10;
+    NvU32 OverrideA             : 10;
+    NvU32 reserved11            :  2;
+    NvU32 UseOverrideR          :  1;
+    NvU32 UseOverrideG          :  1;
+    NvU32 UseOverrideB          :  1;
+    NvU32 UseOverrideA          :  1;
+    NvU32 MaskR                 :  1;
+    NvU32 MaskG                 :  1;
+    NvU32 MaskB                 :  1;
+    NvU32 MaskA                 :  1;
+    NvU32 reserved12            : 12;
+} VicSlotStructBlendingSlotStruct;
+
+typedef struct VicSlotStruct {
+    VicSlotStructSlotConfig         slotConfig;
+    VicSlotStructSlotSurfaceConfig  slotSurfaceConfig;
+    VicSlotStructLumaKeyStruct      lumaKeyStruct;
+    VicMatrixStruct                 colorMatrixStruct;
+    VicMatrixStruct                 gamutMatrixStruct;
+    VicSlotStructBlendingSlotStruct blendingSlotStruct;
+} VicSlotStruct;
+
+typedef struct VicConfigStruct {
+    VicPipeConfig                   pipeConfig;
+    VicOutputConfig                 outputConfig;
+    VicOutputSurfaceConfig          outputSurfaceConfig;
+    VicMatrixStruct                 outColorMatrixStruct;
+    VicClearRectStruct              clearRectStruct[4];
+    VicSlotStruct                   slotStruct[8];
+} VicConfigStruct;
+
+#endif /* AVUTIL_VIC_DRV_H */
-- 
2.45.1


* [FFmpeg-devel] [PATCH 05/16] avutil: add common code for nvtegra
  2024-05-30 19:43 [FFmpeg-devel] [PATCH 00/16] NVidia Tegra hardware decoding backend averne
                   ` (3 preceding siblings ...)
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 04/16] avutil: add hardware definitions for NVDEC, NVJPG and VIC averne
@ 2024-05-30 19:43 ` averne
  2024-05-31  8:32   ` Rémi Denis-Courmont
  2024-06-05 20:29   ` Mark Thompson
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 06/16] avutil: add nvtegra hwcontext averne
                   ` (10 subsequent siblings)
  15 siblings, 2 replies; 37+ messages in thread
From: averne @ 2024-05-30 19:43 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: averne

This includes a new pixel format for nvtegra hardware frames, and several objects for interaction with hardware blocks.
In particular, this contains code for channels (handles to hardware engines), maps (memory-mapped buffers shared with engines), and command buffers (abstraction for building command lists sent to the engines).

Signed-off-by: averne <averne381@gmail.com>
---
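Note for reviewers (illustration only, not part of the patch): the
Linux-side wrappers in nvtegra.c generally report ioctl failures as
"(ioctl(...) < 0) ? AVERROR(errno) : 0". The sketch below shows why it is
errno, and not the ioctl return value, that has to go through AVERROR.
SKETCH_AVERROR is a local copy of FFmpeg's macro for platforms with
positive errno codes, and fake_ioctl() is a hypothetical stand-in so the
example can run without Tegra hardware.

```c
#include <errno.h>

/* Local copy of FFmpeg's AVERROR() for platforms where errno codes are positive */
#define SKETCH_AVERROR(e) (-(e))

/* Hypothetical stand-in for ioctl(): returns -1 and sets errno on failure */
static int fake_ioctl(int should_fail) {
    if (should_fail) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}

/* Passing the raw return value through the macro turns -1 into +1 */
static int wrap_return_value(int should_fail) {
    return SKETCH_AVERROR(fake_ioctl(should_fail));
}

/* Translating errno yields a negative, meaningful error code */
static int wrap_errno(int should_fail) {
    return (fake_ioctl(should_fail) < 0) ? SKETCH_AVERROR(errno) : 0;
}
```

A caller that tests "err < 0" would treat the positive value from the
first wrapper as success, so the errno form is the one to use for every
ioctl wrapper.
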
 configure                  |    2 +
 libavutil/Makefile         |    4 +
 libavutil/nvtegra.c        | 1035 ++++++++++++++++++++++++++++++++++++
 libavutil/nvtegra.h        |  258 +++++++++
 libavutil/nvtegra_host1x.h |   94 ++++
 libavutil/pixdesc.c        |    4 +
 libavutil/pixfmt.h         |    8 +
 7 files changed, 1405 insertions(+)
 create mode 100644 libavutil/nvtegra.c
 create mode 100644 libavutil/nvtegra.h
 create mode 100644 libavutil/nvtegra_host1x.h
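
Note for reviewers (illustration only, not part of the patch):
av_nvtegra_driver_init() keeps the nvmap/nvhost descriptors in a
refcounted object so that several hwcontexts can share them without
leaking fds. The following is a minimal sketch of that idea, assuming a
plain counter and a pthread mutex in place of AVBufferRef and AVMutex.
SharedDriver, shared_driver_acquire() and shared_driver_release() are
hypothetical names, and the descriptor values are fake.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's DriverState; the "fds" are fake */
typedef struct SharedDriver {
    int refcount;
    int nvmap_fd, nvhost_fd;
} SharedDriver;

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static SharedDriver *g_driver = NULL;
static int g_open_calls = 0, g_close_calls = 0; /* instrumentation only */

/* Return the process-wide driver state, creating it on first use */
static SharedDriver *shared_driver_acquire(void) {
    SharedDriver *drv;

    pthread_mutex_lock(&g_lock);
    if (!g_driver) {
        g_driver = calloc(1, sizeof(*g_driver));
        if (g_driver) {
            /* A real backend would open /dev/nvmap and /dev/nvhost-ctrl here */
            g_driver->nvmap_fd  = 3;
            g_driver->nvhost_fd = 4;
            g_open_calls++;
        }
    }
    if (g_driver)
        g_driver->refcount++;
    drv = g_driver;
    pthread_mutex_unlock(&g_lock);

    return drv;
}

/* Drop one reference; the last one tears the shared state down */
static void shared_driver_release(SharedDriver **drv) {
    if (!drv || !*drv)
        return;

    pthread_mutex_lock(&g_lock);
    assert(*drv == g_driver && g_driver->refcount > 0);
    if (--g_driver->refcount == 0) {
        /* A real backend would close() both descriptors here */
        g_close_calls++;
        free(g_driver);
        g_driver = NULL;
    }
    pthread_mutex_unlock(&g_lock);

    *drv = NULL;
}
```

Two acquire calls hand back the same object and open the devices once;
only the second release closes them.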

diff --git a/configure b/configure
index 09fb2aed1b..51f169bfbd 100755
--- a/configure
+++ b/configure
@@ -361,6 +361,7 @@ External library support:
   --disable-vdpau          disable Nvidia Video Decode and Presentation API for Unix code [autodetect]
   --disable-videotoolbox   disable VideoToolbox code [autodetect]
   --disable-vulkan         disable Vulkan code [autodetect]
+  --enable-nvtegra         enable nvtegra code [no]
 
 Toolchain options:
   --arch=ARCH              select architecture [$arch]
@@ -3151,6 +3152,7 @@ videotoolbox_hwaccel_deps="videotoolbox pthreads"
 videotoolbox_hwaccel_extralibs="-framework QuartzCore"
 vulkan_deps="threads"
 vulkan_deps_any="libdl LoadLibrary"
+nvtegra_deps="gpl"
 
 av1_d3d11va_hwaccel_deps="d3d11va DXVA_PicParams_AV1"
 av1_d3d11va_hwaccel_select="av1_decoder"
diff --git a/libavutil/Makefile b/libavutil/Makefile
index 9c112bc58a..733a23a8a3 100644
--- a/libavutil/Makefile
+++ b/libavutil/Makefile
@@ -52,6 +52,7 @@ HEADERS = adler32.h                                                     \
           hwcontext_videotoolbox.h                                      \
           hwcontext_vdpau.h                                             \
           hwcontext_vulkan.h                                            \
+          nvtegra.h                                                     \
           nvhost_ioctl.h                                                \
           nvmap_ioctl.h                                                 \
           iamf.h                                                        \
@@ -209,6 +210,7 @@ OBJS-$(CONFIG_VDPAU)                    += hwcontext_vdpau.o
 OBJS-$(CONFIG_VULKAN)                   += hwcontext_vulkan.o vulkan.o
 
 OBJS-$(!CONFIG_VULKAN)                  += hwcontext_stub.o
+OBJS-$(CONFIG_NVTEGRA)                  += nvtegra.o
 
 OBJS += $(COMPAT_OBJS:%=../compat/%)
 
@@ -230,6 +232,8 @@ SKIPHEADERS-$(CONFIG_VDPAU)            += hwcontext_vdpau.h
 SKIPHEADERS-$(CONFIG_VULKAN)           += hwcontext_vulkan.h vulkan.h   \
                                           vulkan_functions.h            \
                                           vulkan_loader.h
+SKIPHEADERS-$(CONFIG_NVTEGRA)          += nvtegra.h                     \
+                                          nvtegra_host1x.h
 
 TESTPROGS = adler32                                                     \
             aes                                                         \
diff --git a/libavutil/nvtegra.c b/libavutil/nvtegra.c
new file mode 100644
index 0000000000..ad0bbbdfaa
--- /dev/null
+++ b/libavutil/nvtegra.c
@@ -0,0 +1,1035 @@
+/*
+ * Copyright (c) 2024 averne <averne381@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#ifndef __SWITCH__
+#   include <sys/ioctl.h>
+#   include <sys/mman.h>
+#   include <fcntl.h>
+#   include <unistd.h>
+#else
+#   include <stdlib.h>
+#   include <switch.h>
+#endif
+
+#include <string.h>
+
+#include "buffer.h"
+#include "log.h"
+#include "error.h"
+#include "mem.h"
+#include "thread.h"
+
+#include "nvhost_ioctl.h"
+#include "nvmap_ioctl.h"
+#include "nvtegra_host1x.h"
+
+#include "nvtegra.h"
+
+/*
+ * Tag used by the kernel to identify allocations.
+ * Official software has been seen using 0x900, 0xf00, 0x1100, 0x1400, 0x4000.
+ */
+#define MEM_TAG (0xfeedU)
+
+struct DriverState {
+    int nvmap_fd, nvhost_fd;
+};
+
+static AVMutex g_driver_init_mtx = AV_MUTEX_INITIALIZER;
+static struct DriverState *g_driver_state = NULL;
+static AVBufferRef *g_driver_state_ref = NULL;
+
+static void free_driver_fds(void *opaque, uint8_t *data) {
+    if (!g_driver_state)
+        return;
+
+#ifndef __SWITCH__
+    if (g_driver_state->nvmap_fd > 0)
+        close(g_driver_state->nvmap_fd);
+
+    if (g_driver_state->nvhost_fd > 0)
+        close(g_driver_state->nvhost_fd);
+#else
+    nvFenceExit();
+    nvMapExit();
+    nvExit();
+    mmuExit();
+#endif
+
+    g_driver_init_mtx  = (AVMutex)AV_MUTEX_INITIALIZER;
+    g_driver_state_ref = NULL;
+    av_freep(&g_driver_state);
+}
+
+static int init_driver_fds(void) {
+    AVBufferRef *ref;
+    struct DriverState *state;
+    int err;
+
+    state = av_mallocz(sizeof(*state));
+    if (!state)
+        return AVERROR(ENOMEM);
+
+    ref = av_buffer_create((uint8_t *)state, sizeof(*state), free_driver_fds, NULL, 0);
+    if (!ref)
+        return AVERROR(ENOMEM);
+
+    g_driver_state     = state;
+    g_driver_state_ref = ref;
+
+#ifndef __SWITCH__
+    err = open("/dev/nvmap", O_RDWR | O_SYNC);
+    if (err < 0)
+        return AVERROR(errno);
+    state->nvmap_fd = err;
+
+    err = open("/dev/nvhost-ctrl", O_RDWR | O_SYNC);
+    if (err < 0)
+        return AVERROR(errno);
+    state->nvhost_fd = err;
+#else
+    err = nvInitialize();
+    if (R_FAILED(err))
+        return AVERROR(err);
+
+    err = nvMapInit();
+    if (R_FAILED(err))
+        return AVERROR(err);
+    state->nvmap_fd = nvMapGetFd();
+
+    err = nvFenceInit();
+    if (R_FAILED(err))
+        return AVERROR(err);
+    /* libnx doesn't export the nvhost-ctrl file descriptor */
+
+    err = mmuInitialize();
+    if (R_FAILED(err))
+        return AVERROR(err);
+#endif
+
+    return 0;
+}
+
+static inline int get_nvmap_fd(void) {
+    if (!g_driver_state)
+        return AVERROR_UNKNOWN;
+
+    if (!g_driver_state->nvmap_fd)
+        return AVERROR_UNKNOWN;
+
+    return g_driver_state->nvmap_fd;
+}
+
+static inline int get_nvhost_fd(void) {
+    if (!g_driver_state)
+        return AVERROR_UNKNOWN;
+
+    if (!g_driver_state->nvhost_fd)
+        return AVERROR_UNKNOWN;
+
+    return g_driver_state->nvhost_fd;
+}
+
+AVBufferRef *av_nvtegra_driver_init(void) {
+    AVBufferRef *out = NULL;
+    int err;
+
+    /*
+     * We have to do this overly complex dance of putting driver fds in a refcounted struct,
+     * otherwise initializing multiple hwcontexts would leak fds
+     */
+
+    err = ff_mutex_lock(&g_driver_init_mtx);
+    if (err != 0)
+        goto exit;
+
+    if (g_driver_state_ref) {
+        out = av_buffer_ref(g_driver_state_ref);
+        goto exit;
+    }
+
+    err = init_driver_fds();
+    if (err < 0) {
+        /* In case memory allocations failed, call the destructor ourselves */
+        av_buffer_unref(&g_driver_state_ref);
+        free_driver_fds(NULL, NULL);
+        goto exit;
+    }
+
+    out = g_driver_state_ref;
+
+exit:
+    ff_mutex_unlock(&g_driver_init_mtx);
+    return out;
+}
+
+int av_nvtegra_channel_open(AVNVTegraChannel *channel, const char *dev) {
+    int err;
+#ifndef __SWITCH__
+    struct nvhost_get_param_arg args;
+
+    err = open(dev, O_RDWR);
+    if (err < 0)
+        return AVERROR(errno);
+
+    channel->fd = err;
+
+    args = (struct nvhost_get_param_arg){0};
+
+    err = ioctl(channel->fd, NVHOST_IOCTL_CHANNEL_GET_SYNCPOINT, &args);
+    if (err < 0)
+        goto fail;
+
+    channel->syncpt = args.value;
+
+    return 0;
+
+fail:
+    close(channel->fd);
+    return AVERROR(errno);
+#else
+    err = nvChannelCreate(&channel->channel, dev);
+    if (R_FAILED(err))
+        return AVERROR(err);
+
+    err = nvioctlChannel_GetSyncpt(channel->channel.fd, 0, &channel->syncpt);
+    if (R_FAILED(err))
+        goto fail;
+
+    return 0;
+
+fail:
+    nvChannelClose(&channel->channel);
+    return AVERROR(err);
+#endif
+}
+
+int av_nvtegra_channel_close(AVNVTegraChannel *channel) {
+#ifndef __SWITCH__
+    if (!channel->fd)
+        return 0;
+
+    return close(channel->fd);
+#else
+    nvChannelClose(&channel->channel);
+    return 0;
+#endif
+}
+
+int av_nvtegra_channel_get_clock_rate(AVNVTegraChannel *channel, uint32_t moduleid, uint32_t *clock_rate) {
+    int err;
+#ifndef __SWITCH__
+    struct nvhost_clk_rate_args args;
+
+    args = (struct nvhost_clk_rate_args){
+        .moduleid = moduleid,
+    };
+
+    err = ioctl(channel->fd, NVHOST_IOCTL_CHANNEL_GET_CLK_RATE, &args);
+    if (err < 0)
+        return AVERROR(errno);
+
+    if (clock_rate)
+        *clock_rate = args.rate;
+
+    return 0;
+#else
+    uint32_t tmp;
+
+    err = AVERROR(nvioctlChannel_GetModuleClockRate(channel->channel.fd, moduleid, &tmp));
+    if (err < 0)
+        return err;
+
+    if (clock_rate)
+        *clock_rate = tmp * 1000;
+
+    return 0;
+#endif
+}
+
+int av_nvtegra_channel_set_clock_rate(AVNVTegraChannel *channel, uint32_t moduleid, uint32_t clock_rate) {
+#ifndef __SWITCH__
+    struct nvhost_clk_rate_args args;
+
+    args = (struct nvhost_clk_rate_args){
+        .rate     = clock_rate,
+        .moduleid = moduleid,
+    };
+
+    return (ioctl(channel->fd, NVHOST_IOCTL_CHANNEL_SET_CLK_RATE, &args) < 0) ? AVERROR(errno) : 0;
+#else
+    return AVERROR(nvioctlChannel_SetModuleClockRate(channel->channel.fd, moduleid, clock_rate / 1000));
+#endif
+}
+
+int av_nvtegra_channel_submit(AVNVTegraChannel *channel, AVNVTegraCmdbuf *cmdbuf, uint32_t *fence) {
+    int err;
+#ifndef __SWITCH__
+    struct nvhost_submit_args args;
+
+    args = (struct nvhost_submit_args){
+        .submit_version          = NVHOST_SUBMIT_VERSION_V2,
+        .num_syncpt_incrs        = cmdbuf->num_syncpt_incrs,
+        .num_cmdbufs             = cmdbuf->num_cmdbufs,
+        .num_relocs              = cmdbuf->num_relocs,
+        .num_waitchks            = cmdbuf->num_waitchks,
+        .timeout                 = 0,
+        .flags                   = 0,
+        .fence                   = 0,
+        .syncpt_incrs            = (uintptr_t)cmdbuf->syncpt_incrs,
+        .cmdbuf_exts             = (uintptr_t)cmdbuf->cmdbuf_exts,
+        .checksum_methods        = 0,
+        .checksum_falcon_methods = 0,
+        .pad                     = { 0 },
+        .reloc_types             = (uintptr_t)cmdbuf->reloc_types,
+        .cmdbufs                 = (uintptr_t)cmdbuf->cmdbufs,
+        .relocs                  = (uintptr_t)cmdbuf->relocs,
+        .reloc_shifts            = (uintptr_t)cmdbuf->reloc_shifts,
+        .waitchks                = (uintptr_t)cmdbuf->waitchks,
+        .waitbases               = 0,
+        .class_ids               = (uintptr_t)cmdbuf->class_ids,
+        .fences                  = (uintptr_t)cmdbuf->fences,
+    };
+
+    err = ioctl(channel->fd, NVHOST_IOCTL_CHANNEL_SUBMIT, &args);
+    if (err < 0)
+        return AVERROR(errno);
+
+    if (fence)
+        *fence = args.fence;
+
+    return 0;
+#else
+    nvioctl_fence tmp;
+
+    err = nvioctlChannel_Submit(channel->channel.fd, (nvioctl_cmdbuf *)cmdbuf->cmdbufs, cmdbuf->num_cmdbufs,
+                                NULL, NULL, 0, (nvioctl_syncpt_incr *)cmdbuf->syncpt_incrs, cmdbuf->num_syncpt_incrs,
+                                &tmp, 1);
+    if (R_FAILED(err))
+        return AVERROR(err);
+
+    if (fence)
+        *fence = tmp.value;
+
+    return 0;
+#endif
+}
+
+int av_nvtegra_channel_set_submit_timeout(AVNVTegraChannel *channel, uint32_t timeout_ms) {
+#ifndef __SWITCH__
+    struct nvhost_set_timeout_args args;
+
+    args = (struct nvhost_set_timeout_args){
+        .timeout = timeout_ms,
+    };
+
+    return (ioctl(channel->fd, NVHOST_IOCTL_CHANNEL_SET_TIMEOUT, &args) < 0) ? AVERROR(errno) : 0;
+#else
+    return AVERROR(nvioctlChannel_SetSubmitTimeout(channel->channel.fd, timeout_ms));
+#endif
+}
+
+int av_nvtegra_syncpt_wait(AVNVTegraChannel *channel, uint32_t threshold, int32_t timeout) {
+#ifndef __SWITCH__
+    struct nvhost_ctrl_syncpt_waitex_args args = {
+        .id      = channel->syncpt,
+        .thresh  = threshold,
+        .timeout = timeout,
+    };
+
+    return (ioctl(get_nvhost_fd(), NVHOST_IOCTL_CTRL_SYNCPT_WAITEX, &args) < 0) ? AVERROR(errno) : 0;
+#else
+    NvFence fence;
+
+    fence = (NvFence){
+        .id    = channel->syncpt,
+        .value = threshold,
+    };
+
+    return AVERROR(nvFenceWait(&fence, timeout));
+#endif
+}
+
+#ifdef __SWITCH__
+static inline bool convert_cache_flags(uint32_t flags) {
+    /* Return whether the map should be CPU-cacheable */
+    switch (flags & NVMAP_HANDLE_CACHE_FLAG) {
+        case NVMAP_HANDLE_INNER_CACHEABLE:
+        case NVMAP_HANDLE_CACHEABLE:
+            return true;
+        default:
+            return false;
+    }
+}
+#endif
+
+int av_nvtegra_map_allocate(AVNVTegraMap *map, AVNVTegraChannel *channel, uint32_t size,
+                            uint32_t align, int heap_mask, int flags)
+{
+#ifndef __SWITCH__
+    struct nvmap_create_handle create_args;
+    struct nvmap_alloc_handle alloc_args;
+    int err;
+
+    create_args = (struct nvmap_create_handle){
+        .size   = size,
+    };
+
+    err = ioctl(get_nvmap_fd(), NVMAP_IOC_CREATE, &create_args);
+    if (err < 0)
+        return AVERROR(errno);
+
+    map->size   = size;
+    map->handle = create_args.handle;
+
+    alloc_args = (struct nvmap_alloc_handle){
+        .handle    = create_args.handle,
+        .heap_mask = heap_mask,
+        .flags     = flags | (MEM_TAG << 16),
+        .align     = align,
+    };
+
+    err = ioctl(get_nvmap_fd(), NVMAP_IOC_ALLOC, &alloc_args);
+    if (err < 0)
+        goto fail;
+
+    return 0;
+
+fail:
+    av_nvtegra_map_free(map);
+    return AVERROR(errno);
+#else
+    void *mem;
+
+    map->owner = channel->channel.fd;
+
+    size = FFALIGN(size, 0x1000);
+
+    mem = aligned_alloc(FFALIGN(align, 0x1000), size);
+    if (!mem)
+        return AVERROR(ENOMEM);
+
+    return AVERROR(nvMapCreate(&map->map, mem, size, 0x10000, NvKind_Pitch,
+                               convert_cache_flags(flags)));
+#endif
+}
+
+int av_nvtegra_map_free(AVNVTegraMap *map) {
+#ifndef __SWITCH__
+    int err;
+
+    if (!map->handle)
+        return 0;
+
+    err = ioctl(get_nvmap_fd(), NVMAP_IOC_FREE, map->handle);
+    if (err < 0)
+        return AVERROR(errno);
+
+    map->handle = 0;
+
+    return 0;
+#else
+    void *addr = map->map.cpu_addr;
+
+    if (!map->map.cpu_addr)
+        return 0;
+
+    nvMapClose(&map->map);
+    free(addr);
+    return 0;
+#endif
+}
+
+int av_nvtegra_map_from_va(AVNVTegraMap *map, AVNVTegraChannel *owner, void *mem,
+                           uint32_t size, uint32_t align, uint32_t flags)
+{
+#ifndef __SWITCH__
+    struct nvmap_create_handle_from_va args;
+    int err;
+
+    args = (struct nvmap_create_handle_from_va){
+        .va    = (uintptr_t)mem,
+        .size  = size,
+        .flags = flags | (MEM_TAG << 16),
+    };
+
+    err = ioctl(get_nvmap_fd(), NVMAP_IOC_FROM_VA, &args);
+    if (err < 0)
+        return AVERROR(errno);
+
+    map->cpu_addr = mem;
+    map->size     = size;
+    map->handle   = args.handle;
+
+    return 0;
+#else
+
+    map->owner = owner->channel.fd;
+
+    return AVERROR(nvMapCreate(&map->map, mem, FFALIGN(size, 0x1000), 0x10000, NvKind_Pitch,
+                               convert_cache_flags(flags)));
+#endif
+}
+
+int av_nvtegra_map_close(AVNVTegraMap *map) {
+#ifndef __SWITCH__
+    return av_nvtegra_map_free(map);
+#else
+    nvMapClose(&map->map);
+    return 0;
+#endif
+}
+
+int av_nvtegra_map_map(AVNVTegraMap *map) {
+#ifndef __SWITCH__
+    void *addr;
+
+    addr = mmap(NULL, map->size, PROT_READ | PROT_WRITE, MAP_SHARED, map->handle, 0);
+    if (addr == MAP_FAILED)
+        return AVERROR(errno);
+
+    map->cpu_addr = addr;
+
+    return 0;
+#else
+    nvioctl_command_buffer_map params;
+    int err;
+
+    params = (nvioctl_command_buffer_map){
+        .handle = map->map.handle,
+    };
+
+    err = nvioctlChannel_MapCommandBuffer(map->owner, &params, 1, false);
+    if (R_FAILED(err))
+        return AVERROR(err);
+
+    map->iova = params.iova;
+
+    return 0;
+#endif
+}
+
+int av_nvtegra_map_unmap(AVNVTegraMap *map) {
+    int err;
+#ifndef __SWITCH__
+    if (!map->cpu_addr)
+        return 0;
+
+    err = munmap(map->cpu_addr, map->size);
+    if (err < 0)
+        return AVERROR(errno);
+
+    map->cpu_addr = NULL;
+
+    return 0;
+#else
+    nvioctl_command_buffer_map params;
+
+    if (!map->iova)
+        return 0;
+
+    params = (nvioctl_command_buffer_map){
+        .handle = map->map.handle,
+        .iova   = map->iova,
+    };
+
+    err = nvioctlChannel_UnmapCommandBuffer(map->owner, &params, 1, false);
+    if (R_FAILED(err))
+        return AVERROR(err);
+
+    map->iova = 0;
+
+    return 0;
+#endif
+}
+
+int av_nvtegra_map_cache_op(AVNVTegraMap *map, int op, void *addr, size_t len) {
+#ifndef __SWITCH__
+    struct nvmap_cache_op args;
+
+    args = (struct nvmap_cache_op){
+        .addr   = (uintptr_t)addr,
+        .len    = len,
+        .handle = av_nvtegra_map_get_handle(map),
+        .op     = op,
+    };
+
+    return ioctl(get_nvmap_fd(), NVMAP_IOC_CACHE, &args) < 0 ? AVERROR(errno) : 0;
+#else
+    if (!map->map.is_cpu_cacheable)
+        return 0;
+
+    switch (op) {
+        case NVMAP_CACHE_OP_WB:
+            armDCacheClean(addr, len);
+            break;
+        default:
+        case NVMAP_CACHE_OP_INV:
+        case NVMAP_CACHE_OP_WB_INV:
+            /* libnx internally performs a clean-invalidate, since invalidate is a privileged instruction */
+            armDCacheFlush(addr, len);
+            break;
+    }
+
+    return 0;
+#endif
+}
+
+int av_nvtegra_map_realloc(AVNVTegraMap *map, uint32_t size, uint32_t align,
+                           int heap_mask, int flags)
+{
+    AVNVTegraChannel channel;
+    AVNVTegraMap tmp = {0};
+    int err;
+
+    if (av_nvtegra_map_get_size(map) >= size)
+        return 0;
+
+    /* Dummy channel object to hold the owner fd */
+    channel = (AVNVTegraChannel){
+#ifdef __SWITCH__
+        .channel.fd = map->owner,
+#endif
+    };
+
+    err = av_nvtegra_map_create(&tmp, &channel, size, align, heap_mask, flags);
+    if (err < 0)
+        goto fail;
+
+    memcpy(av_nvtegra_map_get_addr(&tmp), av_nvtegra_map_get_addr(map), av_nvtegra_map_get_size(map));
+
+    err = av_nvtegra_map_destroy(map);
+    if (err < 0)
+        goto fail;
+
+    *map = tmp;
+
+    return 0;
+
+fail:
+    av_nvtegra_map_destroy(&tmp);
+    return err;
+}
+
+int av_nvtegra_cmdbuf_init(AVNVTegraCmdbuf *cmdbuf) {
+    cmdbuf->num_cmdbufs      = 0;
+#ifndef __SWITCH__
+    cmdbuf->num_relocs       = 0;
+    cmdbuf->num_waitchks     = 0;
+#endif
+    cmdbuf->num_syncpt_incrs = 0;
+
+#define NUM_INITIAL_CMDBUFS      3
+#define NUM_INITIAL_RELOCS       15
+#define NUM_INITIAL_SYNCPT_INCRS 3
+
+    cmdbuf->cmdbufs      = av_malloc_array(NUM_INITIAL_CMDBUFS, sizeof(*cmdbuf->cmdbufs));
+#ifndef __SWITCH__
+    cmdbuf->cmdbuf_exts  = av_malloc_array(NUM_INITIAL_CMDBUFS, sizeof(*cmdbuf->cmdbuf_exts));
+    cmdbuf->class_ids    = av_malloc_array(NUM_INITIAL_CMDBUFS, sizeof(*cmdbuf->class_ids));
+#endif
+
+#ifndef __SWITCH__
+    if (!cmdbuf->cmdbufs || !cmdbuf->cmdbuf_exts || !cmdbuf->class_ids)
+#else
+    if (!cmdbuf->cmdbufs)
+#endif
+        return AVERROR(ENOMEM);
+
+#ifndef __SWITCH__
+    cmdbuf->relocs       = av_malloc_array(NUM_INITIAL_RELOCS, sizeof(*cmdbuf->relocs));
+    cmdbuf->reloc_types  = av_malloc_array(NUM_INITIAL_RELOCS, sizeof(*cmdbuf->reloc_types));
+    cmdbuf->reloc_shifts = av_malloc_array(NUM_INITIAL_RELOCS, sizeof(*cmdbuf->reloc_shifts));
+    if (!cmdbuf->relocs || !cmdbuf->reloc_types || !cmdbuf->reloc_shifts)
+        return AVERROR(ENOMEM);
+#endif
+
+    cmdbuf->syncpt_incrs = av_malloc_array(NUM_INITIAL_SYNCPT_INCRS, sizeof(*cmdbuf->syncpt_incrs));
+#ifndef __SWITCH__
+    cmdbuf->fences       = av_malloc_array(NUM_INITIAL_SYNCPT_INCRS, sizeof(*cmdbuf->fences));
+#endif
+
+#ifndef __SWITCH__
+    if (!cmdbuf->syncpt_incrs || !cmdbuf->fences)
+#else
+    if (!cmdbuf->syncpt_incrs)
+#endif
+        return AVERROR(ENOMEM);
+
+    return 0;
+}
+
+int av_nvtegra_cmdbuf_deinit(AVNVTegraCmdbuf *cmdbuf) {
+    av_freep(&cmdbuf->cmdbufs);
+    av_freep(&cmdbuf->syncpt_incrs);
+
+#ifndef __SWITCH__
+    av_freep(&cmdbuf->cmdbuf_exts), av_freep(&cmdbuf->class_ids);
+    av_freep(&cmdbuf->relocs), av_freep(&cmdbuf->reloc_types), av_freep(&cmdbuf->reloc_shifts);
+    av_freep(&cmdbuf->fences), av_freep(&cmdbuf->waitchks);
+#endif
+
+    return 0;
+}
+
+int av_nvtegra_cmdbuf_add_memory(AVNVTegraCmdbuf *cmdbuf, AVNVTegraMap *map, uint32_t offset, uint32_t size) {
+    uint8_t *mem;
+
+    mem = av_nvtegra_map_get_addr(map);
+
+    cmdbuf->map        = map;
+    cmdbuf->mem_offset = offset;
+    cmdbuf->mem_size   = size;
+
+    cmdbuf->cur_word = (uint32_t *)(mem + cmdbuf->mem_offset);
+
+    return 0;
+}
+
+int av_nvtegra_cmdbuf_clear(AVNVTegraCmdbuf *cmdbuf) {
+    uint8_t *mem;
+
+    mem = av_nvtegra_map_get_addr(cmdbuf->map);
+
+    cmdbuf->num_cmdbufs = 0, cmdbuf->num_syncpt_incrs = 0;
+#ifndef __SWITCH__
+    cmdbuf->num_relocs = 0, cmdbuf->num_waitchks = 0;
+#endif
+
+    cmdbuf->cur_word = (uint32_t *)(mem + cmdbuf->mem_offset);
+    return 0;
+}
+
+int av_nvtegra_cmdbuf_begin(AVNVTegraCmdbuf *cmdbuf, uint32_t class_id) {
+    uint8_t *mem;
+    void *tmp1;
+#ifndef __SWITCH__
+    void *tmp2, *tmp3;
+#endif
+
+    mem = av_nvtegra_map_get_addr(cmdbuf->map);
+
+    tmp1 = av_realloc_array(cmdbuf->cmdbufs,     cmdbuf->num_cmdbufs + 1, sizeof(*cmdbuf->cmdbufs));
+#ifndef __SWITCH__
+    tmp2 = av_realloc_array(cmdbuf->cmdbuf_exts, cmdbuf->num_cmdbufs + 1, sizeof(*cmdbuf->cmdbuf_exts));
+    tmp3 = av_realloc_array(cmdbuf->class_ids,   cmdbuf->num_cmdbufs + 1, sizeof(*cmdbuf->class_ids));
+#endif
+
+#ifndef __SWITCH__
+    if (!tmp1 || !tmp2 || !tmp3)
+#else
+    if (!tmp1)
+#endif
+        return AVERROR(ENOMEM);
+
+    cmdbuf->cmdbufs = tmp1;
+
+#ifndef __SWITCH__
+    cmdbuf->cmdbuf_exts = tmp2, cmdbuf->class_ids = tmp3;
+#endif
+
+    cmdbuf->cmdbufs[cmdbuf->num_cmdbufs] = (struct nvhost_cmdbuf){
+        .mem       = av_nvtegra_map_get_handle(cmdbuf->map),
+        .offset    = (uint8_t *)cmdbuf->cur_word - mem,
+    };
+
+#ifndef __SWITCH__
+    cmdbuf->cmdbuf_exts[cmdbuf->num_cmdbufs] = (struct nvhost_cmdbuf_ext){
+        .pre_fence = -1,
+    };
+
+    cmdbuf->class_ids[cmdbuf->num_cmdbufs] = class_id;
+#endif
+
+#ifdef __SWITCH__
+    if (cmdbuf->num_cmdbufs == 0)
+        av_nvtegra_cmdbuf_push_word(cmdbuf, host1x_opcode_setclass(class_id, 0, 0));
+#endif
+
+    return 0;
+}
+
+int av_nvtegra_cmdbuf_end(AVNVTegraCmdbuf *cmdbuf) {
+    cmdbuf->num_cmdbufs++;
+    return 0;
+}
+
+int av_nvtegra_cmdbuf_push_word(AVNVTegraCmdbuf *cmdbuf, uint32_t word) {
+    uintptr_t mem_start = (uintptr_t)av_nvtegra_map_get_addr(cmdbuf->map) + cmdbuf->mem_offset;
+
+    if ((uintptr_t)cmdbuf->cur_word - mem_start >= cmdbuf->mem_size)
+        return AVERROR(ENOMEM);
+
+    *cmdbuf->cur_word++ = word;
+    cmdbuf->cmdbufs[cmdbuf->num_cmdbufs].words += 1;
+    return 0;
+}
+
+int av_nvtegra_cmdbuf_push_value(AVNVTegraCmdbuf *cmdbuf, uint32_t offset, uint32_t word) {
+    int err;
+
+    err = av_nvtegra_cmdbuf_push_word(cmdbuf, host1x_opcode_incr(NV_THI_METHOD0>>2, 2));
+    if (err < 0)
+        return err;
+
+    err = av_nvtegra_cmdbuf_push_word(cmdbuf, offset);
+    if (err < 0)
+        return err;
+
+    err = av_nvtegra_cmdbuf_push_word(cmdbuf, word);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+int av_nvtegra_cmdbuf_push_reloc(AVNVTegraCmdbuf *cmdbuf, uint32_t offset, AVNVTegraMap *target, uint32_t target_offset,
+                                 int reloc_type, int shift)
+{
+    int err;
+#ifndef __SWITCH__
+    uint8_t *mem;
+    void *tmp1, *tmp2, *tmp3;
+
+    mem = av_nvtegra_map_get_addr(cmdbuf->map);
+
+    tmp1 = av_realloc_array(cmdbuf->relocs,       cmdbuf->num_relocs + 1, sizeof(*cmdbuf->relocs));
+    tmp2 = av_realloc_array(cmdbuf->reloc_types,  cmdbuf->num_relocs + 1, sizeof(*cmdbuf->reloc_types));
+    tmp3 = av_realloc_array(cmdbuf->reloc_shifts, cmdbuf->num_relocs + 1, sizeof(*cmdbuf->reloc_shifts));
+    if (!tmp1 || !tmp2 || !tmp3)
+        return AVERROR(ENOMEM);
+
+    cmdbuf->relocs = tmp1, cmdbuf->reloc_types = tmp2, cmdbuf->reloc_shifts = tmp3;
+
+    err = av_nvtegra_cmdbuf_push_value(cmdbuf, offset, 0xdeadbeef);
+    if (err < 0)
+        return err;
+
+    cmdbuf->relocs[cmdbuf->num_relocs]       = (struct nvhost_reloc){
+        .cmdbuf_mem    = av_nvtegra_map_get_handle(cmdbuf->map),
+        .cmdbuf_offset = (uint8_t *)cmdbuf->cur_word - mem - sizeof(uint32_t),
+        .target        = av_nvtegra_map_get_handle(target),
+        .target_offset = target_offset,
+    };
+
+    cmdbuf->reloc_types[cmdbuf->num_relocs]  = (struct nvhost_reloc_type){
+        .reloc_type    = reloc_type,
+    };
+
+    cmdbuf->reloc_shifts[cmdbuf->num_relocs] = (struct nvhost_reloc_shift){
+        .shift         = shift,
+    };
+
+    cmdbuf->num_relocs++;
+
+    return 0;
+#else
+    err = av_nvtegra_cmdbuf_push_value(cmdbuf, offset, (target->iova + target_offset) >> shift);
+    if (err < 0)
+        return err;
+
+    return 0;
+#endif
+}
+
+int av_nvtegra_cmdbuf_push_syncpt_incr(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt) {
+    int err;
+
+    err = av_nvtegra_cmdbuf_push_word(cmdbuf, host1x_opcode_nonincr(NV_THI_INCR_SYNCPT>>2, 1));
+    if (err < 0)
+        return err;
+
+    err = av_nvtegra_cmdbuf_push_word(cmdbuf,
+                                      AV_NVTEGRA_VALUE(NV_THI_INCR_SYNCPT, INDX, syncpt) |
+                                      AV_NVTEGRA_ENUM (NV_THI_INCR_SYNCPT, COND, OP_DONE));
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+int av_nvtegra_cmdbuf_push_wait(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence) {
+    int err;
+
+    err = av_nvtegra_cmdbuf_push_word(cmdbuf, host1x_opcode_setclass(HOST1X_CLASS_HOST1X, 0, 0));
+    if (err < 0)
+        return err;
+
+    err = av_nvtegra_cmdbuf_push_word(cmdbuf, host1x_opcode_mask(NV_CLASS_HOST_LOAD_SYNCPT_PAYLOAD>>2,
+                                      (1<<(NV_CLASS_HOST_LOAD_SYNCPT_PAYLOAD - NV_CLASS_HOST_LOAD_SYNCPT_PAYLOAD)) |
+                                      (1<<(NV_CLASS_HOST_WAIT_SYNCPT         - NV_CLASS_HOST_LOAD_SYNCPT_PAYLOAD))));
+    if (err < 0)
+        return err;
+
+    err = av_nvtegra_cmdbuf_push_word(cmdbuf, fence);
+    if (err < 0)
+        return err;
+
+    err = av_nvtegra_cmdbuf_push_word(cmdbuf, syncpt);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+int av_nvtegra_cmdbuf_add_syncpt_incr(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence)
+{
+    void *tmp1;
+#ifndef __SWITCH__
+    void *tmp2;
+#endif
+
+    tmp1 = av_realloc_array(cmdbuf->syncpt_incrs, cmdbuf->num_syncpt_incrs + 1, sizeof(*cmdbuf->syncpt_incrs));
+#ifndef __SWITCH__
+    tmp2 = av_realloc_array(cmdbuf->fences,       cmdbuf->num_syncpt_incrs + 1, sizeof(*cmdbuf->fences));
+#endif
+
+#ifndef __SWITCH__
+    if (!tmp1 || !tmp2)
+#else
+    if (!tmp1)
+#endif
+        return AVERROR(ENOMEM);
+
+    cmdbuf->syncpt_incrs = tmp1;
+#ifndef __SWITCH__
+    cmdbuf->fences       = tmp2;
+#endif
+
+    cmdbuf->syncpt_incrs[cmdbuf->num_syncpt_incrs] = (struct nvhost_syncpt_incr){
+        .syncpt_id    = syncpt,
+        .syncpt_incrs = 1,
+    };
+
+#ifndef __SWITCH__
+    cmdbuf->fences[cmdbuf->num_syncpt_incrs]       = fence;
+#endif
+
+    cmdbuf->num_syncpt_incrs++;
+
+    return av_nvtegra_cmdbuf_push_syncpt_incr(cmdbuf, syncpt);
+}
+
+int av_nvtegra_cmdbuf_add_waitchk(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence) {
+#ifndef __SWITCH__
+    uint8_t *mem;
+    void *tmp;
+
+    mem = av_nvtegra_map_get_addr(cmdbuf->map);
+
+    tmp = av_realloc_array(cmdbuf->waitchks, cmdbuf->num_waitchks + 1, sizeof(*cmdbuf->waitchks));
+    if (!tmp)
+        return AVERROR(ENOMEM);
+
+    cmdbuf->waitchks = tmp;
+
+    cmdbuf->waitchks[cmdbuf->num_waitchks] = (struct nvhost_waitchk){
+        .mem       = av_nvtegra_map_get_handle(cmdbuf->map),
+        .offset    = (uint8_t *)cmdbuf->cur_word - mem - sizeof(uint32_t),
+        .syncpt_id = syncpt,
+        .thresh    = fence,
+    };
+
+    cmdbuf->num_waitchks++;
+#endif
+
+    return av_nvtegra_cmdbuf_push_wait(cmdbuf, syncpt, fence);
+}
+
+static void nvtegra_job_free(void *opaque, uint8_t *data) {
+    AVNVTegraJob *job = (AVNVTegraJob *)data;
+
+    if (!job)
+        return;
+
+    av_nvtegra_cmdbuf_deinit(&job->cmdbuf);
+    av_nvtegra_map_destroy(&job->input_map);
+
+    av_freep(&job);
+}
+
+static AVBufferRef *nvtegra_job_alloc(void *opaque, size_t size) {
+    AVNVTegraJobPool *pool = opaque;
+
+    AVBufferRef  *buffer;
+    AVNVTegraJob *job;
+    int err;
+
+    job = av_mallocz(sizeof(*job));
+    if (!job)
+        return NULL;
+
+    err = av_nvtegra_map_create(&job->input_map, pool->channel, pool->input_map_size, 0x100,
+                                NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE);
+    if (err < 0)
+        goto fail;
+
+    err = av_nvtegra_cmdbuf_init(&job->cmdbuf);
+    if (err < 0)
+        goto fail;
+
+    err = av_nvtegra_cmdbuf_add_memory(&job->cmdbuf, &job->input_map, pool->cmdbuf_off, pool->max_cmdbuf_size);
+    if (err < 0)
+        goto fail;
+
+    buffer = av_buffer_create((uint8_t *)job, sizeof(*job), nvtegra_job_free, pool, 0);
+    if (!buffer)
+        goto fail;
+
+    return buffer;
+
+fail:
+    av_nvtegra_cmdbuf_deinit(&job->cmdbuf);
+    av_nvtegra_map_destroy(&job->input_map);
+    av_freep(&job);
+    return NULL;
+}
+
+int av_nvtegra_job_pool_init(AVNVTegraJobPool *pool, AVNVTegraChannel *channel,
+                             size_t input_map_size, off_t cmdbuf_off, size_t max_cmdbuf_size)
+{
+    pool->channel         = channel;
+    pool->input_map_size  = input_map_size;
+    pool->cmdbuf_off      = cmdbuf_off;
+    pool->max_cmdbuf_size = max_cmdbuf_size;
+    pool->pool            = av_buffer_pool_init2(sizeof(AVNVTegraJob), pool,
+                                                 nvtegra_job_alloc, NULL);
+    if (!pool->pool)
+        return AVERROR(ENOMEM);
+
+    return 0;
+}
+
+int av_nvtegra_job_pool_uninit(AVNVTegraJobPool *pool) {
+    av_buffer_pool_uninit(&pool->pool);
+    return 0;
+}
+
+AVBufferRef *av_nvtegra_job_pool_get(AVNVTegraJobPool *pool) {
+    return av_buffer_pool_get(pool->pool);
+}
+
+int av_nvtegra_job_submit(AVNVTegraJobPool *pool, AVNVTegraJob *job) {
+    return av_nvtegra_channel_submit(pool->channel, &job->cmdbuf, &job->fence);
+}
+
+int av_nvtegra_job_wait(AVNVTegraJobPool *pool, AVNVTegraJob *job, int timeout) {
+    return av_nvtegra_syncpt_wait(pool->channel, job->fence, timeout);
+}
diff --git a/libavutil/nvtegra.h b/libavutil/nvtegra.h
new file mode 100644
index 0000000000..3b63335d6c
--- /dev/null
+++ b/libavutil/nvtegra.h
@@ -0,0 +1,258 @@
+/*
+ * Copyright (c) 2024 averne <averne381@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#ifndef AVUTIL_NVTEGRA_H
+#define AVUTIL_NVTEGRA_H
+
+#include <stdint.h>
+#include <stdbool.h>
+
+#include "buffer.h"
+
+#include "nvhost_ioctl.h"
+#include "nvmap_ioctl.h"
+
+typedef struct AVNVTegraChannel {
+#ifndef __SWITCH__
+    int fd;
+    int module_id;
+#else
+    NvChannel channel;
+#endif
+
+    uint32_t syncpt;
+
+#ifdef __SWITCH__
+    MmuRequest mmu_request;
+#endif
+    uint32_t clock;
+} AVNVTegraChannel;
+
+typedef struct AVNVTegraMap {
+#ifndef __SWITCH__
+    uint32_t handle;
+    uint32_t size;
+    void *cpu_addr;
+#else
+    NvMap map;
+    uint32_t iova;
+    uint32_t owner;
+#endif
+    bool is_linear;
+} AVNVTegraMap;
+
+typedef struct AVNVTegraCmdbuf {
+    AVNVTegraMap *map;
+
+    uint32_t mem_offset, mem_size;
+
+    uint32_t *cur_word;
+
+    struct nvhost_cmdbuf       *cmdbufs;
+#ifndef __SWITCH__
+    struct nvhost_cmdbuf_ext   *cmdbuf_exts;
+    uint32_t                   *class_ids;
+#endif
+    uint32_t num_cmdbufs;
+
+#ifndef __SWITCH__
+    struct nvhost_reloc        *relocs;
+    struct nvhost_reloc_type   *reloc_types;
+    struct nvhost_reloc_shift  *reloc_shifts;
+    uint32_t num_relocs;
+#endif
+
+    struct nvhost_syncpt_incr  *syncpt_incrs;
+#ifndef __SWITCH__
+    uint32_t                   *fences;
+#endif
+    uint32_t num_syncpt_incrs;
+
+#ifndef __SWITCH__
+    struct nvhost_waitchk      *waitchks;
+    uint32_t num_waitchks;
+#endif
+} AVNVTegraCmdbuf;
+
+typedef struct AVNVTegraJobPool {
+    /*
+     * Pool object for job allocation
+     */
+    AVBufferPool *pool;
+
+    /*
+     * Hardware channel the jobs will be submitted to
+     */
+    AVNVTegraChannel *channel;
+
+    /*
+     * Total size of the input memory-mapped buffer
+     */
+    size_t input_map_size;
+
+    /*
+     * Offset of the command data within the input map
+     */
+    off_t cmdbuf_off;
+
+    /*
+     * Maximum memory usable by the command buffer
+     */
+    size_t max_cmdbuf_size;
+} AVNVTegraJobPool;
+
+typedef struct AVNVTegraJob {
+    /*
+     * Memory-mapped buffer for command buffers, metadata structures, ...
+     */
+    AVNVTegraMap input_map;
+
+    /*
+     * Object for command recording
+     */
+    AVNVTegraCmdbuf cmdbuf;
+
+    /*
+     * Fence indicating completion of the job
+     */
+    uint32_t fence;
+} AVNVTegraJob;
+
+AVBufferRef *av_nvtegra_driver_init(void);
+
+int av_nvtegra_channel_open(AVNVTegraChannel *channel, const char *dev);
+int av_nvtegra_channel_close(AVNVTegraChannel *channel);
+int av_nvtegra_channel_get_clock_rate(AVNVTegraChannel *channel, uint32_t moduleid, uint32_t *clock_rate);
+int av_nvtegra_channel_set_clock_rate(AVNVTegraChannel *channel, uint32_t moduleid, uint32_t clock_rate);
+int av_nvtegra_channel_submit(AVNVTegraChannel *channel, AVNVTegraCmdbuf *cmdbuf, uint32_t *fence);
+int av_nvtegra_channel_set_submit_timeout(AVNVTegraChannel *channel, uint32_t timeout_ms);
+
+int av_nvtegra_syncpt_wait(AVNVTegraChannel *channel, uint32_t threshold, int32_t timeout);
+
+int av_nvtegra_map_allocate(AVNVTegraMap *map, AVNVTegraChannel *owner, uint32_t size,
+                            uint32_t align, int heap_mask, int flags);
+int av_nvtegra_map_free(AVNVTegraMap *map);
+int av_nvtegra_map_from_va(AVNVTegraMap *map, AVNVTegraChannel *owner, void *mem,
+                           uint32_t size, uint32_t align, uint32_t flags);
+int av_nvtegra_map_close(AVNVTegraMap *map);
+int av_nvtegra_map_map(AVNVTegraMap *map);
+int av_nvtegra_map_unmap(AVNVTegraMap *map);
+int av_nvtegra_map_cache_op(AVNVTegraMap *map, int op, void *addr, size_t len);
+int av_nvtegra_map_realloc(AVNVTegraMap *map, uint32_t size, uint32_t align, int heap_mask, int flags);
+
+static inline int av_nvtegra_map_create(AVNVTegraMap *map, AVNVTegraChannel *owner, uint32_t size, uint32_t align,
+                                        int heap_mask, int flags)
+{
+    int err;
+
+    err = av_nvtegra_map_allocate(map, owner, size, align, heap_mask, flags);
+    if (err < 0)
+        return err;
+
+    return av_nvtegra_map_map(map);
+}
+
+static inline int av_nvtegra_map_destroy(AVNVTegraMap *map) {
+    int err;
+
+    err = av_nvtegra_map_unmap(map);
+    if (err < 0)
+        return err;
+
+    return av_nvtegra_map_free(map);
+}
+
+int av_nvtegra_cmdbuf_init(AVNVTegraCmdbuf *cmdbuf);
+int av_nvtegra_cmdbuf_deinit(AVNVTegraCmdbuf *cmdbuf);
+int av_nvtegra_cmdbuf_add_memory(AVNVTegraCmdbuf *cmdbuf, AVNVTegraMap *map, uint32_t offset, uint32_t size);
+int av_nvtegra_cmdbuf_clear(AVNVTegraCmdbuf *cmdbuf);
+int av_nvtegra_cmdbuf_begin(AVNVTegraCmdbuf *cmdbuf, uint32_t class_id);
+int av_nvtegra_cmdbuf_end(AVNVTegraCmdbuf *cmdbuf);
+int av_nvtegra_cmdbuf_push_word(AVNVTegraCmdbuf *cmdbuf, uint32_t word);
+int av_nvtegra_cmdbuf_push_value(AVNVTegraCmdbuf *cmdbuf, uint32_t offset, uint32_t word);
+int av_nvtegra_cmdbuf_push_reloc(AVNVTegraCmdbuf *cmdbuf, uint32_t offset, AVNVTegraMap *target, uint32_t target_offset,
+                                 int reloc_type, int shift);
+int av_nvtegra_cmdbuf_push_syncpt_incr(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt);
+int av_nvtegra_cmdbuf_push_wait(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence);
+int av_nvtegra_cmdbuf_add_syncpt_incr(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence);
+int av_nvtegra_cmdbuf_add_waitchk(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence);
+
+/*
+ * Job allocation and submission routines
+ */
+int av_nvtegra_job_pool_init(AVNVTegraJobPool *pool, AVNVTegraChannel *channel,
+                             size_t input_map_size, off_t cmdbuf_off, size_t max_cmdbuf_size);
+int av_nvtegra_job_pool_uninit(AVNVTegraJobPool *pool);
+AVBufferRef *av_nvtegra_job_pool_get(AVNVTegraJobPool *pool);
+
+int av_nvtegra_job_submit(AVNVTegraJobPool *pool, AVNVTegraJob *job);
+int av_nvtegra_job_wait(AVNVTegraJobPool *pool, AVNVTegraJob *job, int timeout);
+
+static inline uint32_t av_nvtegra_map_get_handle(AVNVTegraMap *map) {
+#ifndef __SWITCH__
+    return map->handle;
+#else
+    return map->map.handle;
+#endif
+}
+
+static inline void *av_nvtegra_map_get_addr(AVNVTegraMap *map) {
+#ifndef __SWITCH__
+    return map->cpu_addr;
+#else
+    return map->map.cpu_addr;
+#endif
+}
+
+static inline uint32_t av_nvtegra_map_get_size(AVNVTegraMap *map) {
+#ifndef __SWITCH__
+    return map->size;
+#else
+    return map->map.size;
+#endif
+}
+
+/* Addresses are shifted by 8 bits in the command buffer, requiring an alignment to 256 */
+#define AV_NVTEGRA_MAP_ALIGN (1 << 8)
+
+#define AV_NVTEGRA_VALUE(offset, field, value)                                                    \
+    (((value) &                                                                                 \
+    ((uint32_t)((UINT64_C(1) << ((1?offset ## _ ## field) - (0?offset ## _ ## field) + 1)) - 1))) \
+    << (0?offset ## _ ## field))
+
+#define AV_NVTEGRA_ENUM(offset, field, value)                                                     \
+    ((offset ## _ ## field ## _ ## value &                                                        \
+    ((uint32_t)((UINT64_C(1) << ((1?offset ## _ ## field) - (0?offset ## _ ## field) + 1)) - 1))) \
+    << (0?offset ## _ ## field))
+
+#define AV_NVTEGRA_PUSH_VALUE(cmdbuf, offset, value) ({                                  \
+    int _err = av_nvtegra_cmdbuf_push_value(cmdbuf, (offset) / sizeof(uint32_t), value); \
+    if (_err < 0)                                                                        \
+        return _err;                                                                     \
+})
+
+#define AV_NVTEGRA_PUSH_RELOC(cmdbuf, offset, target, target_offset, type) ({    \
+    int _err = av_nvtegra_cmdbuf_push_reloc(cmdbuf, (offset) / sizeof(uint32_t), \
+                                            target, target_offset, type, 8);     \
+    if (_err < 0)                                                                \
+        return _err;                                                             \
+})
+
+#endif /* AVUTIL_NVTEGRA_H */
diff --git a/libavutil/nvtegra_host1x.h b/libavutil/nvtegra_host1x.h
new file mode 100644
index 0000000000..25e37eae61
--- /dev/null
+++ b/libavutil/nvtegra_host1x.h
@@ -0,0 +1,94 @@
+/*
+ * Copyright (c) 2024 averne <averne381@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#ifndef AVUTIL_NVTEGRA_HOST1X_H
+#define AVUTIL_NVTEGRA_HOST1X_H
+
+#include <stdint.h>
+
+#include "macros.h"
+
+/* From L4T include/linux/host1x.h */
+enum host1x_class {
+    HOST1X_CLASS_HOST1X  = 0x01,
+    HOST1X_CLASS_NVENC   = 0x21,
+    HOST1X_CLASS_VI      = 0x30,
+    HOST1X_CLASS_ISPA    = 0x32,
+    HOST1X_CLASS_ISPB    = 0x34,
+    HOST1X_CLASS_GR2D    = 0x51,
+    HOST1X_CLASS_GR2D_SB = 0x52,
+    HOST1X_CLASS_VIC     = 0x5d,
+    HOST1X_CLASS_GR3D    = 0x60,
+    HOST1X_CLASS_NVJPG   = 0xc0,
+    HOST1X_CLASS_NVDEC   = 0xf0,
+};
+
+static inline uint32_t host1x_opcode_setclass(unsigned class_id, unsigned offset, unsigned mask) {
+    return (0 << 28) | (offset << 16) | (class_id << 6) | mask;
+}
+
+static inline uint32_t host1x_opcode_incr(unsigned offset, unsigned count) {
+    return (1 << 28) | (offset << 16) | count;
+}
+
+static inline uint32_t host1x_opcode_nonincr(unsigned offset, unsigned count) {
+    return (2 << 28) | (offset << 16) | count;
+}
+
+static inline uint32_t host1x_opcode_mask(unsigned offset, unsigned mask) {
+    return (3 << 28) | (offset << 16) | mask;
+}
+
+static inline uint32_t host1x_opcode_imm(unsigned offset, unsigned value) {
+    return (4 << 28) | (offset << 16) | value;
+}
+
+#define NV_CLASS_HOST_LOAD_SYNCPT_PAYLOAD                                  (0x00000138)
+#define NV_CLASS_HOST_WAIT_SYNCPT                                          (0x00000140)
+
+#define NV_THI_INCR_SYNCPT                                                 (0x00000000)
+#define NV_THI_INCR_SYNCPT_INDX                                            7:0
+#define NV_THI_INCR_SYNCPT_COND                                            15:8
+#define NV_THI_INCR_SYNCPT_COND_IMMEDIATE                                  (0x00000000)
+#define NV_THI_INCR_SYNCPT_COND_OP_DONE                                    (0x00000001)
+#define NV_THI_INCR_SYNCPT_ERR                                             (0x00000008)
+#define NV_THI_INCR_SYNCPT_ERR_COND_STS_IMM                                0:0
+#define NV_THI_INCR_SYNCPT_ERR_COND_STS_OPDONE                             1:1
+#define NV_THI_CTXSW_INCR_SYNCPT                                           (0x0000000c)
+#define NV_THI_CTXSW_INCR_SYNCPT_INDX                                      7:0
+#define NV_THI_CTXSW                                                       (0x00000020)
+#define NV_THI_CTXSW_CURR_CLASS                                            9:0
+#define NV_THI_CTXSW_AUTO_ACK                                              11:11
+#define NV_THI_CTXSW_CURR_CHANNEL                                          15:12
+#define NV_THI_CTXSW_NEXT_CLASS                                            25:16
+#define NV_THI_CTXSW_NEXT_CHANNEL                                          31:28
+#define NV_THI_CONT_SYNCPT_EOF                                             (0x00000028)
+#define NV_THI_CONT_SYNCPT_EOF_INDEX                                       7:0
+#define NV_THI_CONT_SYNCPT_EOF_COND                                        8:8
+#define NV_THI_METHOD0                                                     (0x00000040)
+#define NV_THI_METHOD0_OFFSET                                              11:0
+#define NV_THI_METHOD1                                                     (0x00000044)
+#define NV_THI_METHOD1_DATA                                                31:0
+#define NV_THI_INT_STATUS                                                  (0x00000078)
+#define NV_THI_INT_STATUS_FALCON_INT                                       0:0
+#define NV_THI_INT_MASK                                                    (0x0000007c)
+#define NV_THI_INT_MASK_FALCON_INT                                         0:0
+
+#endif /* AVUTIL_NVTEGRA_HOST1X_H */
diff --git a/libavutil/pixdesc.c b/libavutil/pixdesc.c
index 1c0bcf2232..bb14b1b306 100644
--- a/libavutil/pixdesc.c
+++ b/libavutil/pixdesc.c
@@ -2791,6 +2791,10 @@ static const AVPixFmtDescriptor av_pix_fmt_descriptors[AV_PIX_FMT_NB] = {
         },
         .flags = AV_PIX_FMT_FLAG_PLANAR,
     },
+    [AV_PIX_FMT_NVTEGRA] = {
+        .name = "nvtegra",
+        .flags = AV_PIX_FMT_FLAG_HWACCEL,
+    },
 };
 
 static const char * const color_range_names[] = {
diff --git a/libavutil/pixfmt.h b/libavutil/pixfmt.h
index a7f50e1690..a3213c792a 100644
--- a/libavutil/pixfmt.h
+++ b/libavutil/pixfmt.h
@@ -439,6 +439,14 @@ enum AVPixelFormat {
      */
     AV_PIX_FMT_D3D12,
 
+    /**
+     * Hardware surfaces for Tegra devices.
+     *
+     * data[0..2] point to the memory-mapped buffer containing frame data
+     * buf[0] contains an AVBufferRef to an AVNVTegraMap
+     */
+    AV_PIX_FMT_NVTEGRA,
+
     AV_PIX_FMT_NB         ///< number of pixel formats, DO NOT USE THIS if you want to link with shared libav* because the number of formats might differ between versions
 };
 
-- 
2.45.1
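
Note on the bit-field macros in nvtegra.h: AV_NVTEGRA_VALUE and AV_NVTEGRA_ENUM rely on field ranges being written as `hi:lo`, so that `(1?hi:lo)` evaluates to the high bit and `(0?hi:lo)` to the low bit. The following is a minimal standalone sketch of that trick combined with the host1x opcode encoding. The register definitions are copied from nvtegra_host1x.h; every `DEMO_`/`demo_` name is illustrative and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from nvtegra_host1x.h in this patch */
#define NV_THI_INCR_SYNCPT              (0x00000000)
#define NV_THI_INCR_SYNCPT_INDX         7:0
#define NV_THI_INCR_SYNCPT_COND         15:8
#define NV_THI_INCR_SYNCPT_COND_OP_DONE (0x00000001)
#define NV_THI_METHOD0                  (0x00000040)

/*
 * Illustrative re-statement of the macros (hypothetical names).
 * "1 ? hi : lo" is an ordinary conditional expression yielding hi,
 * "0 ? hi : lo" yields lo, so the field width is hi - lo + 1.
 */
#define DEMO_VALUE(reg, field, value)                                                        \
    (((value) &                                                                              \
      (uint32_t)((UINT64_C(1) << ((1?reg ## _ ## field) - (0?reg ## _ ## field) + 1)) - 1))  \
     << (0?reg ## _ ## field))

#define DEMO_ENUM(reg, field, value)                                                         \
    ((reg ## _ ## field ## _ ## value &                                                      \
      (uint32_t)((UINT64_C(1) << ((1?reg ## _ ## field) - (0?reg ## _ ## field) + 1)) - 1))  \
     << (0?reg ## _ ## field))

/* Opcode in bits 31:28, register offset (in words) in bits 27:16, count in the low bits */
static inline uint32_t demo_opcode_incr(unsigned offset, unsigned count)
{
    return (1u << 28) | (offset << 16) | count;
}

static inline uint32_t demo_opcode_nonincr(unsigned offset, unsigned count)
{
    return (2u << 28) | (offset << 16) | count;
}

/* Builds the two words that av_nvtegra_cmdbuf_push_syncpt_incr() pushes */
static inline void demo_syncpt_incr(uint32_t out[2], uint32_t syncpt)
{
    out[0] = demo_opcode_nonincr(NV_THI_INCR_SYNCPT >> 2, 1);
    out[1] = DEMO_VALUE(NV_THI_INCR_SYNCPT, INDX, syncpt) |
             DEMO_ENUM (NV_THI_INCR_SYNCPT, COND, OP_DONE);
}
```

With this encoding, a syncpoint index that does not fit in the 8-bit INDX field is masked rather than spilling into the COND field.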

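A short worked example of the alignment constraint stated next to AV_NVTEGRA_MAP_ALIGN: the Switch path of av_nvtegra_cmdbuf_push_reloc() pushes `(target->iova + target_offset) >> shift`, and AV_NVTEGRA_PUSH_RELOC passes a shift of 8, so the low 8 bits of the address are dropped. The sketch below uses made-up addresses; `demo_reloc_word`, `demo_decoded_addr` and `demo_is_aligned` are hypothetical helpers, not functions from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SHIFT     8
#define DEMO_MAP_ALIGN (1u << DEMO_SHIFT) /* mirrors AV_NVTEGRA_MAP_ALIGN */

/* Word written to the command buffer for a given base address and offset */
static inline uint32_t demo_reloc_word(uint32_t iova, uint32_t target_offset)
{
    return (iova + target_offset) >> DEMO_SHIFT;
}

/* Address that results from shifting the pushed word back (assumed decoding) */
static inline uint32_t demo_decoded_addr(uint32_t word)
{
    return word << DEMO_SHIFT;
}

/* True if addr can be represented exactly after the shift */
static inline bool demo_is_aligned(uint32_t addr)
{
    return (addr & (DEMO_MAP_ALIGN - 1)) == 0;
}
```

For instance, a base of 0x10000 with an offset of 0x300 round-trips to 0x10300, whereas an offset of 0x340 also yields 0x10300, silently pointing 0x40 bytes before the intended data.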

* [FFmpeg-devel] [PATCH 06/16] avutil: add nvtegra hwcontext
  2024-05-30 19:43 [FFmpeg-devel] [PATCH 00/16] NVidia Tegra hardware decoding backend averne
                   ` (4 preceding siblings ...)
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 05/16] avutil: add common code for nvtegra averne
@ 2024-05-30 19:43 ` averne
  2024-06-05 20:47   ` Mark Thompson
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 07/16] hwcontext_nvtegra: add dynamic frequency scaling routines averne
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 37+ messages in thread
From: averne @ 2024-05-30 19:43 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: averne

This includes hwdevice and hwframes objects.
As the multimedia engines work with tiled surfaces (block linear in NVIDIA jargon), two frame transfer methods are implemented.
The first uses the VIC to perform the copy. Since some revisions of the VIC (such as the one found in the Tegra X1) do not support 10+ bit formats, these formats go through two separate copy steps, for the luma and chroma planes.
The second method copies on the CPU, and is used as a fallback when the VIC constraints are not satisfied.

Signed-off-by: averne <averne381@gmail.com>
---
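For readers unfamiliar with tiled surfaces, here is a minimal sketch of a CPU copy from a block-linear surface to a pitch-linear one. It is not taken from this patch: the 64-byte by 8-row GOB and the bit swizzle below are assumptions based on the layout commonly described by open-source projects for Maxwell-class GPUs, the block height is assumed to be a single GOB, and all `demo_`/`DEMO_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_GOB_WIDTH  64 /* bytes per GOB row (assumed) */
#define DEMO_GOB_HEIGHT 8  /* rows per GOB (assumed) */
#define DEMO_GOB_SIZE   (DEMO_GOB_WIDTH * DEMO_GOB_HEIGHT)

/* Byte offset of (x, y) inside one GOB, with x < 64 and y < 8 (assumed swizzle) */
static inline size_t demo_gob_offset(unsigned x, unsigned y)
{
    return ((x & 0x20) << 3) | /* x bit 5     -> offset bit 8     */
           ((y & 0x06) << 5) | /* y bits 2:1  -> offset bits 7:6  */
           ((x & 0x10) << 1) | /* x bit 4     -> offset bit 5     */
           ((y & 0x01) << 4) | /* y bit 0     -> offset bit 4     */
            (x & 0x0f);        /* x bits 3:0  -> offset bits 3:0  */
}

/*
 * Copy a tiled plane to a linear one, byte by byte.
 * src must be padded to a whole number of GOBs in both directions;
 * GOBs are assumed to be stored in row-major order (block height of one GOB).
 */
static inline void demo_detile(uint8_t *dst, size_t dst_pitch, const uint8_t *src,
                               unsigned width_bytes, unsigned height)
{
    unsigned gobs_per_row = (width_bytes + DEMO_GOB_WIDTH - 1) / DEMO_GOB_WIDTH;

    for (unsigned y = 0; y < height; y++) {
        for (unsigned x = 0; x < width_bytes; x++) {
            size_t gob = (size_t)(y / DEMO_GOB_HEIGHT) * gobs_per_row + x / DEMO_GOB_WIDTH;
            size_t off = gob * DEMO_GOB_SIZE +
                         demo_gob_offset(x % DEMO_GOB_WIDTH, y % DEMO_GOB_HEIGHT);
            dst[(size_t)y * dst_pitch + x] = src[off];
        }
    }
}
```

A per-byte loop like this is only meant to show the addressing; a practical CPU fallback would copy in larger contiguous runs.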
 libavutil/Makefile             |   7 +-
 libavutil/hwcontext.c          |   4 +
 libavutil/hwcontext.h          |   1 +
 libavutil/hwcontext_internal.h |   1 +
 libavutil/hwcontext_nvtegra.c  | 880 +++++++++++++++++++++++++++++++++
 libavutil/hwcontext_nvtegra.h  |  85 ++++
 6 files changed, 976 insertions(+), 2 deletions(-)
 create mode 100644 libavutil/hwcontext_nvtegra.c
 create mode 100644 libavutil/hwcontext_nvtegra.h

diff --git a/libavutil/Makefile b/libavutil/Makefile
index 733a23a8a3..44cd3f0dda 100644
--- a/libavutil/Makefile
+++ b/libavutil/Makefile
@@ -52,6 +52,7 @@ HEADERS = adler32.h                                                     \
           hwcontext_videotoolbox.h                                      \
           hwcontext_vdpau.h                                             \
           hwcontext_vulkan.h                                            \
+          hwcontext_nvtegra.h                                           \
           nvtegra.h                                                     \
           nvhost_ioctl.h                                                \
           nvmap_ioctl.h                                                 \
@@ -210,7 +211,7 @@ OBJS-$(CONFIG_VDPAU)                    += hwcontext_vdpau.o
 OBJS-$(CONFIG_VULKAN)                   += hwcontext_vulkan.o vulkan.o
 
 OBJS-$(!CONFIG_VULKAN)                  += hwcontext_stub.o
-OBJS-$(CONFIG_NVTEGRA)                  += nvtegra.o
+OBJS-$(CONFIG_NVTEGRA)                  += nvtegra.o hwcontext_nvtegra.o
 
 OBJS += $(COMPAT_OBJS:%=../compat/%)
 
@@ -233,7 +234,9 @@ SKIPHEADERS-$(CONFIG_VULKAN)           += hwcontext_vulkan.h vulkan.h   \
                                           vulkan_functions.h            \
                                           vulkan_loader.h
 SKIPHEADERS-$(CONFIG_NVTEGRA)          += nvtegra.h                     \
-                                          nvtegra_host1x.h
+                                          nvtegra_host1x.h              \
+                                          hwcontext_nvtegra.h
+
 
 TESTPROGS = adler32                                                     \
             aes                                                         \
diff --git a/libavutil/hwcontext.c b/libavutil/hwcontext.c
index fa99a0d8a4..8dd05147a4 100644
--- a/libavutil/hwcontext.c
+++ b/libavutil/hwcontext.c
@@ -65,6 +65,9 @@ static const HWContextType * const hw_table[] = {
 #endif
 #if CONFIG_VULKAN
     &ff_hwcontext_type_vulkan,
+#endif
+#if CONFIG_NVTEGRA
+    &ff_hwcontext_type_nvtegra,
 #endif
     NULL,
 };
@@ -82,6 +85,7 @@ static const char *const hw_type_names[] = {
     [AV_HWDEVICE_TYPE_VIDEOTOOLBOX] = "videotoolbox",
     [AV_HWDEVICE_TYPE_MEDIACODEC] = "mediacodec",
     [AV_HWDEVICE_TYPE_VULKAN] = "vulkan",
+    [AV_HWDEVICE_TYPE_NVTEGRA] = "nvtegra",
 };
 
 typedef struct FFHWDeviceContext {
diff --git a/libavutil/hwcontext.h b/libavutil/hwcontext.h
index bac30debae..d506281784 100644
--- a/libavutil/hwcontext.h
+++ b/libavutil/hwcontext.h
@@ -38,6 +38,7 @@ enum AVHWDeviceType {
     AV_HWDEVICE_TYPE_MEDIACODEC,
     AV_HWDEVICE_TYPE_VULKAN,
     AV_HWDEVICE_TYPE_D3D12VA,
+    AV_HWDEVICE_TYPE_NVTEGRA,
 };
 
 /**
diff --git a/libavutil/hwcontext_internal.h b/libavutil/hwcontext_internal.h
index e32b786238..478583abdd 100644
--- a/libavutil/hwcontext_internal.h
+++ b/libavutil/hwcontext_internal.h
@@ -163,5 +163,6 @@ extern const HWContextType ff_hwcontext_type_vdpau;
 extern const HWContextType ff_hwcontext_type_videotoolbox;
 extern const HWContextType ff_hwcontext_type_mediacodec;
 extern const HWContextType ff_hwcontext_type_vulkan;
+extern const HWContextType ff_hwcontext_type_nvtegra;
 
 #endif /* AVUTIL_HWCONTEXT_INTERNAL_H */
diff --git a/libavutil/hwcontext_nvtegra.c b/libavutil/hwcontext_nvtegra.c
new file mode 100644
index 0000000000..0f4d5a323b
--- /dev/null
+++ b/libavutil/hwcontext_nvtegra.c
@@ -0,0 +1,880 @@
+/*
+ * Copyright (c) 2024 averne <averne381@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <stdbool.h>
+
+#include "config.h"
+#include "pixdesc.h"
+#include "imgutils.h"
+#include "internal.h"
+#include "mem.h"
+#include "time.h"
+
+#include "hwcontext.h"
+#include "hwcontext_internal.h"
+
+#include "nvhost_ioctl.h"
+#include "nvmap_ioctl.h"
+#include "nvtegra_host1x.h"
+#include "clb0b6.h"
+#include "vic_drv.h"
+
+#include "hwcontext_nvtegra.h"
+
+typedef struct NVTegraDevicePriv {
+    /* The public AVNVTegraDeviceContext */
+    AVNVTegraDeviceContext p;
+
+    AVBufferRef *driver_state_ref;
+
+    AVNVTegraJobPool job_pool;
+    uint32_t vic_setup_off, vic_cmdbuf_off;
+} NVTegraDevicePriv;
+
+static const enum AVPixelFormat supported_sw_formats[] = {
+    AV_PIX_FMT_GRAY8,
+    AV_PIX_FMT_NV12,
+    AV_PIX_FMT_P010,
+    AV_PIX_FMT_YUV420P,
+};
+
+int av_nvtegra_pixfmt_to_vic(enum AVPixelFormat fmt) {
+    switch (fmt) {
+        case AV_PIX_FMT_GRAY8:
+            return NVB0B6_T_L8;
+        case AV_PIX_FMT_NV12:
+            return NVB0B6_T_Y8___U8V8_N420;
+        case AV_PIX_FMT_YUV420P:
+            return NVB0B6_T_Y8___U8___V8_N420;
+        case AV_PIX_FMT_RGB565:
+            return NVB0B6_T_R5G6B5;
+        case AV_PIX_FMT_RGB32:
+            return NVB0B6_T_A8R8G8B8;
+        case AV_PIX_FMT_BGR32:
+            return NVB0B6_T_A8B8G8R8;
+        case AV_PIX_FMT_RGB32_1:
+            return NVB0B6_T_R8G8B8A8;
+        case AV_PIX_FMT_BGR32_1:
+            return NVB0B6_T_B8G8R8A8;
+        case AV_PIX_FMT_0RGB32:
+            return NVB0B6_T_X8R8G8B8;
+        case AV_PIX_FMT_0BGR32:
+            return NVB0B6_T_X8B8G8R8;
+        default:
+            return -1;
+    }
+}
+
+static inline uint32_t nvtegra_surface_get_width_align(enum AVPixelFormat fmt, const AVComponentDescriptor *comp) {
+    int step = comp->step;
+
+    if (fmt != AV_PIX_FMT_NVTEGRA)
+        return 256 / step; /* Pitch linear surfaces must be aligned to 256B for VIC */
+
+    /*
+     * GOBs are 64B wide.
+     * In addition, we use a 32Bx8 cache width in VIC for block linear surfaces.
+     */
+    return 64 / step;
+}
+
+static inline uint32_t nvtegra_surface_get_height_align(enum AVPixelFormat fmt, const AVComponentDescriptor *comp) {
+    /* Height alignment is in terms of lines, not bytes, therefore we don't divide by the sample step */
+    if (fmt != AV_PIX_FMT_NVTEGRA)
+        return 4; /* We use 64Bx4 cache width in VIC for pitch linear surfaces */
+
+    /*
+     * GOBs are 8 lines high, and we use a GOB height of 2.
+     * In addition, we use a 32Bx8 cache width in VIC for block linear surfaces.
+     * We double this requirement to make sure it is respected for the subsampled chroma plane.
+     */
+    return 32;
+}
+
+static void nvtegra_device_uninit(AVHWDeviceContext *ctx) {
+    NVTegraDevicePriv       *priv = ctx->hwctx;
+    AVNVTegraDeviceContext *hwctx = &priv->p;
+
+    av_log(ctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA device\n");
+
+    av_nvtegra_job_pool_uninit(&priv->job_pool);
+
+    if (hwctx->nvdec_version) {
+        av_nvtegra_channel_close(&hwctx->nvdec_channel);
+#ifdef __SWITCH__
+        mmuRequestFinalize(&hwctx->nvdec_channel.mmu_request);
+#endif
+    }
+
+    if (hwctx->nvjpg_version) {
+        av_nvtegra_channel_close(&hwctx->nvjpg_channel);
+#ifdef __SWITCH__
+        mmuRequestFinalize(&hwctx->nvjpg_channel.mmu_request);
+#endif
+    }
+
+    av_nvtegra_channel_close(&hwctx->vic_channel);
+
+    av_buffer_unref(&priv->driver_state_ref);
+}
+
+/*
+ * Hardware modules on the Tegra X1 (see t210.c in l4t kernel sources)
+ * - nvdec v2.0
+ * - nvenc v5.0
+ * - nvjpg v1.0
+ * - vic   v4.0
+ */
+
+static int nvtegra_device_init(AVHWDeviceContext *ctx) {
+    NVTegraDevicePriv       *priv = ctx->hwctx;
+    AVNVTegraDeviceContext *hwctx = &priv->p;
+
+    uint32_t vic_map_size;
+    int err;
+
+    av_log(ctx, AV_LOG_DEBUG, "Initializing NVTEGRA device\n");
+
+    err = av_nvtegra_channel_open(&hwctx->nvdec_channel, "/dev/nvhost-nvdec");
+    if (!err)
+        hwctx->nvdec_version = AV_NVTEGRA_ENCODE_REV(2,0);
+
+    err = av_nvtegra_channel_open(&hwctx->nvjpg_channel, "/dev/nvhost-nvjpg");
+    if (!err)
+        hwctx->nvjpg_version = AV_NVTEGRA_ENCODE_REV(1,0);
+
+    err = av_nvtegra_channel_open(&hwctx->vic_channel, "/dev/nvhost-vic");
+    if (err < 0)
+        goto fail;
+
+    hwctx->vic_version = AV_NVTEGRA_ENCODE_REV(4,0);
+
+    /* Note: Official code only sets this for the nvdec channel */
+    if (hwctx->nvdec_version) {
+        err = av_nvtegra_channel_set_submit_timeout(&hwctx->nvdec_channel, 1000);
+        if (err < 0)
+            goto fail;
+    }
+
+    if (hwctx->nvjpg_version) {
+        err = av_nvtegra_channel_set_submit_timeout(&hwctx->nvjpg_channel, 1000);
+        if (err < 0)
+            goto fail;
+    }
+
+    priv->vic_setup_off  = 0;
+    priv->vic_cmdbuf_off = FFALIGN(priv->vic_setup_off  + sizeof(VicConfigStruct),
+                                   AV_NVTEGRA_MAP_ALIGN);
+    vic_map_size         = FFALIGN(priv->vic_cmdbuf_off + AV_NVTEGRA_MAP_ALIGN,
+                                   0x1000);
+
+    err = av_nvtegra_job_pool_init(&priv->job_pool, &hwctx->vic_channel, vic_map_size,
+                                   priv->vic_cmdbuf_off, vic_map_size - priv->vic_cmdbuf_off);
+    if (err < 0)
+        goto fail;
+
+#ifndef __SWITCH__
+    hwctx->nvdec_channel.module_id = 0x75;
+    hwctx->nvjpg_channel.module_id = 0x76;
+#else
+    /*
+     * The NVHOST_IOCTL_CHANNEL_SET_CLK_RATE ioctl also exists on HOS but the clock rate
+     * will be reset when the console goes to sleep.
+     */
+    if (hwctx->nvdec_version) {
+        err = AVERROR(mmuRequestInitialize(&hwctx->nvdec_channel.mmu_request, (MmuModuleId)5, 8, false));
+        if (err < 0)
+            goto fail;
+    }
+
+    if (hwctx->nvjpg_version) {
+        err = AVERROR(mmuRequestInitialize(&hwctx->nvjpg_channel.mmu_request, MmuModuleId_Nvjpg, 8, false));
+        if (err < 0)
+            goto fail;
+    }
+#endif
+
+    return 0;
+
+fail:
+    nvtegra_device_uninit(ctx);
+    return err;
+}
+
+static int nvtegra_device_create(AVHWDeviceContext *ctx, const char *device,
+                                 AVDictionary *opts, int flags)
+{
+    NVTegraDevicePriv *priv = ctx->hwctx;
+
+    av_log(ctx, AV_LOG_DEBUG, "Creating NVTEGRA device\n");
+
+    priv->driver_state_ref = av_nvtegra_driver_init();
+    if (!priv->driver_state_ref) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to create driver context, "
+                                  "make sure you are using a Tegra device\n");
+        return AVERROR(ENOSYS);
+    }
+
+    return 0;
+}
+
+static int nvtegra_frames_get_constraints(AVHWDeviceContext *ctx, const void *hwconfig,
+                                          AVHWFramesConstraints *constraints)
+{
+    av_log(ctx, AV_LOG_DEBUG, "Getting frame constraints for NVTEGRA device\n");
+
+    constraints->valid_sw_formats = av_malloc_array(FF_ARRAY_ELEMS(supported_sw_formats) + 1,
+                                                    sizeof(*constraints->valid_sw_formats));
+    if (!constraints->valid_sw_formats)
+        return AVERROR(ENOMEM);
+
+    for (int i = 0; i < FF_ARRAY_ELEMS(supported_sw_formats); ++i)
+        constraints->valid_sw_formats[i] = supported_sw_formats[i];
+    constraints->valid_sw_formats[FF_ARRAY_ELEMS(supported_sw_formats)] = AV_PIX_FMT_NONE;
+
+    constraints->valid_hw_formats = av_malloc_array(2, sizeof(*constraints->valid_hw_formats));
+    if (!constraints->valid_hw_formats)
+        return AVERROR(ENOMEM);
+
+    constraints->valid_hw_formats[0] = AV_PIX_FMT_NVTEGRA;
+    constraints->valid_hw_formats[1] = AV_PIX_FMT_NONE;
+
+    return 0;
+}
+
+static void nvtegra_map_free(void *opaque, uint8_t *data) {
+    AVNVTegraMap *map = (AVNVTegraMap *)data;
+
+    if (!map)
+        return;
+
+    av_nvtegra_map_destroy(map);
+
+    av_freep(&map);
+}
+
+static void nvtegra_frame_free(void *opaque, uint8_t *data) {
+    AVNVTegraFrame *frame = (AVNVTegraFrame *)data;
+
+    if (!frame)
+        return;
+
+    av_buffer_unref(&frame->map_ref);
+
+    av_freep(&frame);
+}
+
+static AVBufferRef *nvtegra_pool_alloc(void *opaque, size_t size) {
+    AVHWFramesContext        *ctx = opaque;
+    AVNVTegraDeviceContext *hwctx = &((NVTegraDevicePriv *)ctx->device_ctx->hwctx)->p;
+
+    AVBufferRef *buffer = NULL;
+    AVNVTegraFrame *frame = NULL;
+    AVNVTegraMap *map = NULL;
+    int err;
+
+    av_log(ctx, AV_LOG_DEBUG, "Creating surface from NVTEGRA device\n");
+
+    map = av_mallocz(sizeof(*map));
+    if (!map)
+        goto fail;
+
+    frame = av_mallocz(sizeof(*frame));
+    if (!frame)
+        goto fail;
+
+    /*
+     * Framebuffers are allocated as CPU-cacheable, since they might get copied from
+     * during transfer operations. Cache management is done manually.
+     */
+    err = av_nvtegra_map_create(map, &hwctx->nvdec_channel, size, 0x100,
+                                NVMAP_HEAP_CARVEOUT_GENERIC, NVMAP_HANDLE_CACHEABLE);
+    if (err < 0)
+        goto fail;
+
+    /* Flush the CPU cache */
+    av_nvtegra_map_cache_op(map, NVMAP_CACHE_OP_WB, av_nvtegra_map_get_addr(map),
+                            av_nvtegra_map_get_size(map));
+
+    frame->map_ref = av_buffer_create((uint8_t *)map, sizeof(*map), nvtegra_map_free, ctx, 0);
+    if (!frame->map_ref)
+        goto fail;
+
+    buffer = av_buffer_create((uint8_t *)frame, sizeof(*frame), nvtegra_frame_free, ctx, 0);
+    if (!buffer)
+        goto fail;
+
+    return buffer;
+
+fail:
+    av_log(ctx, AV_LOG_ERROR, "Failed to create buffer\n");
+    nvtegra_frame_free(opaque, (uint8_t *)frame);
+    return NULL;
+}
+
+static int nvtegra_frames_init(AVHWFramesContext *ctx) {
+    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(ctx->sw_format);
+
+    uint32_t width_aligned, height_aligned, size;
+
+    av_log(ctx, AV_LOG_DEBUG, "Initializing frame pool for the NVTEGRA device\n");
+
+    if (!ctx->pool) {
+        width_aligned  = FFALIGN(ctx->width,  nvtegra_surface_get_width_align (ctx->format, &desc->comp[0]));
+        height_aligned = FFALIGN(ctx->height, nvtegra_surface_get_height_align(ctx->format, &desc->comp[0]));
+
+        size = av_image_get_buffer_size(ctx->sw_format, width_aligned, height_aligned,
+                                        nvtegra_surface_get_width_align(ctx->format, &desc->comp[0]));
+
+        ffhwframesctx(ctx)->pool_internal = av_buffer_pool_init2(size, ctx, nvtegra_pool_alloc, NULL);
+        if (!ffhwframesctx(ctx)->pool_internal)
+            return AVERROR(ENOMEM);
+    }
+
+    return 0;
+}
+
+static void nvtegra_frames_uninit(AVHWFramesContext *ctx) {
+    av_log(ctx, AV_LOG_DEBUG, "Deinitializing frame pool for the NVTEGRA device\n");
+}
+
+static int nvtegra_get_buffer(AVHWFramesContext *ctx, AVFrame *frame) {
+    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(ctx->sw_format);
+
+    AVNVTegraMap *map;
+    uint32_t width_aligned, height_aligned;
+    int err;
+
+    av_log(ctx, AV_LOG_DEBUG, "Getting frame buffer for NVTEGRA device\n");
+
+    frame->buf[0] = av_buffer_pool_get(ctx->pool);
+    if (!frame->buf[0])
+        return AVERROR(ENOMEM);
+
+    map = av_nvtegra_frame_get_fbuf_map(frame);
+
+    width_aligned  = FFALIGN(ctx->width,  nvtegra_surface_get_width_align (ctx->format, &desc->comp[0]));
+    height_aligned = FFALIGN(ctx->height, nvtegra_surface_get_height_align(ctx->format, &desc->comp[0]));
+
+    err = av_image_fill_arrays(frame->data, frame->linesize, av_nvtegra_map_get_addr(map),
+                               ctx->sw_format, width_aligned, height_aligned,
+                               nvtegra_surface_get_width_align(ctx->format, &desc->comp[0]));
+    if (err < 0)
+        return err;
+
+    frame->format = AV_PIX_FMT_NVTEGRA;
+    frame->width  = ctx->width;
+    frame->height = ctx->height;
+
+    return 0;
+}
+
+static int nvtegra_transfer_get_formats(AVHWFramesContext *ctx,
+                                        enum AVHWFrameTransferDirection dir,
+                                        enum AVPixelFormat **formats)
+{
+    enum AVPixelFormat *fmts;
+
+    av_log(ctx, AV_LOG_DEBUG, "Getting transfer formats for NVTEGRA device\n");
+
+    fmts = av_malloc_array(2, sizeof(**formats));
+    if (!fmts)
+        return AVERROR(ENOMEM);
+
+    fmts[0] = ctx->sw_format;
+    fmts[1] = AV_PIX_FMT_NONE;
+
+    *formats = fmts;
+    return 0;
+}
+
+static inline void nvtegra_cpu_copy_plane(void *dst, int dst_stride,
+                                          void *src, int src_stride, int h, bool from)
+{
+    /*
+     * Adapted from https://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-swizzling/.
+     * We process 16x2 bytes at a time. Horizontally, this is the size of a linear atom
+     * in a 16Bx2 sector, conveniently also the size of a cache line and of a macroblock.
+     *
+     * NVDEC always uses a GOB height of 2 (block height of 16, in line with macroblock dimensions).
+     * The corresponding swizzling pattern is the following:
+     *    y3 y2 y1 y0 x5 x4 x3 x2 x1 x0
+     * x: ___x5_______x4____x3 x2 x1 x0
+     * y: y3____y2 y1____y0____________
+     *
+     * Addresses for the 4 lower bits can then be copied as-is (16 bytes).
+     * As a further optimization, the y0 bit is also handled within the same inner loop,
+     * which halves the total number of iterations.
+     *
+     * This function is declared inline with the expectation that the compiler will optimize
+     * the branches depending on the copy direction.
+     */
+
+    __uint128_t *src_ = src, *dst_ = dst, *src_line, *dst_line;
+    uint32_t ws = src_stride / sizeof(__uint128_t), wd = dst_stride / sizeof(__uint128_t),
+        w = FFMIN(ws, wd), offs_x = 0, offs_y = 0, offs_line;
+    uint32_t x_mask = -0x2e, y_mask = 0x2c;
+    int x, y;
+
+    for (y = 0; y < h; y += 2) {
+        dst_line = dst_ + (from ? y * wd : offs_y);
+        src_line = src_ + (from ? offs_y : y * ws);
+
+        offs_line = offs_x;
+        for (x = 0; x < w; ++x) {
+            dst_line[from ? x+0  : offs_line+0] = src_line[from ? offs_line+0 : x+0 ];
+            dst_line[from ? x+wd : offs_line+1] = src_line[from ? offs_line+1 : x+ws];
+            offs_line = (offs_line - x_mask) & x_mask;
+        }
+
+        offs_y = (offs_y - y_mask) & y_mask;
+
+        /* Wrap into next tile row */
+        if (!offs_y)
+            offs_x += from ? src_stride : dst_stride;
+    }
+}
+
+static int nvtegra_cpu_transfer_data(AVHWFramesContext *ctx, const AVFrame *dst, const AVFrame *src,
+                                     int num_planes, bool from)
+{
+    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(ctx->sw_format);
+    const AVFrame *hwframe, *swframe;
+    AVNVTegraMap *map;
+    int h, i;
+
+    hwframe = from ? src : dst, swframe = from ? dst : src;
+    map = av_nvtegra_frame_get_fbuf_map(hwframe);
+
+    if (swframe->format != ctx->sw_format) {
+        av_log(ctx, AV_LOG_ERROR, "Source and destination must have the same format for CPU transfers\n");
+        return AVERROR(EINVAL);
+    }
+
+    /* If we are transferring from a hardware frame, invalidate the CPU cache which might be stale */
+    if (from) {
+        av_nvtegra_map_cache_op(map, NVMAP_CACHE_OP_INV,
+                                av_nvtegra_map_get_addr(map), av_nvtegra_map_get_size(map));
+    }
+
+    /* Align the height to an even size */
+    h = FFALIGN(dst->height, 2);
+
+    for (i = 0; i < num_planes; ++i) {
+        if (map->is_linear) {
+            av_image_copy_plane(dst->data[i], dst->linesize[i], src->data[i], src->linesize[i],
+                                FFMIN(dst->linesize[i], src->linesize[i]),
+                                h >> (i ? desc->log2_chroma_h : 0));
+        } else {
+            /*
+             * Instantiate the same inlined function for both copy directions,
+             * giving the compiler the opportunity to remove branching within the copy loops.
+             * (verified by decompilation at -O1 and higher for both gcc and clang)
+             */
+            if (from)
+                nvtegra_cpu_copy_plane(dst->data[i], dst->linesize[i], src->data[i], src->linesize[i],
+                                       h >> (i ? desc->log2_chroma_h : 0), true);
+            else
+                nvtegra_cpu_copy_plane(dst->data[i], dst->linesize[i], src->data[i], src->linesize[i],
+                                       h >> (i ? desc->log2_chroma_h : 0), false);
+        }
+    }
+
+    /* If we transferred to a hardware frame, flush the CPU cache to make the data visible to hardware engines */
+    if (!from) {
+        av_nvtegra_map_cache_op(map, NVMAP_CACHE_OP_WB,
+                                av_nvtegra_map_get_addr(map), av_nvtegra_map_get_size(map));
+    }
+
+    return 0;
+}
+
+static void nvtegra_vic_preprare_config(VicConfigStruct *config, const AVFrame *src, const AVFrame *dst,
+                                        enum AVPixelFormat fmt, bool is_16b_chroma)
+{
+    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(fmt);
+    bool input_linear = (src->format != AV_PIX_FMT_NVTEGRA) || av_nvtegra_frame_get_fbuf_map(src)->is_linear,
+        output_linear = (dst->format != AV_PIX_FMT_NVTEGRA) || av_nvtegra_frame_get_fbuf_map(dst)->is_linear;
+
+    /*
+     * The VIC engine has an undocumented height alignment requirement:
+     * the surface height must be padded to an even size.
+     */
+
+    /* Subsampled dimensions when emulating 16-bit chroma transfers, as input is always NV12 */
+    int divider   = !is_16b_chroma ? 1 : 2;
+    int src_width = src->width / divider, src_height = FFALIGN(src->height, 2) / divider,
+        dst_width = dst->width / divider, dst_height = FFALIGN(dst->height, 2) / divider;
+
+    *config = (VicConfigStruct){
+        .pipeConfig = {
+            .DownsampleHoriz            = 1 << 2, /* U9.2 */
+            .DownsampleVert             = 1 << 2, /* U9.2 */
+        },
+        .outputConfig = {
+            .AlphaFillMode              = !is_16b_chroma ? NVB0B6_DXVAHD_ALPHA_FILL_MODE_OPAQUE :
+                                                           NVB0B6_DXVAHD_ALPHA_FILL_MODE_SOURCE_STREAM,
+            .BackgroundAlpha            = 0,
+            .BackgroundR                = 0,
+            .BackgroundG                = 0,
+            .BackgroundB                = 0,
+            .TargetRectLeft             = 0,
+            .TargetRectRight            = dst_width  - 1,
+            .TargetRectTop              = 0,
+            .TargetRectBottom           = dst_height - 1,
+        },
+        .outputSurfaceConfig = {
+            .OutPixelFormat             = av_nvtegra_pixfmt_to_vic(fmt),
+            .OutSurfaceWidth            = dst_width  - 1,
+            .OutSurfaceHeight           = dst_height - 1,
+            .OutBlkKind                 = !output_linear ? NVB0B6_BLK_KIND_GENERIC_16Bx2 : NVB0B6_BLK_KIND_PITCH,
+            .OutBlkHeight               = !output_linear ? 1 : 0, /* GOB height 2 */
+            .OutLumaWidth               = (dst->linesize[0] / desc->comp[0].step) - 1,
+            .OutLumaHeight              = FFALIGN(dst_height, !output_linear ? 32 : 2) - 1,
+            .OutChromaWidth             = (desc->flags & AV_PIX_FMT_FLAG_RGB) ?
+                                          -1 : (dst->linesize[1] / desc->comp[1].step) - 1,
+            .OutChromaHeight            = (desc->flags & AV_PIX_FMT_FLAG_RGB) ? -1 :
+                                          (FFALIGN(dst_height, !output_linear ? 32 : 2) >> desc->log2_chroma_h) - 1,
+        },
+        .slotStruct = {
+            {
+                .slotConfig = {
+                    .SlotEnable         = 1,
+                    .CurrentFieldEnable = 1,
+                    .SoftClampLow       = 0,
+                    .SoftClampHigh      = 1023,
+                    .PlanarAlpha        = 1023,
+                    .ConstantAlpha      = 1,
+                    .SourceRectLeft     = 0,
+                    .SourceRectRight    = (src_width  - 1) << 16, /* U14.16 (for subpixel positioning) */
+                    .SourceRectTop      = 0,
+                    .SourceRectBottom   = (src_height - 1) << 16,
+                    .DestRectLeft       = 0,
+                    .DestRectRight      = src_width  - 1,
+                    .DestRectTop        = 0,
+                    .DestRectBottom     = src_height - 1,
+                },
+                .slotSurfaceConfig = {
+                    .SlotPixelFormat    = av_nvtegra_pixfmt_to_vic(fmt),
+                    .SlotChromaLocHoriz = ((desc->flags & AV_PIX_FMT_FLAG_RGB)          ||
+                                           src->chroma_location == AVCHROMA_LOC_TOPLEFT ||
+                                           src->chroma_location == AVCHROMA_LOC_LEFT    ||
+                                           src->chroma_location == AVCHROMA_LOC_BOTTOMLEFT) ? 0 : 1,
+                    .SlotChromaLocVert  = ((desc->flags & AV_PIX_FMT_FLAG_RGB)          ||
+                                           src->chroma_location == AVCHROMA_LOC_TOPLEFT ||
+                                           src->chroma_location == AVCHROMA_LOC_TOP) ? 0 :
+                                          (src->chroma_location == AVCHROMA_LOC_LEFT ||
+                                           src->chroma_location == AVCHROMA_LOC_CENTER) ? 1 : 2,
+                    .SlotBlkKind        = !input_linear ? NVB0B6_BLK_KIND_GENERIC_16Bx2 : NVB0B6_BLK_KIND_PITCH,
+                    .SlotBlkHeight      = !input_linear ? 1 : 0, /* GOB height 2 */
+                    .SlotCacheWidth     = !input_linear ? 1 : 3, /* 32Bx8 for block, 128Bx2 for pitch */
+                    .SlotSurfaceWidth   = src_width  - 1,
+                    .SlotSurfaceHeight  = src_height - 1,
+                    .SlotLumaWidth      = (src->linesize[0] / desc->comp[0].step) - 1,
+                    .SlotLumaHeight     = FFALIGN(src_height, !input_linear ? 32 : 2) - 1,
+                    .SlotChromaWidth    = (desc->flags & AV_PIX_FMT_FLAG_RGB) ?
+                                          -1 : (src->linesize[1] / desc->comp[1].step) - 1,
+                    .SlotChromaHeight   = (desc->flags & AV_PIX_FMT_FLAG_RGB) ? -1 :
+                                          (FFALIGN(src_height, !input_linear ? 32 : 2) >> desc->log2_chroma_h) - 1,
+                },
+            },
+        },
+    };
+}
+
+static int nvtegra_vic_prepare_cmdbuf(AVHWFramesContext *ctx, AVNVTegraJobPool *pool, AVNVTegraJob *job,
+                                      const AVFrame *src, const AVFrame *dst, enum AVPixelFormat fmt,
+                                      AVNVTegraMap **plane_maps, uint32_t *plane_offsets, int num_planes)
+{
+    NVTegraDevicePriv *priv = ctx->device_ctx->hwctx;
+    AVNVTegraCmdbuf *cmdbuf = &job->cmdbuf;
+
+    AVNVTegraMap *src_maps[4], *dst_maps[4];
+    uint32_t src_map_offsets[4], dst_map_offsets[4];
+    int src_reloc_type, dst_reloc_type, i, err;
+
+#define RELOC_VARS(frame) ({                                                             \
+    if (frame->format == AV_PIX_FMT_NVTEGRA) {                                           \
+        for (i = 0; i < FF_ARRAY_ELEMS(AV_JOIN(frame, _map_offsets)); ++i) {             \
+            AV_JOIN(frame, _maps       )[i] = av_nvtegra_frame_get_fbuf_map(frame);      \
+            AV_JOIN(frame, _map_offsets)[i] = frame->data[i] - frame->data[0];           \
+        }                                                                                \
+        AV_JOIN(frame, _reloc_type) = !av_nvtegra_frame_get_fbuf_map(frame)->is_linear ? \
+            NVHOST_RELOC_TYPE_BLOCK_LINEAR : NVHOST_RELOC_TYPE_PITCH_LINEAR;             \
+    } else {                                                                             \
+        for (i = 0; i < FF_ARRAY_ELEMS(AV_JOIN(frame, _map_offsets)); ++i) {             \
+            AV_JOIN(frame, _maps       )[i] = plane_maps   [i];                          \
+            AV_JOIN(frame, _map_offsets)[i] = plane_offsets[i];                          \
+        }                                                                                \
+        AV_JOIN(frame, _reloc_type) = NVHOST_RELOC_TYPE_PITCH_LINEAR;                    \
+    }                                                                                    \
+})
+
+    RELOC_VARS(src);
+    RELOC_VARS(dst);
+
+    err = av_nvtegra_cmdbuf_begin(cmdbuf, HOST1X_CLASS_VIC);
+    if (err < 0)
+        return err;
+
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVB0B6_VIDEO_COMPOSITOR_SET_CONTROL_PARAMS,
+                          AV_NVTEGRA_VALUE(NVB0B6_VIDEO_COMPOSITOR_SET_CONTROL_PARAMS, CONFIG_STRUCT_SIZE, sizeof(VicConfigStruct) >> 4) |
+                          AV_NVTEGRA_VALUE(NVB0B6_VIDEO_COMPOSITOR_SET_CONTROL_PARAMS, GPTIMER_ON,         1)                            |
+                          AV_NVTEGRA_VALUE(NVB0B6_VIDEO_COMPOSITOR_SET_CONTROL_PARAMS, FALCON_CONTROL,     1));
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVB0B6_VIDEO_COMPOSITOR_SET_CONFIG_STRUCT_OFFSET,
+                          &job->input_map, priv->vic_setup_off, NVHOST_RELOC_TYPE_DEFAULT);
+
+    switch (fmt) {
+        /* 16-bit transfer emulation */
+        case AV_PIX_FMT_RGB565:
+            /* Luma transfer */
+            AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE0_LUMA_OFFSET(0),
+                                  src_maps[0], src_map_offsets[0], src_reloc_type);
+            AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVB0B6_VIDEO_COMPOSITOR_SET_OUTPUT_SURFACE_LUMA_OFFSET,
+                                  dst_maps[0], dst_map_offsets[0], dst_reloc_type);
+            break;
+        case AV_PIX_FMT_RGB32:
+            /* Chroma transfer */
+            AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE0_LUMA_OFFSET(0),
+                                  src_maps[1], src_map_offsets[1], src_reloc_type);
+            AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVB0B6_VIDEO_COMPOSITOR_SET_OUTPUT_SURFACE_LUMA_OFFSET,
+                                  dst_maps[1], dst_map_offsets[1], dst_reloc_type);
+            break;
+
+        /* Normal transfers */
+        case AV_PIX_FMT_GRAY8:
+        case AV_PIX_FMT_NV12:
+        case AV_PIX_FMT_YUV420P:
+            for (i = 0; i < num_planes; ++i) {
+                AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVB0B6_VIDEO_COMPOSITOR_SET_SURFACE0_LUMA_OFFSET(0)    + i * sizeof(uint32_t),
+                                      src_maps[i], src_map_offsets[i], src_reloc_type);
+                AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVB0B6_VIDEO_COMPOSITOR_SET_OUTPUT_SURFACE_LUMA_OFFSET + i * sizeof(uint32_t),
+                                      dst_maps[i], dst_map_offsets[i], dst_reloc_type);
+            }
+            break;
+        default:
+            return AVERROR(EINVAL);
+    }
+
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVB0B6_VIDEO_COMPOSITOR_EXECUTE,
+                          AV_NVTEGRA_ENUM(NVB0B6_VIDEO_COMPOSITOR_EXECUTE, AWAKEN, ENABLE));
+
+    err = av_nvtegra_cmdbuf_add_syncpt_incr(cmdbuf, pool->channel->syncpt, 0);
+    if (err < 0)
+        return err;
+
+    err = av_nvtegra_cmdbuf_end(cmdbuf);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static int nvtegra_vic_copy_plane(AVHWFramesContext *ctx, AVNVTegraJob *job,
+                                  const AVFrame *src, const AVFrame *dst,
+                                  enum AVPixelFormat fmt, AVNVTegraMap **plane_maps, uint32_t *plane_offsets,
+                                  int num_planes, bool is_chroma)
+{
+    NVTegraDevicePriv *priv = ctx->device_ctx->hwctx;
+
+    uint8_t *mem;
+    int err;
+
+    mem = av_nvtegra_map_get_addr(&job->input_map);
+
+    nvtegra_vic_preprare_config((VicConfigStruct *)(mem + priv->vic_setup_off),
+                                src, dst, fmt, is_chroma);
+
+    err = av_nvtegra_cmdbuf_clear(&job->cmdbuf);
+    if (err < 0)
+        return err;
+
+    err = nvtegra_vic_prepare_cmdbuf(ctx, &priv->job_pool, job, src, dst, fmt,
+                                     plane_maps, plane_offsets, num_planes);
+    if (err < 0)
+        goto fail;
+
+    err = av_nvtegra_job_submit(&priv->job_pool, job);
+    if (err < 0)
+        goto fail;
+
+    err = av_nvtegra_job_wait(&priv->job_pool, job, -1);
+    if (err < 0)
+        goto fail;
+
+fail:
+    return err;
+}
+
+static int nvtegra_vic_transfer_data(AVHWFramesContext *ctx, const AVFrame *dst, const AVFrame *src,
+                                     int num_planes, bool from)
+{
+    NVTegraDevicePriv       *priv = ctx->device_ctx->hwctx;
+    AVNVTegraDeviceContext *hwctx = &priv->p;
+
+    AVBufferRef *job_ref;
+    AVNVTegraJob *job;
+    const AVFrame *swframe;
+    uint8_t *map_bases[4];
+    AVNVTegraMap maps[4] = {0};
+    AVNVTegraMap *plane_maps[4];
+    uint32_t plane_offsets[4];
+    int num_maps = 0, i, j, err;
+
+    swframe = from ? dst : src;
+
+    job_ref = av_nvtegra_job_pool_get(&priv->job_pool);
+    if (!job_ref) {
+        err = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    job = (AVNVTegraJob *)job_ref->data;
+
+    /* Create a map for each frame backing buffer */
+    for (i = 0; i < FF_ARRAY_ELEMS(maps); num_maps = ++i) {
+        if (!swframe->buf[i])
+            break;
+
+        /*
+         * In order to avoid a full-frame copy on the CPU, the provided memory
+         * is mapped into VIC and used directly during the transfer.
+         * The address and size are aligned to page boundaries.
+         * Cache management is performed manually to not affect data outside the buffer.
+         */
+        map_bases[i] = (uint8_t *)((uintptr_t)swframe->buf[i]->data & ~0xfff);
+        err = av_nvtegra_map_from_va(&maps[i], &hwctx->vic_channel, map_bases[i],
+                                     swframe->buf[i]->size + ((uintptr_t)swframe->buf[i]->data & 0xfff),
+                                     0x100, NVMAP_HANDLE_CACHEABLE);
+        if (err < 0)
+            goto fail;
+
+        err = av_nvtegra_map_map(&maps[i]);
+        if (err < 0)
+            goto fail;
+
+        /* Flush-invalidate the CPU cache prior to the transfer */
+        av_nvtegra_map_cache_op(&maps[i], NVMAP_CACHE_OP_WB_INV,
+                                ((uint8_t *)av_nvtegra_map_get_addr(&maps[i])) +
+                                    ((uintptr_t)swframe->buf[i]->data & 0xfff),
+                                swframe->buf[i]->size);
+    }
+
+    /* Find the corresponding map object and its offset for each plane */
+    for (i = 0; i < num_planes; ++i) {
+        for (j = 0; j < num_maps; ++j) {
+            if ((swframe->buf[j]->data <= swframe->data[i]) &&
+                    (swframe->data[i] < swframe->buf[j]->data + swframe->buf[j]->size))
+                break;
+        }
+
+        plane_maps   [i] = &maps[j];
+        plane_offsets[i] = swframe->data[i] - map_bases[j];
+    }
+
+    /* VIC expects planes in the reversed order */
+    if (swframe->format == AV_PIX_FMT_YUV420P) {
+        FFSWAP(AVNVTegraMap *, plane_maps   [1], plane_maps   [2]);
+        FFSWAP(uint32_t,       plane_offsets[1], plane_offsets[2]);
+    }
+
+    /*
+     * VIC2 does not support 16-bit YUV surfaces (eg. P010, P012, ...).
+     * Here we emulate them using two separate transfers for the luma and chroma planes
+     * (16-bit and 32-bit widths respectively).
+     */
+    if (swframe->format == AV_PIX_FMT_P010) {
+        err = nvtegra_vic_copy_plane(ctx, job, src, dst, AV_PIX_FMT_RGB565,
+                                     plane_maps, plane_offsets, 1, false);
+        if (err < 0)
+            goto fail;
+
+        err = nvtegra_vic_copy_plane(ctx, job, src, dst, AV_PIX_FMT_RGB32,
+                                     plane_maps, plane_offsets, 1, true);
+        if (err < 0)
+            goto fail;
+    } else {
+        err = nvtegra_vic_copy_plane(ctx, job, src, dst, swframe->format,
+                                     plane_maps, plane_offsets, num_planes, false);
+        if (err < 0)
+            goto fail;
+    }
+
+fail:
+    for (i = 0; i < num_maps; ++i) {
+        av_nvtegra_map_unmap(&maps[i]);
+        av_nvtegra_map_close(&maps[i]);
+    }
+
+    av_buffer_unref(&job_ref);
+
+    return err;
+}
+
+static int nvtegra_transfer_data(AVHWFramesContext *ctx, AVFrame *dst, const AVFrame *src) {
+    const AVFrame *swframe;
+    bool from;
+    int num_planes, i;
+
+    from    = !dst->hw_frames_ctx;
+    swframe = from ? dst : src;
+
+    if (swframe->hw_frames_ctx)
+        return AVERROR(ENOSYS);
+
+    num_planes = av_pix_fmt_count_planes(swframe->format);
+
+    for (i = 0; i < num_planes; ++i) {
+        if (((uintptr_t)swframe->data[i] & 0xff) || (swframe->linesize[i] & 0xff)) {
+            av_log(ctx, AV_LOG_WARNING, "Frame address/pitch not aligned to 256, "
+                                        "falling back to cpu transfer\n");
+            return nvtegra_cpu_transfer_data(ctx, dst, src, num_planes, from);
+        }
+    }
+
+    return nvtegra_vic_transfer_data(ctx, dst, src, num_planes, from);
+}
+
+const HWContextType ff_hwcontext_type_nvtegra = {
+    .type                   = AV_HWDEVICE_TYPE_NVTEGRA,
+    .name                   = "nvtegra",
+
+    .device_hwctx_size      = sizeof(NVTegraDevicePriv),
+    .device_hwconfig_size   = 0,
+    .frames_hwctx_size      = 0,
+
+    .device_create          = &nvtegra_device_create,
+    .device_init            = &nvtegra_device_init,
+    .device_uninit          = &nvtegra_device_uninit,
+
+    .frames_get_constraints = &nvtegra_frames_get_constraints,
+    .frames_init            = &nvtegra_frames_init,
+    .frames_uninit          = &nvtegra_frames_uninit,
+    .frames_get_buffer      = &nvtegra_get_buffer,
+
+    .transfer_get_formats   = &nvtegra_transfer_get_formats,
+    .transfer_data_to       = &nvtegra_transfer_data,
+    .transfer_data_from     = &nvtegra_transfer_data,
+
+    .pix_fmts = (const enum AVPixelFormat[]) {
+        AV_PIX_FMT_NVTEGRA,
+        AV_PIX_FMT_NONE,
+    },
+};
diff --git a/libavutil/hwcontext_nvtegra.h b/libavutil/hwcontext_nvtegra.h
new file mode 100644
index 0000000000..8a2383d304
--- /dev/null
+++ b/libavutil/hwcontext_nvtegra.h
@@ -0,0 +1,85 @@
+/*
+ * Copyright (c) 2024 averne <averne381@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#ifndef AVUTIL_HWCONTEXT_NVTEGRA_H
+#define AVUTIL_HWCONTEXT_NVTEGRA_H
+
+#include <stdint.h>
+
+#include "hwcontext.h"
+#include "buffer.h"
+#include "frame.h"
+#include "pixfmt.h"
+
+#include "nvtegra.h"
+
+/*
+ * Encode a hardware revision into a version number
+ */
+#define AV_NVTEGRA_ENCODE_REV(maj, min) ((((maj) & 0xff) << 8) | ((min) & 0xff))
+
+/*
+ * Decode a version number
+ */
+static inline void av_nvtegra_decode_rev(int rev, int *maj, int *min) {
+    *maj = (rev >> 8) & 0xff;
+    *min = (rev >> 0) & 0xff;
+}
+
+/**
+ * @file
+ * API-specific header for AV_HWDEVICE_TYPE_NVTEGRA.
+ *
+ * For user-allocated pools, AVHWFramesContext.pool must return AVBufferRefs
+ * with the data pointer set to an AVNVTegraMap.
+ */
+
+typedef struct AVNVTegraDeviceContext {
+    /*
+     * Hardware multimedia engines
+     */
+    AVNVTegraChannel nvdec_channel, nvenc_channel, nvjpg_channel, vic_channel;
+
+    /*
+     * Hardware revisions for associated engines, or 0 if invalid
+     */
+    int nvdec_version, nvenc_version, nvjpg_version, vic_version;
+} AVNVTegraDeviceContext;
+
+typedef struct AVNVTegraFrame {
+    /*
+     * Reference to an AVNVTegraMap object
+     */
+    AVBufferRef *map_ref;
+} AVNVTegraFrame;
+
+/*
+ * Helper to retrieve a map object from the corresponding frame
+ */
+static inline AVNVTegraMap *av_nvtegra_frame_get_fbuf_map(const AVFrame *frame) {
+    return (AVNVTegraMap *)((AVNVTegraFrame *)frame->buf[0]->data)->map_ref->data;
+}
+
+/*
+ * Converts a pixel format to the equivalent code for the VIC engine
+ */
+int av_nvtegra_pixfmt_to_vic(enum AVPixelFormat fmt);
+
+#endif /* AVUTIL_HWCONTEXT_NVTEGRA_H */
-- 
2.45.1
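
The page-aligned mapping used by nvtegra_vic_transfer_data() above can be
illustrated with plain integer arithmetic. This is only a sketch under stated
assumptions: PAGE_SIZE_ASSUMED, PageSpan and page_span() are hypothetical names
that do not exist in the patch, 4 KiB pages are assumed from the 0xfff mask, and
no nvmap call is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, matching the 0xfff mask used in the patch */
#define PAGE_SIZE_ASSUMED 0x1000u

/* Hypothetical helper type, not part of the patch */
typedef struct PageSpan {
    uintptr_t base;    /* buffer address rounded down to a page boundary */
    size_t    offset;  /* distance from base to the first byte of the buffer */
    size_t    size;    /* number of bytes to map, counted from base */
} PageSpan;

/* Same arithmetic as map_bases[i] and the size passed to the mapping call */
static PageSpan page_span(uintptr_t addr, size_t len)
{
    PageSpan s;

    s.offset = addr & (PAGE_SIZE_ASSUMED - 1);
    s.base   = addr - s.offset;
    s.size   = len + s.offset;

    return s;
}

/* Offset of a plane pointer inside the mapping, as stored in plane_offsets[i] */
static uint32_t plane_offset(uintptr_t plane_addr, const PageSpan *s)
{
    return (uint32_t)(plane_addr - s->base);
}
```

A buffer at 0x12345678 is therefore mapped from 0x12345000, and a plane starting
at the first byte of that buffer gets the offset 0x678.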

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

* [FFmpeg-devel] [PATCH 07/16] hwcontext_nvtegra: add dynamic frequency scaling routines
  2024-05-30 19:43 [FFmpeg-devel] [PATCH 00/16] NVidia Tegra hardware decoding backend averne
                   ` (5 preceding siblings ...)
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 06/16] avutil: add nvtegra hwcontext averne
@ 2024-05-30 19:43 ` averne
  2024-06-05 20:50   ` Mark Thompson
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 08/16] nvtegra: add common hardware decoding code averne
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 37+ messages in thread
From: averne @ 2024-05-30 19:43 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: averne

To save on energy, the clock speed of multimedia engines should be adapted to their workload.

Signed-off-by: averne <averne381@gmail.com>
---
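For readers following the low-corner logic in av_nvtegra_dfs_init(), here is a
standalone sketch of the same thresholds and of the wrapped maximum frequency.
The names dfs_lowcorner_hz() and dfs_max_freq_hz() are hypothetical and only
serve the illustration; the constants are copied from the patch.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: low-corner frequency in Hz for a given stream */
static uint32_t dfs_lowcorner_hz(int width, int height, double framerate_hz)
{
    /* Same integer expression as the patch: ((width / 16) * height) / 16 */
    int num_mbs = width / 16 * height / 16;
    uint32_t lowcorner;

    if (num_mbs <= 3600)
        lowcorner = 100000000;
    else if (num_mbs <= 8160)
        lowcorner = 180000000;
    else if (num_mbs <= 32400)
        lowcorner = 345000000;
    else
        lowcorner = 576000000;

    /* Scale down for streams slower than 30 fps, never up */
    if (framerate_hz >= 0.1 && isfinite(framerate_hz)) {
        double scaled = lowcorner * framerate_hz / 30.0;
        if (scaled < lowcorner)
            lowcorner = (uint32_t)scaled;
    }

    return lowcorner;
}

/* Hypothetical helper: INT_MAX kHz converted to Hz and truncated to 32 bits */
static uint32_t dfs_max_freq_hz(void)
{
    return (uint32_t)(UINT64_C(2147483647) * 1000);
}
```

With these definitions a 1920x1080 stream at 30 fps yields 180 MHz, the same
stream at 24 fps yields 144 MHz, and the wrapped maximum is 0xFFFFFC18, which
equals 2^32 - 1000.
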
 libavutil/hwcontext_nvtegra.c | 165 ++++++++++++++++++++++++++++++++++
 libavutil/hwcontext_nvtegra.h |   7 ++
 2 files changed, 172 insertions(+)
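
The reclocking rule in av_nvtegra_dfs_update() (Freq = decode_cycles_per_bit *
bits_per_second * 1.2, clamped to the low corner) can be sketched in isolation
as below. DfsSketch, dfs_add_sample() and dfs_target_clock() are hypothetical
names, the struct is a reduced form of the dfs_* fields, and the guards against
empty samples, a zero interval and 32-bit overflow are additions of this sketch
that are not present in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced state, loosely mirroring the dfs_* fields */
typedef struct DfsSketch {
    double  cycles_per_bit_ema;  /* exponential moving average, cycles per bit */
    double  damping;             /* weight of the newest sample, 0.1 in the patch */
    int64_t bits_sum;            /* bits fed to the engine in the current window */
} DfsSketch;

/* Fold one decoded frame into the running statistics */
static void dfs_add_sample(DfsSketch *s, int bitstream_bytes, uint32_t decode_cycles)
{
    int bits = bitstream_bytes * 8;

    if (bits <= 0)  /* guard added in this sketch only */
        return;

    s->cycles_per_bit_ema = s->damping * ((double)decode_cycles / bits) +
                            (1.0 - s->damping) * s->cycles_per_bit_ema;
    s->bits_sum += bits;
}

/* Target clock in Hz for a window that lasted wall_us microseconds */
static uint32_t dfs_target_clock(const DfsSketch *s, int64_t wall_us, uint32_t lowcorner)
{
    double bits_per_second, clock;

    if (wall_us <= 0)  /* guard added in this sketch only */
        return lowcorner;

    bits_per_second = s->bits_sum * 1e6 / wall_us;
    clock           = s->cycles_per_bit_ema * bits_per_second * 1.2;

    if (clock >= 4294967295.0)  /* avoid an out-of-range conversion */
        return UINT32_MAX;

    return clock > lowcorner ? (uint32_t)clock : lowcorner;
}
```

The bit rate here is measured against wall-clock time, as in the patch, so a
player that decodes faster than real time requests a proportionally higher
clock.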

diff --git a/libavutil/hwcontext_nvtegra.c b/libavutil/hwcontext_nvtegra.c
index 0f4d5a323b..6b72348082 100644
--- a/libavutil/hwcontext_nvtegra.c
+++ b/libavutil/hwcontext_nvtegra.c
@@ -46,6 +46,14 @@ typedef struct NVTegraDevicePriv {
 
     AVNVTegraJobPool job_pool;
     uint32_t vic_setup_off, vic_cmdbuf_off;
+
+    double framerate;
+    uint32_t dfs_lowcorner;
+    double dfs_decode_cycles_ema;
+    double dfs_ema_damping;
+    int dfs_bitrate_sum;
+    int dfs_cur_sample, dfs_num_samples;
+    int64_t dfs_sampling_start_ts, dfs_last_ts_delta;
 } NVTegraDevicePriv;
 
 static const enum AVPixelFormat supported_sw_formats[] = {
@@ -108,6 +116,28 @@ static inline uint32_t nvtegra_surface_get_height_align(enum AVPixelFormat fmt,
     return 32;
 }
 
+static int nvtegra_channel_set_freq(AVNVTegraChannel *channel, uint32_t freq) {
+    int err;
+#ifndef __SWITCH__
+    err = av_nvtegra_channel_set_clock_rate(channel, channel->module_id, freq);
+    if (err < 0)
+        return err;
+
+    err = av_nvtegra_channel_get_clock_rate(channel, channel->module_id, &channel->clock);
+    if (err < 0)
+        return err;
+#else
+    err = AVERROR(mmuRequestSetAndWait(&channel->mmu_request, freq, -1));
+    if (err < 0)
+        return err;
+
+    err = AVERROR(mmuRequestGet(&channel->mmu_request, &channel->clock));
+    if (err < 0)
+        return err;
+#endif
+    return 0;
+}
+
 static void nvtegra_device_uninit(AVHWDeviceContext *ctx) {
     NVTegraDevicePriv       *priv = ctx->hwctx;
     AVNVTegraDeviceContext *hwctx = &priv->p;
@@ -386,6 +416,141 @@ static int nvtegra_get_buffer(AVHWFramesContext *ctx, AVFrame *frame) {
     return 0;
 }
 
+/*
+ * Possible frequencies on Icosa and Mariko+, in MHz
+ * (see tegra210-core-dvfs.c and tegra210b01-core-dvfs.c in l4t kernel sources, respectively):
+ * for NVDEC:
+ *   268.8, 384.0, 448.0, 486.4, 550.4, 576.0, 614.4, 652.8, 678.4, 691.2, 716.8
+ *   460.8, 499.2, 556.8, 633.6, 652.8, 710.4, 748.8, 787.2, 825.6, 844.8, 883.2, 902.4, 921.6, 940.8, 960.0, 979.2
+ * for NVJPG:
+ *   192.0, 307.2, 345.6, 409.6, 486.4, 524.8, 550.4, 576.0, 588.8, 614.4, 627.2
+ *   422.4, 441.6, 499.2, 518.4, 537.6, 556.8, 576.0, 595.2, 614.4, 633.6, 652.8
+ */
+
+int av_nvtegra_dfs_init(AVHWDeviceContext *ctx, AVNVTegraChannel *channel, int width, int height,
+                        double framerate_hz)
+{
+    NVTegraDevicePriv *priv = ctx->hwctx;
+
+    uint32_t max_freq, lowcorner;
+    int num_mbs, err;
+
+    priv->dfs_num_samples = 20;
+    priv->dfs_ema_damping = 0.1;
+
+    /*
+     * Initialize low-corner frequency (reproduces official code)
+     * Framerate might be unavailable (or variable), but this is official logic
+     */
+    num_mbs = width / 16 * height / 16;
+    if (num_mbs <= 3600)
+        lowcorner = 100000000;  /* up to 1280x720 (3600 MBs) */
+    else if (num_mbs <= 8160)
+        lowcorner = 180000000;  /* up to 1920x1088 (8160 MBs) */
+    else if (num_mbs <= 32400)
+        lowcorner = 345000000;  /* up to 3840x2160 (32400 MBs) */
+    else
+        lowcorner = 576000000;  /* anything larger */
+
+    if (framerate_hz >= 0.1 && isfinite(framerate_hz))
+        lowcorner = FFMIN(lowcorner, lowcorner * framerate_hz / 30.0);
+
+    priv->framerate     = framerate_hz;
+    priv->dfs_lowcorner = lowcorner;
+
+    av_log(ctx, AV_LOG_DEBUG, "DFS: Initializing lowcorner to %u Hz, using %d samples\n",
+           priv->dfs_lowcorner, priv->dfs_num_samples);
+
+    /*
+     * Initialize channel to the max possible frequency (the kernel driver will clamp to an allowed value)
+     * Note: Official code passes INT_MAX kHz then multiplies by 1000 (to Hz) and converts to u32,
+     * resulting in this value.
+     */
+    max_freq = ((UINT64_C(1) << 32) - 1000) & UINT32_MAX;
+
+    err = nvtegra_channel_set_freq(channel, max_freq);
+    if (err < 0)
+        return err;
+
+    priv->dfs_decode_cycles_ema = 0.0;
+    priv->dfs_bitrate_sum       = 0;
+    priv->dfs_cur_sample        = 0;
+    priv->dfs_sampling_start_ts = av_gettime_relative();
+    priv->dfs_last_ts_delta     = 0;
+
+    return 0;
+}
+
+int av_nvtegra_dfs_update(AVHWDeviceContext *ctx, AVNVTegraChannel *channel, int bitstream_len, int decode_cycles) {
+    NVTegraDevicePriv *priv = ctx->hwctx;
+
+    double frame_time, avg;
+    int64_t now, wl_dt;
+    uint32_t clock;
+    int err;
+
+    /*
+     * Official software implements DFS using a flat average of the decoder pool occupancy.
+     * We instead use the decode cycles as reported by NVDEC microcode, and the "bitrate"
+     * (bitstream bits fed to the hardware in a given clock time interval, NOT video time),
+     * to calculate a suitable frequency, and multiply it by 1.2 for good measure:
+     *   Freq = decode_cycles_per_bit * bits_per_second * 1.2
+     */
+
+    /* Convert to bits */
+    bitstream_len *= 8;
+
+    /* Exponential moving average of decode cycles per bitstream bit */
+    priv->dfs_decode_cycles_ema = priv->dfs_ema_damping * (double)decode_cycles/bitstream_len +
+        (1.0 - priv->dfs_ema_damping) * priv->dfs_decode_cycles_ema;
+
+    priv->dfs_bitrate_sum += bitstream_len;
+    priv->dfs_cur_sample   = (priv->dfs_cur_sample + 1) % priv->dfs_num_samples;
+
+    err = 0;
+
+    /* Reclock if we collected enough samples */
+    if (priv->dfs_cur_sample == 0) {
+        now   = av_gettime_relative();
+        wl_dt = now - priv->dfs_sampling_start_ts;
+
+        /*
+         * Try to filter bad sample sets caused by eg. pausing the video playback.
+         * A sample set is used only if at least one of these conditions is met:
+         * - the wall time is under 1.5x the nominal duration of the window (10Hz is assumed if no framerate is available)
+         * - the wall time is under 1.5x the wall time of the last accepted window
+         */
+
+        if (priv->framerate >= 0.1 && isfinite(priv->framerate))
+            frame_time = 1.0e6 / priv->framerate;
+        else
+            frame_time = 0.1e6;
+
+        if ((wl_dt < 1.5 * priv->dfs_num_samples * frame_time) ||
+                ((priv->dfs_last_ts_delta) && (wl_dt < 1.5 * priv->dfs_last_ts_delta))) {
+            avg   = priv->dfs_bitrate_sum * 1e6 / wl_dt;
+            clock = priv->dfs_decode_cycles_ema * avg * 1.2;
+            clock = FFMAX(clock, priv->dfs_lowcorner);
+
+            av_log(ctx, AV_LOG_DEBUG, "DFS: %.0f cycles/b (ema), %.0f b/s -> clock %u Hz (lowcorner %u Hz)\n",
+                priv->dfs_decode_cycles_ema, avg, clock, priv->dfs_lowcorner);
+
+            err = nvtegra_channel_set_freq(channel, clock);
+
+            priv->dfs_last_ts_delta = wl_dt;
+        }
+
+        priv->dfs_bitrate_sum       = 0;
+        priv->dfs_sampling_start_ts = now;
+    }
+
+    return err;
+}
+
+int av_nvtegra_dfs_uninit(AVHWDeviceContext *ctx, AVNVTegraChannel *channel) {
+    return nvtegra_channel_set_freq(channel, 0);
+}
+
 static int nvtegra_transfer_get_formats(AVHWFramesContext *ctx,
                                         enum AVHWFrameTransferDirection dir,
                                         enum AVPixelFormat **formats)
diff --git a/libavutil/hwcontext_nvtegra.h b/libavutil/hwcontext_nvtegra.h
index 8a2383d304..7c845951d9 100644
--- a/libavutil/hwcontext_nvtegra.h
+++ b/libavutil/hwcontext_nvtegra.h
@@ -82,4 +82,11 @@ static inline AVNVTegraMap *av_nvtegra_frame_get_fbuf_map(const AVFrame *frame)
  */
 int av_nvtegra_pixfmt_to_vic(enum AVPixelFormat fmt);
 
+/*
+ * Dynamic frequency scaling routines
+ */
+int av_nvtegra_dfs_init(AVHWDeviceContext *ctx, AVNVTegraChannel *channel, int width, int height, double framerate_hz);
+int av_nvtegra_dfs_update(AVHWDeviceContext *ctx, AVNVTegraChannel *channel, int bitstream_len, int decode_cycles);
+int av_nvtegra_dfs_uninit(AVHWDeviceContext *ctx, AVNVTegraChannel *channel);
+
 #endif /* AVUTIL_HWCONTEXT_NVTEGRA_H */
-- 
2.45.1

* [FFmpeg-devel] [PATCH 08/16] nvtegra: add common hardware decoding code
  2024-05-30 19:43 [FFmpeg-devel] [PATCH 00/16] NVidia Tegra hardware decoding backend averne
                   ` (6 preceding siblings ...)
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 07/16] hwcontext_nvtegra: add dynamic frequency scaling routines averne
@ 2024-05-30 19:43 ` averne
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 09/16] nvtegra: add mpeg1/2 hardware decoding averne
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: averne @ 2024-05-30 19:43 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: averne

This includes common decoder initialization and teardown code, decode-job management, and constraint checks.

Signed-off-by: averne <averne381@gmail.com>
---
 configure                   |   1 +
 libavcodec/Makefile         |   2 +
 libavcodec/hwconfig.h       |   2 +
 libavcodec/nvtegra_decode.c | 517 ++++++++++++++++++++++++++++++++++++
 libavcodec/nvtegra_decode.h |  94 +++++++
 5 files changed, 616 insertions(+)
 create mode 100644 libavcodec/nvtegra_decode.c
 create mode 100644 libavcodec/nvtegra_decode.h
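
The bitstream bookkeeping done by ff_nvtegra_decode_slice() and
ff_nvtegra_end_frame() (slice offset table, optional 000001 start code, closing
offset entry) can be sketched on ordinary heap memory as follows. This is a
simplified illustration, not the patch's implementation: BitstreamSketch,
bs_append_slice() and bs_end_frame() are hypothetical names, the offset table
has a fixed size instead of growing, and realloc() stands in for the map
reallocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical plain-memory stand-in for the mapped input buffer */
typedef struct BitstreamSketch {
    uint8_t  *data;
    size_t    cap, len;
    uint32_t  offsets[64];  /* fixed size for brevity; the patch grows its table */
    int       num_slices;
} BitstreamSketch;

static int bs_append_slice(BitstreamSketch *bs, const uint8_t *buf, size_t size,
                           int add_startcode)
{
    size_t need = bs->len + size + (add_startcode ? 3 : 0);

    /* Keep one table entry free for the closing offset */
    if (bs->num_slices >= 63)
        return -1;

    /* Grow to a page-aligned capacity when the slice does not fit */
    if (need > bs->cap) {
        size_t   new_cap = (need * 2 + 0xfff) & ~(size_t)0xfff;
        uint8_t *tmp     = realloc(bs->data, new_cap);
        if (!tmp)
            return -1;
        bs->data = tmp;
        bs->cap  = new_cap;
    }

    /* Each slice starts at the current bitstream length */
    bs->offsets[bs->num_slices++] = (uint32_t)bs->len;

    if (add_startcode) {
        /* 24-bit big-endian value 1, the same bytes AV_WB24(p, 1) writes */
        bs->data[bs->len + 0] = 0;
        bs->data[bs->len + 1] = 0;
        bs->data[bs->len + 2] = 1;
        bs->len += 3;
    }

    memcpy(bs->data + bs->len, buf, size);
    bs->len += size;

    return 0;
}

/* Closing entry, so that slice i spans offsets[i] .. offsets[i + 1] */
static void bs_end_frame(BitstreamSketch *bs)
{
    bs->offsets[bs->num_slices] = (uint32_t)bs->len;
}
```

Storing one more offset than there are slices lets a consumer compute each
slice length as a difference of neighbouring entries, which is why the patch
keeps one slot in reserve.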

diff --git a/configure b/configure
index 51f169bfbd..566bb37b8c 100755
--- a/configure
+++ b/configure
@@ -2022,6 +2022,7 @@ HWACCEL_LIBRARY_LIST="
     mmal
     omx
     opencl
+    nvtegra
 "
 
 DOCUMENT_LIST="
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 2443d2c6fd..f1e2dc6625 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -993,6 +993,7 @@ OBJS-$(CONFIG_VAAPI)                      += vaapi_decode.o
 OBJS-$(CONFIG_VIDEOTOOLBOX)               += videotoolbox.o
 OBJS-$(CONFIG_VDPAU)                      += vdpau.o
 OBJS-$(CONFIG_VULKAN)                     += vulkan.o vulkan_video.o
+OBJS-$(CONFIG_NVTEGRA)                    += nvtegra_decode.o
 
 OBJS-$(CONFIG_AV1_D3D11VA_HWACCEL)        += dxva2_av1.o
 OBJS-$(CONFIG_AV1_DXVA2_HWACCEL)          += dxva2_av1.o
@@ -1285,6 +1286,7 @@ SKIPHEADERS-$(CONFIG_VIDEOTOOLBOX)     += videotoolbox.h vt_internal.h
 SKIPHEADERS-$(CONFIG_VULKAN)           += vulkan.h vulkan_video.h vulkan_decode.h
 SKIPHEADERS-$(CONFIG_V4L2_M2M)         += v4l2_buffers.h v4l2_context.h v4l2_m2m.h
 SKIPHEADERS-$(CONFIG_ZLIB)             += zlib_wrapper.h
+SKIPHEADERS-$(CONFIG_NVTEGRA)          += nvtegra_decode.h
 
 TESTPROGS = avcodec                                                     \
             avpacket                                                    \
diff --git a/libavcodec/hwconfig.h b/libavcodec/hwconfig.h
index ee29ca631d..a3c3402c77 100644
--- a/libavcodec/hwconfig.h
+++ b/libavcodec/hwconfig.h
@@ -79,6 +79,8 @@ void ff_hwaccel_uninit(AVCodecContext *avctx);
     HW_CONFIG_HWACCEL(0, 0, 1, D3D11VA_VLD,  NONE,         ff_ ## codec ## _d3d11va_hwaccel)
 #define HWACCEL_D3D12VA(codec) \
     HW_CONFIG_HWACCEL(1, 1, 0, D3D12,        D3D12VA,      ff_ ## codec ## _d3d12va_hwaccel)
+#define HWACCEL_NVTEGRA(codec) \
+    HW_CONFIG_HWACCEL(1, 1, 0, NVTEGRA,      NVTEGRA,      ff_ ## codec ## _nvtegra_hwaccel)
 
 #define HW_CONFIG_ENCODER(device, frames, ad_hoc, format, device_type_) \
     &(const AVCodecHWConfigInternal) { \
diff --git a/libavcodec/nvtegra_decode.c b/libavcodec/nvtegra_decode.c
new file mode 100644
index 0000000000..1978fcf644
--- /dev/null
+++ b/libavcodec/nvtegra_decode.c
@@ -0,0 +1,517 @@
+/*
+ * Copyright (c) 2024 averne <averne381@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include "libavutil/hwcontext.h"
+#include "libavutil/hwcontext_nvtegra.h"
+#include "libavutil/nvtegra_host1x.h"
+#include "libavutil/pixdesc.h"
+#include "libavutil/pixfmt.h"
+#include "libavutil/intreadwrite.h"
+
+#include "avcodec.h"
+#include "codec_desc.h"
+#include "internal.h"
+#include "decode.h"
+#include "nvtegra_decode.h"
+
+static void nvtegra_input_map_free(void *opaque, uint8_t *data) {
+    AVNVTegraMap *map = (AVNVTegraMap *)data;
+
+    if (!data)
+        return;
+
+    av_nvtegra_map_destroy(map);
+
+    av_freep(&map);
+}
+
+static AVBufferRef *nvtegra_input_map_alloc(void *opaque, size_t size) {
+    FFNVTegraDecodeContext *ctx = opaque;
+
+    AVBufferRef  *buffer;
+    AVNVTegraMap *map;
+    int err;
+
+    map = av_mallocz(sizeof(*map));
+    if (!map)
+        return NULL;
+
+    err = av_nvtegra_map_create(map, ctx->channel, ctx->input_map_size, 0x100,
+                                NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE);
+    if (err < 0)
+        return NULL;
+
+    buffer = av_buffer_create((uint8_t *)map, sizeof(*map), nvtegra_input_map_free, ctx, 0);
+    if (!buffer)
+        goto fail;
+
+    ctx->new_input_buffer = true;
+
+    return buffer;
+
+fail:
+    av_log(ctx, AV_LOG_ERROR, "Failed to create buffer\n");
+    av_nvtegra_map_destroy(map);
+    av_freep(&map);
+    return NULL;
+}
+
+int ff_nvtegra_decode_init(AVCodecContext *avctx, FFNVTegraDecodeContext *ctx) {
+    AVHWFramesContext      *frames_ctx;
+    AVHWDeviceContext      *hw_device_ctx;
+    AVNVTegraDeviceContext *device_hwctx;
+
+    int err;
+
+    err = ff_decode_get_hw_frames_ctx(avctx, AV_HWDEVICE_TYPE_NVTEGRA);
+    if (err < 0)
+        goto fail;
+
+    frames_ctx    = (AVHWFramesContext *)avctx->hw_frames_ctx->data;
+    hw_device_ctx = (AVHWDeviceContext *)frames_ctx->device_ref->data;
+    device_hwctx  = hw_device_ctx->hwctx;
+
+    if ((!ctx->is_nvjpg && !device_hwctx->nvdec_version) || (ctx->is_nvjpg && !device_hwctx->nvjpg_version))
+        return AVERROR(EACCES);
+
+    ctx->hw_device_ref = av_buffer_ref(frames_ctx->device_ref);
+    if (!ctx->hw_device_ref) {
+        err = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    ctx->decoder_pool = av_buffer_pool_init2(sizeof(AVNVTegraMap), ctx,
+                                             nvtegra_input_map_alloc, NULL);
+    if (!ctx->decoder_pool) {
+        err = AVERROR(ENOMEM);
+        goto fail;
+    }
+
+    ctx->channel = !ctx->is_nvjpg ? &device_hwctx->nvdec_channel : &device_hwctx->nvjpg_channel;
+
+    err = av_nvtegra_cmdbuf_init(&ctx->cmdbuf);
+    if (err < 0)
+        goto fail;
+
+    err = av_nvtegra_dfs_init(hw_device_ctx, ctx->channel, avctx->coded_width, avctx->coded_height,
+                              av_q2d(avctx->framerate));
+    if (err < 0)
+        goto fail;
+
+    return 0;
+
+fail:
+    ff_nvtegra_decode_uninit(avctx, ctx);
+    return err;
+}
+
+int ff_nvtegra_decode_uninit(AVCodecContext *avctx, FFNVTegraDecodeContext *ctx) {
+    AVHWFramesContext *frames_ctx;
+    AVHWDeviceContext *hw_device_ctx;
+
+    av_buffer_pool_uninit(&ctx->decoder_pool);
+
+    av_buffer_unref(&ctx->hw_device_ref);
+
+    av_nvtegra_cmdbuf_deinit(&ctx->cmdbuf);
+
+    if (avctx->hw_frames_ctx && ctx->channel) {
+        frames_ctx    = (AVHWFramesContext *)avctx->hw_frames_ctx->data;
+        hw_device_ctx = (AVHWDeviceContext *)frames_ctx->device_ref->data;
+
+        av_nvtegra_dfs_uninit(hw_device_ctx, ctx->channel);
+    }
+
+
+    return 0;
+}
+
+static void nvtegra_fdd_priv_free(void *priv) {
+    FFNVTegraDecodeFrame    *tf = priv;
+    FFNVTegraDecodeContext *ctx = tf ? tf->ctx : NULL;
+
+    if (!tf)
+        return;
+
+    if (tf->in_flight)
+        av_nvtegra_syncpt_wait(ctx->channel, tf->fence, -1);
+
+    av_buffer_unref(&tf->input_map_ref);
+    av_freep(&tf);
+}
+
+int ff_nvtegra_wait_decode(void *logctx, AVFrame *frame) {
+    FrameDecodeData             *fdd = (FrameDecodeData *)frame->private_ref->data;
+    FFNVTegraDecodeFrame         *tf = fdd->hwaccel_priv;
+    FFNVTegraDecodeContext      *ctx = tf->ctx;
+    AVNVTegraMap          *input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+    AVHWDeviceContext *hw_device_ctx = (AVHWDeviceContext *)ctx->hw_device_ref->data;
+
+    nvdec_status_s *nvdec_status;
+    nvjpg_dec_status *nvjpg_status;
+    uint32_t decode_cycles;
+    uint8_t *mem;
+    int err;
+
+    if (!tf->in_flight)
+        return 0;
+
+    mem = av_nvtegra_map_get_addr(input_map);
+
+    err = av_nvtegra_syncpt_wait(ctx->channel, tf->fence, -1);
+    if (err < 0)
+        return err;
+
+    tf->in_flight = false;
+
+    if (!ctx->is_nvjpg) {
+        nvdec_status = (nvdec_status_s *)(mem + ctx->status_off);
+        if (nvdec_status->error_status != 0 || nvdec_status->mbs_in_error != 0)
+            return AVERROR_UNKNOWN;
+
+        decode_cycles = nvdec_status->cycle_count * 16;
+    } else {
+        nvjpg_status = (nvjpg_dec_status *)(mem + ctx->status_off);
+        if (nvjpg_status->error_status != 0 || nvjpg_status->bytes_offset == 0)
+            return AVERROR_UNKNOWN;
+
+        decode_cycles = nvjpg_status->cycle_count;
+    }
+
+    /* Decode time in µs: decode_cycles * 1000000 / ctx->channel->clock */
+    err = av_nvtegra_dfs_update(hw_device_ctx, ctx->channel, tf->bitstream_len, decode_cycles);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+int ff_nvtegra_start_frame(AVCodecContext *avctx, AVFrame *frame, FFNVTegraDecodeContext *ctx) {
+    AVHWFramesContext *frames_ctx = (AVHWFramesContext *)avctx->hw_frames_ctx->data;
+    FrameDecodeData          *fdd = (FrameDecodeData *)frame->private_ref->data;
+
+    FFNVTegraDecodeFrame *tf = NULL;
+    int err;
+
+    /* Abort on resolution changes that wouldn't fit into the frame */
+    if ((frame->width > frames_ctx->width) || (frame->height > frames_ctx->height))
+        return AVERROR(EINVAL);
+
+    ctx->bitstream_len = ctx->num_slices = 0;
+
+    if (fdd->hwaccel_priv) {
+        /*
+         * For interlaced video, both fields use the same fdd,
+         * however by proceeding we might overwrite the input buffer
+         * during the decoding, so wait for the previous operation to complete.
+         */
+        err = ff_nvtegra_wait_decode(avctx, frame);
+        if (err < 0)
+            return err;
+    } else {
+        tf = av_mallocz(sizeof(*tf));
+        if (!tf)
+            return AVERROR(ENOMEM);
+
+        fdd->hwaccel_priv      = tf;
+        fdd->hwaccel_priv_free = nvtegra_fdd_priv_free;
+        fdd->post_process      = ff_nvtegra_wait_decode;
+
+        tf->ctx = ctx;
+
+        tf->input_map_ref = av_buffer_pool_get(ctx->decoder_pool);
+        if (!tf->input_map_ref) {
+            err = AVERROR(ENOMEM);
+            goto fail;
+        }
+    }
+
+    tf = fdd->hwaccel_priv;
+    tf->in_flight = false;
+
+    err = av_nvtegra_cmdbuf_add_memory(&ctx->cmdbuf, (AVNVTegraMap *)tf->input_map_ref->data,
+                                       ctx->cmdbuf_off, ctx->max_cmdbuf_size);
+    if (err < 0)
+        return err;
+
+    err = av_nvtegra_cmdbuf_clear(&ctx->cmdbuf);
+    if (err < 0)
+        return err;
+
+    return 0;
+
+fail:
+    nvtegra_fdd_priv_free(tf);
+    return err;
+}
+
+int ff_nvtegra_decode_slice(AVCodecContext *avctx, AVFrame *frame,
+                            const uint8_t *buf, uint32_t buf_size, bool add_startcode)
+{
+    FFNVTegraDecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+    FrameDecodeData        *fdd = (FrameDecodeData *)frame->private_ref->data;
+    FFNVTegraDecodeFrame    *tf = fdd->hwaccel_priv;
+    AVNVTegraMap     *input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+
+    bool need_bitstream_move = false;
+    uint32_t old_bitstream_off, startcode_size;
+    uint8_t *mem;
+    int err;
+
+    mem = av_nvtegra_map_get_addr(input_map);
+
+    startcode_size = add_startcode ? 3 : 0;
+
+    /* Reserve 16 bytes for the termination sequence */
+    if (ctx->bitstream_len + buf_size + startcode_size >= ctx->max_bitstream_size - 16) {
+        ctx->input_map_size += ctx->max_bitstream_size + buf_size;
+        ctx->input_map_size  = FFALIGN(ctx->input_map_size, 0x1000);
+
+        ctx->max_bitstream_size = ctx->input_map_size - ctx->bitstream_off;
+
+        need_bitstream_move = false;
+    }
+
+    /* Reserve 4 bytes for the bitstream size */
+    if (ctx->max_num_slices && ctx->num_slices >= ctx->max_num_slices - 1) {
+        ctx->input_map_size += ctx->max_num_slices * sizeof(uint32_t);
+        ctx->input_map_size  = FFALIGN(ctx->input_map_size, 0x1000);
+
+        ctx->max_num_slices *= 2;
+
+        old_bitstream_off = ctx->bitstream_off;
+        ctx->bitstream_off = ctx->slice_offsets_off + ctx->max_num_slices * sizeof(uint32_t);
+
+        need_bitstream_move = true;
+    }
+
+    if (ctx->input_map_size != av_nvtegra_map_get_size(input_map)) {
+        err = av_nvtegra_map_realloc(input_map, ctx->input_map_size, 0x100,
+                                     NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE);
+        if (err < 0)
+            return err;
+
+        mem = av_nvtegra_map_get_addr(input_map);
+
+        err = av_nvtegra_cmdbuf_add_memory(&ctx->cmdbuf, input_map,
+                                           ctx->cmdbuf_off, ctx->max_cmdbuf_size);
+        if (err < 0)
+            return err;
+
+        /* Running out of slice offsets mem shouldn't happen so the extra memmove is fine */
+        if (need_bitstream_move)
+            memmove(mem + ctx->bitstream_off, mem + old_bitstream_off, ctx->bitstream_len);
+    }
+
+    if (ctx->max_num_slices)
+        ((uint32_t *)(mem + ctx->slice_offsets_off))[ctx->num_slices] = ctx->bitstream_len;
+
+    /* NAL startcode 000001 */
+    if (add_startcode) {
+        AV_WB24(mem + ctx->bitstream_off + ctx->bitstream_len, 1);
+        ctx->bitstream_len += 3;
+    }
+
+    memcpy(mem + ctx->bitstream_off + ctx->bitstream_len, buf, buf_size);
+    ctx->bitstream_len += buf_size;
+
+    ctx->num_slices++;
+
+    return 0;
+}
+
+int ff_nvtegra_end_frame(AVCodecContext *avctx, AVFrame *frame, FFNVTegraDecodeContext *ctx,
+                         const uint8_t *end_sequence, int end_sequence_size)
+{
+    FrameDecodeData     *fdd = (FrameDecodeData *)frame->private_ref->data;
+    FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv;
+    AVNVTegraMap  *input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+
+    uint8_t *mem;
+    int err;
+
+    mem = av_nvtegra_map_get_addr(input_map);
+
+    /* Last slice data range */
+    if (ctx->max_num_slices)
+        ((uint32_t *)(mem + ctx->slice_offsets_off))[ctx->num_slices] = ctx->bitstream_len;
+
+    /* Termination sequence for the bitstream data */
+    if (end_sequence_size)
+        memcpy(mem + ctx->bitstream_off + ctx->bitstream_len, end_sequence, end_sequence_size);
+
+    err = av_nvtegra_cmdbuf_begin(&ctx->cmdbuf, !ctx->is_nvjpg ? HOST1X_CLASS_NVDEC : HOST1X_CLASS_NVJPG);
+    if (err < 0)
+        return err;
+
+    err = av_nvtegra_cmdbuf_add_syncpt_incr(&ctx->cmdbuf, ctx->channel->syncpt, 0);
+    if (err < 0)
+        return err;
+
+    err = av_nvtegra_cmdbuf_end(&ctx->cmdbuf);
+    if (err < 0)
+        return err;
+
+    err = av_nvtegra_channel_submit(ctx->channel, &ctx->cmdbuf, &tf->fence);
+    if (err < 0)
+        return err;
+
+    tf->bitstream_len = ctx->bitstream_len;
+    tf->in_flight     = true;
+
+    ctx->frame_idx++;
+
+    ctx->new_input_buffer = false;
+
+    return 0;
+}
+
+static int nvtegra_get_size_constraints(enum AVCodecID codec,
+                                        int *min_width, int *min_height,
+                                        int *max_width, int *max_height,
+                                        int *align, int *max_mbs)
+{
+    switch (codec) {
+        case AV_CODEC_ID_MPEG1VIDEO:
+        case AV_CODEC_ID_MPEG2VIDEO:
+            *min_width = 48,    *min_height = 1;
+            *max_width = 4096,  *max_height = 4096;
+            *align     = 16,    *max_mbs    = 0x20000;
+            break;
+
+        case AV_CODEC_ID_MPEG4:
+            *min_width = 48,    *min_height = 1;
+            *max_width = 2048,  *max_height = 2048;
+            *align     = 16,    *max_mbs    = 0x2000;
+            break;
+
+        case AV_CODEC_ID_VC1:
+        case AV_CODEC_ID_WMV3:
+            *min_width = 48,    *min_height = 1;
+            *max_width = 2048,  *max_height = 2048;
+            *align     = 1,     *max_mbs    = -1;
+            break;
+
+        case AV_CODEC_ID_H264:
+            *min_width = 48,    *min_height = 1;
+            *max_width = 4096,  *max_height = 4096;
+            *align     = 16,    *max_mbs    = 0x20000;
+            break;
+
+        case AV_CODEC_ID_HEVC:
+            /* Note: on nvdec 4.0+ (tegra 194) max dimensions are 8192, and max mbs 0x80000 */
+            *min_width = 144,   *min_height = 144;
+            *max_width = 4096,  *max_height = 4096;
+            *align     = 64,    *max_mbs    = 0x20000;
+            break;
+
+        case AV_CODEC_ID_VP8:
+            *min_width = 48,    *min_height = 1;
+            *max_width = 4096,  *max_height = 4096;
+            *align     = 16,    *max_mbs    = 0x20000;
+            break;
+
+        case AV_CODEC_ID_VP9:
+            /* Note: on nvdec 4.0+ (tegra 194) max dimensions are 8192, and max mbs 0x40000 */
+            *min_width = 144,   *min_height = 144;
+            *max_width = 4096,  *max_height = 4096;
+            *align     = 16,    *max_mbs    = 0x10000;
+            break;
+
+        case AV_CODEC_ID_MJPEG:
+            *min_width = 1,     *min_height = 1;
+            *max_width = 16384, *max_height = 16384;
+            *align     = 1,     *max_mbs    = -1;
+            break;
+
+        #if 0
+        case AV_CODEC_ID_AV1:
+            /* Note: on nvdec 4.0+ (tegra 194) max dimensions are 8192, and max mbs 0x80000 */
+            *min_width = 128,   *min_height = 128;
+            *max_width = 4096,  *max_height = 4096;
+            *align     = 64,    *max_mbs    = 0x20000;
+            break;
+        #endif
+
+        default:
+            return AVERROR(EINVAL);
+    }
+
+    return 0;
+}
+
+int ff_nvtegra_frame_params(AVCodecContext *avctx, AVBufferRef *hw_frames_ctx) {
+    AVHWFramesContext *frames_ctx = (AVHWFramesContext *)hw_frames_ctx->data;
+    const AVPixFmtDescriptor *sw_desc;
+
+    int min_width, min_height, max_width, max_height, align, max_mbs,
+        aligned_width, aligned_height, num_mbs;
+    int err;
+
+    err = nvtegra_get_size_constraints(avctx->codec_id, &min_width, &min_height,
+                                       &max_width, &max_height, &align, &max_mbs);
+    if (err < 0)
+        return err;
+
+    aligned_width  = FFALIGN(avctx->coded_width,  align);
+    aligned_height = FFALIGN(avctx->coded_height, align);
+    num_mbs = (aligned_width / 16) * (aligned_height / 16);
+
+    if ((aligned_width  < min_width)  || (aligned_width  > max_width) ||
+        (aligned_height < min_height) || (aligned_height > max_height))
+    {
+        av_log(avctx, AV_LOG_ERROR, "Dimensions %dx%d (min. %dx%d, max. %dx%d) "
+                                    "are not supported by the hardware for codec %s\n",
+               avctx->coded_width, avctx->coded_height,
+               min_width, min_height, max_width, max_height,
+               avctx->codec_descriptor->name);
+        return AVERROR(EINVAL);
+    }
+
+    if ((max_mbs > 0) && (num_mbs > max_mbs)) {
+        av_log(avctx, AV_LOG_ERROR, "Number of macroblocks %d exceeds maximum %d "
+                                    "for codec %s\n",
+               num_mbs, max_mbs, avctx->codec_descriptor->name);
+        return AVERROR(EINVAL);
+    }
+
+    frames_ctx->format = AV_PIX_FMT_NVTEGRA;
+    frames_ctx->width  = FFALIGN(avctx->coded_width,  2); /* NVDEC only supports even sizes */
+    frames_ctx->height = FFALIGN(avctx->coded_height, 2);
+
+    sw_desc = av_pix_fmt_desc_get(avctx->sw_pix_fmt);
+    if (!sw_desc)
+        return AVERROR_BUG;
+
+    switch (sw_desc->comp[0].depth) {
+        case 8:
+            frames_ctx->sw_format = (sw_desc->nb_components > 1) ?
+                                    AV_PIX_FMT_NV12 : AV_PIX_FMT_GRAY8;
+            break;
+        case 10:
+            frames_ctx->sw_format = (sw_desc->nb_components > 1) ?
+                                    AV_PIX_FMT_P010 : AV_PIX_FMT_GRAY10;
+            break;
+        default:
+            return AVERROR(EINVAL);
+    }
+
+    return 0;
+}
diff --git a/libavcodec/nvtegra_decode.h b/libavcodec/nvtegra_decode.h
new file mode 100644
index 0000000000..5260c8b3c5
--- /dev/null
+++ b/libavcodec/nvtegra_decode.h
@@ -0,0 +1,94 @@
+/*
+ * Copyright (c) 2024 averne <averne381@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#ifndef AVCODEC_NVTEGRA_DECODE_H
+#define AVCODEC_NVTEGRA_DECODE_H
+
+#include <stdbool.h>
+
+#include "avcodec.h"
+#include "libavutil/mem.h"
+#include "libavutil/hwcontext_nvtegra.h"
+
+#include "libavutil/nvdec_drv.h"
+#include "libavutil/nvjpg_drv.h"
+#include "libavutil/clc5b0.h"
+#include "libavutil/cle7d0.h"
+
+typedef struct FFNVTegraDecodeContext {
+    uint64_t frame_idx;
+
+    AVBufferRef *hw_device_ref;
+    AVBufferPool *decoder_pool;
+
+    bool is_nvjpg;
+    AVNVTegraChannel *channel;
+
+    AVNVTegraCmdbuf cmdbuf;
+
+    uint32_t pic_setup_off, status_off, cmdbuf_off,
+             bitstream_off, slice_offsets_off;
+    uint32_t input_map_size;
+    uint32_t max_cmdbuf_size, max_bitstream_size, max_num_slices;
+
+    uint32_t num_slices;
+    uint32_t bitstream_len;
+
+    bool new_input_buffer;
+} FFNVTegraDecodeContext;
+
+typedef struct FFNVTegraDecodeFrame {
+    FFNVTegraDecodeContext *ctx;
+    AVBufferRef *input_map_ref;
+    uint32_t fence;
+    uint32_t bitstream_len;
+    bool in_flight;
+} FFNVTegraDecodeFrame;
+
+static inline size_t ff_nvtegra_decode_pick_bitstream_buffer_size(AVCodecContext *avctx) {
+    /*
+     * Official software uses a static map of a predetermined size, usually around 0x600000 (6MiB).
+     * Our implementation supports dynamically resizing the input map, so be less conservative.
+     */
+    if ((avctx->coded_width >= 3840) || (avctx->coded_height >= 2160))  /* 4k */
+        return 0x100000;                                                /* 1MiB */
+    if ((avctx->coded_width >= 1920) || (avctx->coded_height >= 1080))  /* 1080p */
+        return 0x40000;                                                 /* 256KiB */
+    else
+        return 0x10000;                                                 /* 64KiB */
+}
+
+static inline AVFrame *ff_nvtegra_safe_get_ref(AVFrame *ref, AVFrame *fallback) {
+    return (ref && ref->private_ref) ? ref : fallback;
+}
+
+int ff_nvtegra_decode_init(AVCodecContext *avctx, FFNVTegraDecodeContext *ctx);
+int ff_nvtegra_decode_uninit(AVCodecContext *avctx, FFNVTegraDecodeContext *ctx);
+int ff_nvtegra_start_frame(AVCodecContext *avctx, AVFrame *frame, FFNVTegraDecodeContext *ctx);
+int ff_nvtegra_decode_slice(AVCodecContext *avctx, AVFrame *frame,
+                            const uint8_t *buf, uint32_t buf_size, bool add_startcode);
+int ff_nvtegra_end_frame(AVCodecContext *avctx, AVFrame *frame, FFNVTegraDecodeContext *ctx,
+                         const uint8_t *end_sequence, int end_sequence_size);
+
+int ff_nvtegra_wait_decode(void *logctx, AVFrame *frame);
+
+int ff_nvtegra_frame_params(AVCodecContext *avctx, AVBufferRef *hw_frames_ctx);
+
+#endif /* AVCODEC_NVTEGRA_DECODE_H */
-- 
2.45.1
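
Note on the size checks in ff_nvtegra_frame_params(): the dimensions are
first rounded up to the codec's alignment, then tested against the
min/max bounds, and finally against the macroblock budget (when max_mbs
is positive). The standalone sketch below mirrors that order. The
SizeLimits struct, align_up() and size_is_supported() are hypothetical
names used only for this illustration; the limit values are copied from
nvtegra_get_size_constraints() in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for one row of nvtegra_get_size_constraints() */
typedef struct SizeLimits {
    int min_width, min_height;
    int max_width, max_height;
    int align;   /* power of two */
    int max_mbs; /* <= 0 means "no macroblock limit" */
} SizeLimits;

/* Values taken from the patch for MPEG-4 and HEVC */
static const SizeLimits mpeg4_limits = {  48,   1, 2048, 2048, 16, 0x2000  };
static const SizeLimits hevc_limits  = { 144, 144, 4096, 4096, 64, 0x20000 };

/* Round x up to the next multiple of a power-of-two alignment (like FFALIGN) */
static int align_up(int x, int a)
{
    assert(a > 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Same order of checks as in ff_nvtegra_frame_params() */
static bool size_is_supported(const SizeLimits *lim, int width, int height)
{
    int aligned_width  = align_up(width,  lim->align);
    int aligned_height = align_up(height, lim->align);
    int num_mbs        = (aligned_width / 16) * (aligned_height / 16);

    if (aligned_width  < lim->min_width  || aligned_width  > lim->max_width ||
        aligned_height < lim->min_height || aligned_height > lim->max_height)
        return false;

    if (lim->max_mbs > 0 && num_mbs > lim->max_mbs)
        return false;

    return true;
}
```

With these tables, a 2048x2048 MPEG-4 stream passes the dimension check
but is rejected by the macroblock budget (128 * 128 = 16384 > 0x2000),
which is why the two tests are kept separate.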

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

* [FFmpeg-devel] [PATCH 09/16] nvtegra: add mpeg1/2 hardware decoding
  2024-05-30 19:43 [FFmpeg-devel] [PATCH 00/16] NVidia Tegra hardware decoding backend averne
                   ` (7 preceding siblings ...)
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 08/16] nvtegra: add common hardware decoding code averne
@ 2024-05-30 19:43 ` averne
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 10/16] nvtegra: add mpeg4 " averne
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: averne @ 2024-05-30 19:43 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: averne

MPEG-1/2 is probably the most straightforward codec family to implement
on NVDEC. Since MPEG-2 is a superset of MPEG-1, both are handled by the
same backend.

Signed-off-by: averne <averne381@gmail.com>
---
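Note on the input map layout set up in nvtegra_mpeg12_decode_init(): the
regions (picture setup, status, command buffer, slice offsets,
bitstream) are packed back to back, each start rounded up to the map
alignment, and the total rounded up to a 4 KiB page. The sketch below
reproduces that arithmetic outside of FFmpeg. MAP_ALIGN, PIC_SETUP_SIZE
and STATUS_SIZE are assumed placeholder values, not the real
AV_NVTEGRA_MAP_ALIGN or structure sizes, and InputMapLayout,
align_up() and compute_layout() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only (the real ones come from the nvtegra headers) */
#define MAP_ALIGN      0x100u /* stand-in for AV_NVTEGRA_MAP_ALIGN */
#define PIC_SETUP_SIZE 0x200u /* stand-in for sizeof(nvdec_mpeg2_pic_s) */
#define STATUS_SIZE    0x80u  /* stand-in for sizeof(nvdec_status_s) */
#define MB_SIZE        16u
#define MAX_SLICES     8160u  /* same cap as in nvtegra_mpeg12_decode_init() */

typedef struct InputMapLayout {
    uint32_t pic_setup_off, status_off, cmdbuf_off,
             slice_offsets_off, bitstream_off;
    uint32_t input_map_size;
    uint32_t max_cmdbuf_size, max_num_slices, max_bitstream_size;
} InputMapLayout;

/* Round x up to the next multiple of a power-of-two alignment (like FFALIGN) */
static uint32_t align_up(uint32_t x, uint32_t a)
{
    assert(a && !(a & (a - 1)));
    return (x + a - 1) & ~(a - 1);
}

static InputMapLayout compute_layout(uint32_t width, uint32_t height,
                                     uint32_t bitstream_size)
{
    InputMapLayout l;
    uint32_t num_slices = (align_up(width,  MB_SIZE) / MB_SIZE) *
                          (align_up(height, MB_SIZE) / MB_SIZE);
    if (num_slices > MAX_SLICES)
        num_slices = MAX_SLICES;

    /* Each region starts at the aligned end of the previous one */
    l.pic_setup_off     = 0;
    l.status_off        = align_up(l.pic_setup_off     + PIC_SETUP_SIZE, MAP_ALIGN);
    l.cmdbuf_off        = align_up(l.status_off        + STATUS_SIZE,    MAP_ALIGN);
    l.slice_offsets_off = align_up(l.cmdbuf_off        + MAP_ALIGN,      MAP_ALIGN);
    l.bitstream_off     = align_up(l.slice_offsets_off +
                                   num_slices * (uint32_t)sizeof(uint32_t), MAP_ALIGN);
    l.input_map_size    = align_up(l.bitstream_off     + bitstream_size, 0x1000u);

    /* Capacities are derived from the gaps between consecutive offsets */
    l.max_cmdbuf_size    =  l.slice_offsets_off - l.cmdbuf_off;
    l.max_num_slices     = (l.bitstream_off - l.slice_offsets_off) / (uint32_t)sizeof(uint32_t);
    l.max_bitstream_size =  l.input_map_size - l.bitstream_off;
    return l;
}
```

Under these assumed sizes, compute_layout(1920, 1080, 0x40000) gives
bitstream_off = 0x8400 and input_map_size = 0x49000. Because the
capacities are derived from the rounded offsets, max_num_slices (8192
here) can end up slightly larger than the requested slice count (8160).
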
 configure                   |   4 +
 libavcodec/Makefile         |   2 +
 libavcodec/hwaccels.h       |   2 +
 libavcodec/mpeg12dec.c      |  12 ++
 libavcodec/nvtegra_mpeg12.c | 319 ++++++++++++++++++++++++++++++++++++
 5 files changed, 339 insertions(+)
 create mode 100644 libavcodec/nvtegra_mpeg12.c

diff --git a/configure b/configure
index 566bb37b8c..67db4a2ed2 100755
--- a/configure
+++ b/configure
@@ -3221,6 +3221,8 @@ mpeg1_vdpau_hwaccel_deps="vdpau"
 mpeg1_vdpau_hwaccel_select="mpeg1video_decoder"
 mpeg1_videotoolbox_hwaccel_deps="videotoolbox"
 mpeg1_videotoolbox_hwaccel_select="mpeg1video_decoder"
+mpeg1_nvtegra_hwaccel_deps="nvtegra"
+mpeg1_nvtegra_hwaccel_select="mpeg1video_decoder"
 mpeg2_d3d11va_hwaccel_deps="d3d11va"
 mpeg2_d3d11va_hwaccel_select="mpeg2video_decoder"
 mpeg2_d3d11va2_hwaccel_deps="d3d11va"
@@ -3237,6 +3239,8 @@ mpeg2_vdpau_hwaccel_deps="vdpau"
 mpeg2_vdpau_hwaccel_select="mpeg2video_decoder"
 mpeg2_videotoolbox_hwaccel_deps="videotoolbox"
 mpeg2_videotoolbox_hwaccel_select="mpeg2video_decoder"
+mpeg2_nvtegra_hwaccel_deps="nvtegra"
+mpeg2_nvtegra_hwaccel_select="mpeg2video_decoder"
 mpeg4_nvdec_hwaccel_deps="nvdec"
 mpeg4_nvdec_hwaccel_select="mpeg4_decoder"
 mpeg4_vaapi_hwaccel_deps="vaapi"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index f1e2dc6625..e4dfcbce6c 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1026,6 +1026,7 @@ OBJS-$(CONFIG_MJPEG_VAAPI_HWACCEL)        += vaapi_mjpeg.o
 OBJS-$(CONFIG_MPEG1_NVDEC_HWACCEL)        += nvdec_mpeg12.o
 OBJS-$(CONFIG_MPEG1_VDPAU_HWACCEL)        += vdpau_mpeg12.o
 OBJS-$(CONFIG_MPEG1_VIDEOTOOLBOX_HWACCEL) += videotoolbox.o
+OBJS-$(CONFIG_MPEG1_NVTEGRA_HWACCEL)      += nvtegra_mpeg12.o
 OBJS-$(CONFIG_MPEG2_D3D11VA_HWACCEL)      += dxva2_mpeg2.o
 OBJS-$(CONFIG_MPEG2_DXVA2_HWACCEL)        += dxva2_mpeg2.o
 OBJS-$(CONFIG_MPEG2_D3D12VA_HWACCEL)      += dxva2_mpeg2.o d3d12va_mpeg2.o
@@ -1034,6 +1035,7 @@ OBJS-$(CONFIG_MPEG2_QSV_HWACCEL)          += qsvdec.o
 OBJS-$(CONFIG_MPEG2_VAAPI_HWACCEL)        += vaapi_mpeg2.o
 OBJS-$(CONFIG_MPEG2_VDPAU_HWACCEL)        += vdpau_mpeg12.o
 OBJS-$(CONFIG_MPEG2_VIDEOTOOLBOX_HWACCEL) += videotoolbox.o
+OBJS-$(CONFIG_MPEG2_NVTEGRA_HWACCEL)      += nvtegra_mpeg12.o
 OBJS-$(CONFIG_MPEG4_NVDEC_HWACCEL)        += nvdec_mpeg4.o
 OBJS-$(CONFIG_MPEG4_VAAPI_HWACCEL)        += vaapi_mpeg4.o
 OBJS-$(CONFIG_MPEG4_VDPAU_HWACCEL)        += vdpau_mpeg4.o
diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
index 5171e4c7d7..ad9e9366f2 100644
--- a/libavcodec/hwaccels.h
+++ b/libavcodec/hwaccels.h
@@ -52,6 +52,7 @@ extern const struct FFHWAccel ff_mjpeg_vaapi_hwaccel;
 extern const struct FFHWAccel ff_mpeg1_nvdec_hwaccel;
 extern const struct FFHWAccel ff_mpeg1_vdpau_hwaccel;
 extern const struct FFHWAccel ff_mpeg1_videotoolbox_hwaccel;
+extern const struct FFHWAccel ff_mpeg1_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_mpeg2_d3d11va_hwaccel;
 extern const struct FFHWAccel ff_mpeg2_d3d11va2_hwaccel;
 extern const struct FFHWAccel ff_mpeg2_d3d12va_hwaccel;
@@ -60,6 +61,7 @@ extern const struct FFHWAccel ff_mpeg2_nvdec_hwaccel;
 extern const struct FFHWAccel ff_mpeg2_vaapi_hwaccel;
 extern const struct FFHWAccel ff_mpeg2_vdpau_hwaccel;
 extern const struct FFHWAccel ff_mpeg2_videotoolbox_hwaccel;
+extern const struct FFHWAccel ff_mpeg2_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_mpeg4_nvdec_hwaccel;
 extern const struct FFHWAccel ff_mpeg4_vaapi_hwaccel;
 extern const struct FFHWAccel ff_mpeg4_vdpau_hwaccel;
diff --git a/libavcodec/mpeg12dec.c b/libavcodec/mpeg12dec.c
index 9fd765f030..7d8ecae542 100644
--- a/libavcodec/mpeg12dec.c
+++ b/libavcodec/mpeg12dec.c
@@ -835,6 +835,9 @@ static const enum AVPixelFormat mpeg1_hwaccel_pixfmt_list_420[] = {
 #endif
 #if CONFIG_MPEG1_VDPAU_HWACCEL
     AV_PIX_FMT_VDPAU,
+#endif
+#if CONFIG_MPEG1_NVTEGRA_HWACCEL
+    AV_PIX_FMT_NVTEGRA,
 #endif
     AV_PIX_FMT_YUV420P,
     AV_PIX_FMT_NONE
@@ -862,6 +865,9 @@ static const enum AVPixelFormat mpeg2_hwaccel_pixfmt_list_420[] = {
 #endif
 #if CONFIG_MPEG2_VIDEOTOOLBOX_HWACCEL
     AV_PIX_FMT_VIDEOTOOLBOX,
+#endif
+#if CONFIG_MPEG2_NVTEGRA_HWACCEL
+    AV_PIX_FMT_NVTEGRA,
 #endif
     AV_PIX_FMT_YUV420P,
     AV_PIX_FMT_NONE
@@ -2624,6 +2630,9 @@ const FFCodec ff_mpeg1video_decoder = {
 #endif
 #if CONFIG_MPEG1_VIDEOTOOLBOX_HWACCEL
                                HWACCEL_VIDEOTOOLBOX(mpeg1),
+#endif
+#if CONFIG_MPEG1_NVTEGRA_HWACCEL
+                               HWACCEL_NVTEGRA(mpeg1),
 #endif
                                NULL
                            },
@@ -2696,6 +2705,9 @@ const FFCodec ff_mpeg2video_decoder = {
 #endif
 #if CONFIG_MPEG2_VIDEOTOOLBOX_HWACCEL
                         HWACCEL_VIDEOTOOLBOX(mpeg2),
+#endif
+#if CONFIG_MPEG2_NVTEGRA_HWACCEL
+                        HWACCEL_NVTEGRA(mpeg2),
 #endif
                         NULL
                     },
diff --git a/libavcodec/nvtegra_mpeg12.c b/libavcodec/nvtegra_mpeg12.c
new file mode 100644
index 0000000000..2206635a7d
--- /dev/null
+++ b/libavcodec/nvtegra_mpeg12.c
@@ -0,0 +1,319 @@
+/*
+ * Copyright (c) 2024 averne <averne381@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include "config_components.h"
+
+#include <string.h>
+
+#include "avcodec.h"
+#include "hwaccel_internal.h"
+#include "internal.h"
+#include "hwconfig.h"
+#include "mpegvideo.h"
+#include "mpegutils.h"
+#include "decode.h"
+#include "nvtegra_decode.h"
+
+#include "libavutil/pixdesc.h"
+#include "libavutil/nvtegra_host1x.h"
+
+typedef struct NVTegraMPEG12DecodeContext {
+    FFNVTegraDecodeContext core;
+
+    AVFrame *prev_frame, *next_frame;
+} NVTegraMPEG12DecodeContext;
+
+/* Size (width, height) of a macroblock */
+#define MB_SIZE 16
+
+static const uint8_t bitstream_end_sequence[16] = {
+    0x00, 0x00, 0x01, 0xb7, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0xb7, 0x00, 0x00, 0x00, 0x00,
+};
+
+static int nvtegra_mpeg12_decode_uninit(AVCodecContext *avctx) {
+    NVTegraMPEG12DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA MPEG12 decoder\n");
+
+    err = ff_nvtegra_decode_uninit(avctx, &ctx->core);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static int nvtegra_mpeg12_decode_init(AVCodecContext *avctx) {
+    NVTegraMPEG12DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    uint32_t num_slices;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Initializing NVTEGRA MPEG12 decoder\n");
+
+    num_slices = (FFALIGN(avctx->coded_width,  MB_SIZE) / MB_SIZE) *
+                 (FFALIGN(avctx->coded_height, MB_SIZE) / MB_SIZE);
+    num_slices = FFMIN(num_slices, 8160);
+
+    /* Ignored: histogram map, size 0x400 */
+    ctx->core.pic_setup_off     = 0;
+    ctx->core.status_off        = FFALIGN(ctx->core.pic_setup_off     + sizeof(nvdec_mpeg2_pic_s),
+                                          AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.cmdbuf_off        = FFALIGN(ctx->core.status_off        + sizeof(nvdec_status_s),
+                                          AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.slice_offsets_off = FFALIGN(ctx->core.cmdbuf_off        + AV_NVTEGRA_MAP_ALIGN,
+                                          AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.bitstream_off     = FFALIGN(ctx->core.slice_offsets_off + num_slices * sizeof(uint32_t),
+                                          AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.input_map_size    = FFALIGN(ctx->core.bitstream_off     + ff_nvtegra_decode_pick_bitstream_buffer_size(avctx),
+                                          0x1000);
+
+    ctx->core.max_cmdbuf_size    =  ctx->core.slice_offsets_off - ctx->core.cmdbuf_off;
+    ctx->core.max_num_slices     = (ctx->core.bitstream_off     - ctx->core.slice_offsets_off) / sizeof(uint32_t);
+    ctx->core.max_bitstream_size =  ctx->core.input_map_size    - ctx->core.bitstream_off;
+
+    err = ff_nvtegra_decode_init(avctx, &ctx->core);
+    if (err < 0)
+        goto fail;
+
+    return 0;
+
+fail:
+    nvtegra_mpeg12_decode_uninit(avctx);
+    return err;
+}
+
+static void nvtegra_mpeg12_prepare_frame_setup(nvdec_mpeg2_pic_s *setup, MpegEncContext *s,
+                                               NVTegraMPEG12DecodeContext *ctx)
+{
+    *setup = (nvdec_mpeg2_pic_s){
+        .gptimer_timeout_value      = 0, /* Default value */
+
+        .FrameWidth                 = FFALIGN(s->width,  MB_SIZE),
+        .FrameHeight                = FFALIGN(s->height, MB_SIZE),
+
+        .picture_structure          = s->picture_structure,
+        .picture_coding_type        = s->pict_type,
+        .intra_dc_precision         = s->intra_dc_precision,
+        .frame_pred_frame_dct       = s->frame_pred_frame_dct,
+        .concealment_motion_vectors = s->concealment_motion_vectors,
+        .intra_vlc_format           = s->intra_vlc_format,
+
+        .tileFormat                 = 0, /* TBL */
+        .gob_height                 = 0, /* GOB_2 */
+
+        .f_code                     = {
+            s->mpeg_f_code[0][0], s->mpeg_f_code[0][1],
+            s->mpeg_f_code[1][0], s->mpeg_f_code[1][1],
+        },
+
+        .PicWidthInMbs              = FFALIGN(s->width,  MB_SIZE) / MB_SIZE,
+        .FrameHeightInMbs           = FFALIGN(s->height, MB_SIZE) / MB_SIZE,
+        .pitch_luma                 = s->current_picture.f->linesize[0],
+        .pitch_chroma               = s->current_picture.f->linesize[1],
+        .luma_top_offset            = 0,
+        .luma_bot_offset            = 0,
+        .luma_frame_offset          = 0,
+        .chroma_top_offset          = 0,
+        .chroma_bot_offset          = 0,
+        .chroma_frame_offset        = 0,
+        .alternate_scan             = s->alternate_scan,
+        .secondfield                = s->picture_structure != PICT_FRAME && !s->first_field,
+        .rounding_type              = 0,
+        .q_scale_type               = s->q_scale_type,
+        .top_field_first            = s->top_field_first,
+        .full_pel_fwd_vector        = (s->codec_id != AV_CODEC_ID_MPEG2VIDEO) ? s->full_pel[0] : 0,
+        .full_pel_bwd_vector        = (s->codec_id != AV_CODEC_ID_MPEG2VIDEO) ? s->full_pel[1] : 0,
+        .output_memory_layout       = 0, /* NV12 */
+        .ref_memory_layout          = { 0, 0 }, /* NV12 */
+    };
+
+    for (int i = 0; i < FF_ARRAY_ELEMS(setup->quant_mat_8x8intra); ++i) {
+        setup->quant_mat_8x8intra   [i] = (NvU8)s->intra_matrix[i];
+        setup->quant_mat_8x8nonintra[i] = (NvU8)s->inter_matrix[i];
+    }
+}
+
+static int nvtegra_mpeg12_prepare_cmdbuf(AVNVTegraCmdbuf *cmdbuf, MpegEncContext *s, NVTegraMPEG12DecodeContext *ctx,
+                                         AVFrame *current_frame, AVFrame *prev_frame, AVFrame *next_frame)
+{
+    FrameDecodeData     *fdd = (FrameDecodeData *)current_frame->private_ref->data;
+    FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv;
+    AVNVTegraMap  *input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+
+    int err, codec_id;
+
+    err = av_nvtegra_cmdbuf_begin(cmdbuf, HOST1X_CLASS_NVDEC);
+    if (err < 0)
+        return err;
+
+    switch (s->codec_id) {
+        case AV_CODEC_ID_MPEG1VIDEO:
+            codec_id = NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_MPEG1;
+            break;
+        case AV_CODEC_ID_MPEG2VIDEO:
+            codec_id = NVC5B0_SET_CONTROL_PARAMS_CODEC_TYPE_MPEG2;
+            break;
+        default:
+            return AVERROR(EINVAL);
+    }
+
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_APPLICATION_ID,
+                          AV_NVTEGRA_ENUM(NVC5B0_SET_APPLICATION_ID, ID, MPEG12));
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_CONTROL_PARAMS, codec_id                    |
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, ERR_CONCEAL_ON, 1) |
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, GPTIMER_ON,     1));
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_PICTURE_INDEX,
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_PICTURE_INDEX, INDEX, ctx->core.frame_idx));
+
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_DRV_PIC_SETUP_OFFSET,
+                          input_map, ctx->core.pic_setup_off,     NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_IN_BUF_BASE_OFFSET,
+                          input_map, ctx->core.bitstream_off,     NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_SLICE_OFFSETS_BUF_OFFSET,
+                          input_map, ctx->core.slice_offsets_off, NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_NVDEC_STATUS_OFFSET,
+                          input_map, ctx->core.status_off,        NVHOST_RELOC_TYPE_DEFAULT);
+
+#define PUSH_FRAME(fr, offset) ({                                                           \
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_LUMA_OFFSET0   + offset * 4,           \
+                          av_nvtegra_frame_get_fbuf_map(fr), 0, NVHOST_RELOC_TYPE_DEFAULT); \
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_CHROMA_OFFSET0 + offset * 4,           \
+                          av_nvtegra_frame_get_fbuf_map(fr), fr->data[1] - fr->data[0],     \
+                          NVHOST_RELOC_TYPE_DEFAULT);                                       \
+})
+
+    PUSH_FRAME(current_frame, 0);
+    PUSH_FRAME(prev_frame,    1);
+    PUSH_FRAME(next_frame,    2);
+
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_EXECUTE,
+                          AV_NVTEGRA_ENUM(NVC5B0_EXECUTE, AWAKEN, ENABLE));
+
+    err = av_nvtegra_cmdbuf_end(cmdbuf);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static int nvtegra_mpeg12_start_frame(AVCodecContext *avctx, const uint8_t *buf, uint32_t buf_size) {
+    MpegEncContext               *s = avctx->priv_data;
+    AVFrame                  *frame = s->current_picture.f;
+    FrameDecodeData            *fdd = (FrameDecodeData *)frame->private_ref->data;
+    NVTegraMPEG12DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    FFNVTegraDecodeFrame *tf;
+    AVNVTegraMap *input_map;
+    uint8_t *mem;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Starting MPEG12-NVTEGRA frame with pixel format %s\n",
+           av_get_pix_fmt_name(avctx->sw_pix_fmt));
+
+    err = ff_nvtegra_start_frame(avctx, frame, &ctx->core);
+    if (err < 0)
+        return err;
+
+    tf = fdd->hwaccel_priv;
+    input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+    mem = av_nvtegra_map_get_addr(input_map);
+
+    nvtegra_mpeg12_prepare_frame_setup((nvdec_mpeg2_pic_s *)(mem + ctx->core.pic_setup_off), s, ctx);
+
+    ctx->prev_frame = (s->pict_type != AV_PICTURE_TYPE_I) ? s->last_picture.f : frame;
+    ctx->next_frame = (s->pict_type == AV_PICTURE_TYPE_B) ? s->next_picture.f : frame;
+
+    return 0;
+}
+
+static int nvtegra_mpeg12_end_frame(AVCodecContext *avctx) {
+    MpegEncContext               *s = avctx->priv_data;
+    NVTegraMPEG12DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+    AVFrame                  *frame = s->current_picture.f;
+    FrameDecodeData            *fdd = (FrameDecodeData *)frame->private_ref->data;
+    FFNVTegraDecodeFrame        *tf = fdd->hwaccel_priv;
+
+    nvdec_mpeg2_pic_s *setup;
+    uint8_t *mem;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Ending MPEG12-NVTEGRA frame with %u slices -> %u bytes\n",
+           ctx->core.num_slices, ctx->core.bitstream_len);
+
+    if (!tf || !ctx->core.num_slices)
+        return 0;
+
+    mem = av_nvtegra_map_get_addr((AVNVTegraMap *)tf->input_map_ref->data);
+
+    setup = (nvdec_mpeg2_pic_s *)(mem + ctx->core.pic_setup_off);
+    setup->stream_len  = ctx->core.bitstream_len + sizeof(bitstream_end_sequence);
+    setup->slice_count = ctx->core.num_slices;
+
+    err = nvtegra_mpeg12_prepare_cmdbuf(&ctx->core.cmdbuf, s, ctx, frame,
+                                        ctx->prev_frame, ctx->next_frame);
+    if (err < 0)
+        return err;
+
+    return ff_nvtegra_end_frame(avctx, frame, &ctx->core, bitstream_end_sequence,
+                                sizeof(bitstream_end_sequence));
+}
+
+static int nvtegra_mpeg12_decode_slice(AVCodecContext *avctx, const uint8_t *buf, uint32_t buf_size) {
+    MpegEncContext *s = avctx->priv_data;
+    AVFrame    *frame = s->current_picture.f;
+
+    return ff_nvtegra_decode_slice(avctx, frame, buf, buf_size, false);
+}
+
+#if CONFIG_MPEG1_NVTEGRA_HWACCEL
+const FFHWAccel ff_mpeg1_nvtegra_hwaccel = {
+    .p.name         = "mpeg1_nvtegra",
+    .p.type         = AVMEDIA_TYPE_VIDEO,
+    .p.id           = AV_CODEC_ID_MPEG1VIDEO,
+    .p.pix_fmt      = AV_PIX_FMT_NVTEGRA,
+    .start_frame    = &nvtegra_mpeg12_start_frame,
+    .end_frame      = &nvtegra_mpeg12_end_frame,
+    .decode_slice   = &nvtegra_mpeg12_decode_slice,
+    .init           = &nvtegra_mpeg12_decode_init,
+    .uninit         = &nvtegra_mpeg12_decode_uninit,
+    .frame_params   = &ff_nvtegra_frame_params,
+    .priv_data_size = sizeof(NVTegraMPEG12DecodeContext),
+    .caps_internal  = HWACCEL_CAP_ASYNC_SAFE,
+};
+#endif
+
+#if CONFIG_MPEG2_NVTEGRA_HWACCEL
+const FFHWAccel ff_mpeg2_nvtegra_hwaccel = {
+    .p.name         = "mpeg2_nvtegra",
+    .p.type         = AVMEDIA_TYPE_VIDEO,
+    .p.id           = AV_CODEC_ID_MPEG2VIDEO,
+    .p.pix_fmt      = AV_PIX_FMT_NVTEGRA,
+    .start_frame    = &nvtegra_mpeg12_start_frame,
+    .end_frame      = &nvtegra_mpeg12_end_frame,
+    .decode_slice   = &nvtegra_mpeg12_decode_slice,
+    .init           = &nvtegra_mpeg12_decode_init,
+    .uninit         = &nvtegra_mpeg12_decode_uninit,
+    .frame_params   = &ff_nvtegra_frame_params,
+    .priv_data_size = sizeof(NVTegraMPEG12DecodeContext),
+    .caps_internal  = HWACCEL_CAP_ASYNC_SAFE,
+};
+#endif
-- 
2.45.1
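
Note on slice submission: nvtegra_mpeg12_decode_slice() forwards each
slice to ff_nvtegra_decode_slice() with add_startcode = false. The
common code records the current bitstream length as the slice's start
offset, optionally writes a 00 00 01 start code, appends the payload,
and ff_nvtegra_end_frame() writes one last offset marking the end of
the final slice. The sketch below illustrates that bookkeeping on plain
memory. SliceBuffer, append_slice() and finish_frame() are hypothetical
names, and unlike the patch (which grows the input map on demand) this
simplified version returns -1 when it runs out of room.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-capacity stand-in for the bitstream and slice offsets regions */
typedef struct SliceBuffer {
    uint8_t  *bitstream;
    uint32_t  capacity;      /* bytes available in bitstream */
    uint32_t *slice_offsets;
    uint32_t  max_slices;    /* entries available in slice_offsets */
    uint32_t  num_slices;
    uint32_t  bitstream_len;
} SliceBuffer;

static int append_slice(SliceBuffer *sb, const uint8_t *buf, uint32_t size,
                        bool add_startcode)
{
    uint32_t needed = size + (add_startcode ? 3 : 0);

    assert(sb && sb->bitstream && sb->slice_offsets && buf);

    /* Keep one spare offset entry for the end marker written by finish_frame() */
    if (sb->num_slices + 1 >= sb->max_slices ||
        needed > sb->capacity - sb->bitstream_len)
        return -1;

    /* The slice starts wherever the bitstream currently ends */
    sb->slice_offsets[sb->num_slices] = sb->bitstream_len;

    if (add_startcode) {
        /* 24-bit big-endian value 1, i.e. the bytes 00 00 01 */
        sb->bitstream[sb->bitstream_len + 0] = 0x00;
        sb->bitstream[sb->bitstream_len + 1] = 0x00;
        sb->bitstream[sb->bitstream_len + 2] = 0x01;
        sb->bitstream_len += 3;
    }

    memcpy(sb->bitstream + sb->bitstream_len, buf, size);
    sb->bitstream_len += size;
    sb->num_slices++;
    return 0;
}

/* Write the end offset so that slice i spans [offsets[i], offsets[i + 1]) */
static void finish_frame(SliceBuffer *sb)
{
    sb->slice_offsets[sb->num_slices] = sb->bitstream_len;
}
```

Storing num_slices + 1 offsets lets a consumer derive every slice's
length from two neighbouring entries without a separate size table.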

* [FFmpeg-devel] [PATCH 10/16] nvtegra: add mpeg4 hardware decoding
  2024-05-30 19:43 [FFmpeg-devel] [PATCH 00/16] NVidia Tegra hardware decoding backend averne
                   ` (8 preceding siblings ...)
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 09/16] nvtegra: add mpeg1/2 hardware decoding averne
@ 2024-05-30 19:43 ` averne
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 11/16] nvtegra: add vc1 " averne
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: averne @ 2024-05-30 19:43 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: averne

Signed-off-by: averne <averne381@gmail.com>
---
 configure                  |   2 +
 libavcodec/Makefile        |   1 +
 libavcodec/h263dec.c       |   6 +
 libavcodec/hwaccels.h      |   1 +
 libavcodec/mpeg4videodec.c |   3 +
 libavcodec/nvtegra_mpeg4.c | 344 +++++++++++++++++++++++++++++++++++++
 6 files changed, 357 insertions(+)
 create mode 100644 libavcodec/nvtegra_mpeg4.c

diff --git a/configure b/configure
index 67db4a2ed2..0795f44a1e 100755
--- a/configure
+++ b/configure
@@ -3251,6 +3251,8 @@ mpeg4_videotoolbox_hwaccel_deps="videotoolbox"
 mpeg4_videotoolbox_hwaccel_select="mpeg4_decoder"
 prores_videotoolbox_hwaccel_deps="videotoolbox"
 prores_videotoolbox_hwaccel_select="prores_decoder"
+mpeg4_nvtegra_hwaccel_deps="nvtegra"
+mpeg4_nvtegra_hwaccel_select="mpeg4_decoder"
 vc1_d3d11va_hwaccel_deps="d3d11va"
 vc1_d3d11va_hwaccel_select="vc1_decoder"
 vc1_d3d11va2_hwaccel_deps="d3d11va"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index e4dfcbce6c..1ea9984876 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1040,6 +1040,7 @@ OBJS-$(CONFIG_MPEG4_NVDEC_HWACCEL)        += nvdec_mpeg4.o
 OBJS-$(CONFIG_MPEG4_VAAPI_HWACCEL)        += vaapi_mpeg4.o
 OBJS-$(CONFIG_MPEG4_VDPAU_HWACCEL)        += vdpau_mpeg4.o
 OBJS-$(CONFIG_MPEG4_VIDEOTOOLBOX_HWACCEL) += videotoolbox.o
+OBJS-$(CONFIG_MPEG4_NVTEGRA_HWACCEL)      += nvtegra_mpeg4.o
 OBJS-$(CONFIG_VC1_D3D11VA_HWACCEL)        += dxva2_vc1.o
 OBJS-$(CONFIG_VC1_DXVA2_HWACCEL)          += dxva2_vc1.o
 OBJS-$(CONFIG_VC1_D3D12VA_HWACCEL)        += dxva2_vc1.o d3d12va_vc1.o
diff --git a/libavcodec/h263dec.c b/libavcodec/h263dec.c
index 48bd467f30..db25e09ff3 100644
--- a/libavcodec/h263dec.c
+++ b/libavcodec/h263dec.c
@@ -60,6 +60,9 @@ static const enum AVPixelFormat h263_hwaccel_pixfmt_list_420[] = {
 #endif
 #if CONFIG_H263_VIDEOTOOLBOX_HWACCEL || CONFIG_MPEG4_VIDEOTOOLBOX_HWACCEL
     AV_PIX_FMT_VIDEOTOOLBOX,
+#endif
+#if CONFIG_MPEG4_NVTEGRA_HWACCEL
+    AV_PIX_FMT_NVTEGRA,
 #endif
     AV_PIX_FMT_YUV420P,
     AV_PIX_FMT_NONE
@@ -690,6 +693,9 @@ static const AVCodecHWConfigInternal *const h263_hw_config_list[] = {
 #if CONFIG_MPEG4_VDPAU_HWACCEL
     HWACCEL_VDPAU(mpeg4),
 #endif
+#if CONFIG_MPEG4_NVTEGRA_HWACCEL
+    HWACCEL_NVTEGRA(mpeg4),
+#endif
 #if CONFIG_H263_VIDEOTOOLBOX_HWACCEL
     HWACCEL_VIDEOTOOLBOX(h263),
 #endif
diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
index ad9e9366f2..da2b4ae10e 100644
--- a/libavcodec/hwaccels.h
+++ b/libavcodec/hwaccels.h
@@ -67,6 +67,7 @@ extern const struct FFHWAccel ff_mpeg4_vaapi_hwaccel;
 extern const struct FFHWAccel ff_mpeg4_vdpau_hwaccel;
 extern const struct FFHWAccel ff_mpeg4_videotoolbox_hwaccel;
 extern const struct FFHWAccel ff_prores_videotoolbox_hwaccel;
+extern const struct FFHWAccel ff_mpeg4_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_vc1_d3d11va_hwaccel;
 extern const struct FFHWAccel ff_vc1_d3d11va2_hwaccel;
 extern const struct FFHWAccel ff_vc1_d3d12va_hwaccel;
diff --git a/libavcodec/mpeg4videodec.c b/libavcodec/mpeg4videodec.c
index df1e22207d..15e2da5e88 100644
--- a/libavcodec/mpeg4videodec.c
+++ b/libavcodec/mpeg4videodec.c
@@ -3882,6 +3882,9 @@ const FFCodec ff_mpeg4_decoder = {
 #endif
 #if CONFIG_MPEG4_VIDEOTOOLBOX_HWACCEL
                                HWACCEL_VIDEOTOOLBOX(mpeg4),
+#endif
+#if CONFIG_MPEG4_NVTEGRA_HWACCEL
+                               HWACCEL_NVTEGRA(mpeg4),
 #endif
                                NULL
                            },
diff --git a/libavcodec/nvtegra_mpeg4.c b/libavcodec/nvtegra_mpeg4.c
new file mode 100644
index 0000000000..2325380330
--- /dev/null
+++ b/libavcodec/nvtegra_mpeg4.c
@@ -0,0 +1,344 @@
+/*
+ * Copyright (c) 2024 averne <averne381@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include "config_components.h"
+
+#include "avcodec.h"
+#include "hwaccel_internal.h"
+#include "internal.h"
+#include "hwconfig.h"
+#include "mpeg4video.h"
+#include "mpeg4videodec.h"
+#include "mpeg4videodefs.h"
+#include "decode.h"
+#include "nvtegra_decode.h"
+
+#include "libavutil/pixdesc.h"
+#include "libavutil/nvtegra_host1x.h"
+
+typedef struct NVTegraMPEG4DecodeContext {
+    FFNVTegraDecodeContext core;
+
+    AVNVTegraMap common_map;
+    uint32_t coloc_off, history_off, scratch_off;
+    uint32_t history_size, scratch_size;
+
+    AVFrame *prev_frame, *next_frame;
+} NVTegraMPEG4DecodeContext;
+
+/* Size (width, height) of a macroblock */
+#define MB_SIZE 16
+
+static const uint8_t bitstream_end_sequence[16] = {
+    0x00, 0x00, 0x01, 0xb1, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0xb1, 0x00, 0x00, 0x00, 0x00,
+};
+
+static int nvtegra_mpeg4_decode_uninit(AVCodecContext *avctx) {
+    NVTegraMPEG4DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA MPEG4 decoder\n");
+
+    err = av_nvtegra_map_destroy(&ctx->common_map);
+    if (err < 0)
+        return err;
+
+    err = ff_nvtegra_decode_uninit(avctx, &ctx->core);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static int nvtegra_mpeg4_decode_init(AVCodecContext *avctx) {
+    NVTegraMPEG4DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    AVHWDeviceContext      *hw_device_ctx;
+    AVNVTegraDeviceContext *device_hwctx;
+    uint32_t width_in_mbs, height_in_mbs,
+             coloc_size, history_size, scratch_size, common_map_size;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Initializing NVTEGRA MPEG4 decoder\n");
+
+    /* Ignored: histogram map, size 0x400 */
+    ctx->core.pic_setup_off  = 0;
+    ctx->core.status_off     = FFALIGN(ctx->core.pic_setup_off + sizeof(nvdec_mpeg4_pic_s),
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.cmdbuf_off     = FFALIGN(ctx->core.status_off    + sizeof(nvdec_status_s),
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.bitstream_off  = FFALIGN(ctx->core.cmdbuf_off    + AV_NVTEGRA_MAP_ALIGN,
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.input_map_size = FFALIGN(ctx->core.bitstream_off + ff_nvtegra_decode_pick_bitstream_buffer_size(avctx),
+                                       0x1000);
+
+    ctx->core.max_cmdbuf_size    = ctx->core.bitstream_off  - ctx->core.cmdbuf_off;
+    ctx->core.max_bitstream_size = ctx->core.input_map_size - ctx->core.bitstream_off;
+
+    err = ff_nvtegra_decode_init(avctx, &ctx->core);
+    if (err < 0)
+        goto fail;
+
+    hw_device_ctx = (AVHWDeviceContext *)ctx->core.hw_device_ref->data;
+    device_hwctx  = hw_device_ctx->hwctx;
+
+    width_in_mbs  = FFALIGN(avctx->coded_width,  MB_SIZE) / MB_SIZE;
+    height_in_mbs = FFALIGN(avctx->coded_height, MB_SIZE) / MB_SIZE;
+    coloc_size    = FFALIGN(FFALIGN(height_in_mbs, 2) * (width_in_mbs * 64) - 63, 0x100);
+    history_size  = FFALIGN(width_in_mbs * 0x100 + 0x1100, 0x100);
+    scratch_size  = 0x400;
+
+    ctx->coloc_off   = 0;
+    ctx->history_off = FFALIGN(ctx->coloc_off   + coloc_size,   AV_NVTEGRA_MAP_ALIGN);
+    ctx->scratch_off = FFALIGN(ctx->history_off + history_size, AV_NVTEGRA_MAP_ALIGN);
+    common_map_size  = FFALIGN(ctx->scratch_off + scratch_size, 0x1000);
+
+    err = av_nvtegra_map_create(&ctx->common_map, &device_hwctx->nvdec_channel, common_map_size, 0x100,
+                                NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE);
+    if (err < 0)
+        goto fail;
+
+    ctx->history_size = history_size;
+    ctx->scratch_size = scratch_size;
+
+    return 0;
+
+fail:
+    nvtegra_mpeg4_decode_uninit(avctx);
+    return err;
+}
+
+static void nvtegra_mpeg4_prepare_frame_setup(nvdec_mpeg4_pic_s *setup, AVCodecContext *avctx,
+                                              NVTegraMPEG4DecodeContext *ctx)
+{
+    Mpeg4DecContext *m = avctx->priv_data;
+    MpegEncContext  *s = &m->m;
+
+    int i;
+
+    *setup = (nvdec_mpeg4_pic_s){
+        .scratch_pic_buffer_size      = ctx->scratch_size,
+
+        .gptimer_timeout_value        = 0, /* Default value */
+
+        .FrameWidth                   = FFALIGN(s->width,  MB_SIZE),
+        .FrameHeight                  = FFALIGN(s->height, MB_SIZE),
+
+        .vop_time_increment_bitcount  = m->time_increment_bits,
+        .resync_marker_disable        = !m->resync_marker,
+
+        .tileFormat                   = 0, /* TBL */
+        .gob_height                   = 0, /* GOB_2 */
+
+        .width                        = FFALIGN(s->width,  MB_SIZE),
+        .height                       = FFALIGN(s->height, MB_SIZE),
+
+        .FrameStride                  = {
+            s->current_picture.f->linesize[0],
+            s->current_picture.f->linesize[1],
+        },
+
+        .luma_top_offset              = 0,
+        .luma_bot_offset              = 0,
+        .luma_frame_offset            = 0,
+        .chroma_top_offset            = 0,
+        .chroma_bot_offset            = 0,
+        .chroma_frame_offset          = 0,
+
+        .HistBufferSize               = ctx->history_size / 256,
+
+        .trd                          = { s->pp_time, s->pp_field_time >> 1 },
+        .trb                          = { s->pb_time, s->pb_field_time >> 1 },
+
+        .vop_fcode_forward            = s->f_code,
+        .vop_fcode_backward           = s->b_code,
+
+        .interlaced                   = s->interlaced_dct,
+        .quant_type                   = s->mpeg_quant,
+        .quarter_sample               = s->quarter_sample,
+        .short_video_header           = avctx->codec->id == AV_CODEC_ID_H263,
+
+        .curr_output_memory_layout    = 0, /* NV12 */
+
+        .ptype                        = s->pict_type - AV_PICTURE_TYPE_I,
+        .rnd                          = s->no_rounding,
+        .alternate_vertical_scan_flag = s->alternate_scan,
+
+        .ref_memory_layout            = { 0, 0 }, /* NV12 */
+    };
+
+    for (i = 0; i < 64; ++i) {
+        setup->intra_quant_mat   [i] = s->intra_matrix[i];
+        setup->nonintra_quant_mat[i] = s->inter_matrix[i];
+    }
+}
+
+static int nvtegra_mpeg4_prepare_cmdbuf(AVNVTegraCmdbuf *cmdbuf, MpegEncContext *s, NVTegraMPEG4DecodeContext *ctx,
+                                        AVFrame *cur_frame, AVFrame *prev_frame, AVFrame *next_frame)
+{
+    FrameDecodeData     *fdd = (FrameDecodeData *)cur_frame->private_ref->data;
+    FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv;
+    AVNVTegraMap  *input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+
+    int err;
+
+    err = av_nvtegra_cmdbuf_begin(cmdbuf, HOST1X_CLASS_NVDEC);
+    if (err < 0)
+        return err;
+
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_APPLICATION_ID,
+                          AV_NVTEGRA_ENUM(NVC5B0_SET_APPLICATION_ID, ID, MPEG4));
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_CONTROL_PARAMS,
+                          AV_NVTEGRA_ENUM (NVC5B0_SET_CONTROL_PARAMS, CODEC_TYPE,     MPEG4) |
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, ERR_CONCEAL_ON, 1)     |
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, GPTIMER_ON,     1));
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_PICTURE_INDEX,
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_PICTURE_INDEX, INDEX, ctx->core.frame_idx));
+
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_DRV_PIC_SETUP_OFFSET,
+                          input_map,        ctx->core.pic_setup_off, NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_IN_BUF_BASE_OFFSET,
+                          input_map,        ctx->core.bitstream_off, NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_NVDEC_STATUS_OFFSET,
+                          input_map,        ctx->core.status_off,    NVHOST_RELOC_TYPE_DEFAULT);
+
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_COLOC_DATA_OFFSET,
+                          &ctx->common_map, ctx->coloc_off,          NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_HISTORY_OFFSET,
+                          &ctx->common_map, ctx->history_off,        NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PIC_SCRATCH_BUF_OFFSET,
+                          &ctx->common_map, ctx->scratch_off,        NVHOST_RELOC_TYPE_DEFAULT);
+
+#define PUSH_FRAME(fr, offset) ({                                                           \
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_LUMA_OFFSET0   + offset * 4,           \
+                          av_nvtegra_frame_get_fbuf_map(fr), 0, NVHOST_RELOC_TYPE_DEFAULT); \
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_CHROMA_OFFSET0 + offset * 4,           \
+                          av_nvtegra_frame_get_fbuf_map(fr), fr->data[1] - fr->data[0],     \
+                          NVHOST_RELOC_TYPE_DEFAULT);                                       \
+})
+
+    PUSH_FRAME(cur_frame,  0);
+    PUSH_FRAME(prev_frame, 1);
+    PUSH_FRAME(next_frame, 2);
+
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_EXECUTE,
+                          AV_NVTEGRA_ENUM(NVC5B0_EXECUTE, AWAKEN, ENABLE));
+
+    err = av_nvtegra_cmdbuf_end(cmdbuf);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static int nvtegra_mpeg4_start_frame(AVCodecContext *avctx, const uint8_t *buf, uint32_t buf_size) {
+    Mpeg4DecContext             *m = avctx->priv_data;
+    MpegEncContext              *s = &m->m;
+    AVFrame                 *frame = s->current_picture.f;
+    FrameDecodeData           *fdd = (FrameDecodeData *)frame->private_ref->data;
+    NVTegraMPEG4DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    FFNVTegraDecodeFrame *tf;
+    AVNVTegraMap *input_map;
+    uint8_t *mem;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Starting MPEG4-NVTEGRA frame with pixel format %s\n",
+           av_get_pix_fmt_name(avctx->sw_pix_fmt));
+
+    err = ff_nvtegra_start_frame(avctx, frame, &ctx->core);
+    if (err < 0)
+        return err;
+
+    tf = fdd->hwaccel_priv;
+    input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+    mem = av_nvtegra_map_get_addr(input_map);
+
+    nvtegra_mpeg4_prepare_frame_setup((nvdec_mpeg4_pic_s *)(mem + ctx->core.pic_setup_off), avctx, ctx);
+
+    ctx->prev_frame = (s->pict_type != AV_PICTURE_TYPE_I) ? s->last_picture.f : frame;
+    ctx->next_frame = (s->pict_type == AV_PICTURE_TYPE_B) ? s->next_picture.f : frame;
+
+    return 0;
+}
+
+static int nvtegra_mpeg4_end_frame(AVCodecContext *avctx) {
+    Mpeg4DecContext             *m = avctx->priv_data;
+    MpegEncContext              *s = &m->m;
+    NVTegraMPEG4DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+    AVFrame                 *frame = s->current_picture.f;
+    FrameDecodeData           *fdd = (FrameDecodeData *)frame->private_ref->data;
+    FFNVTegraDecodeFrame       *tf = fdd->hwaccel_priv;
+
+    nvdec_mpeg4_pic_s *setup;
+    uint8_t *mem;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Ending MPEG4-NVTEGRA frame with %u slices -> %u bytes\n",
+           ctx->core.num_slices, ctx->core.bitstream_len);
+
+    if (!tf || !ctx->core.num_slices)
+        return 0;
+
+    mem = av_nvtegra_map_get_addr((AVNVTegraMap *)tf->input_map_ref->data);
+
+    setup = (nvdec_mpeg4_pic_s *)(mem + ctx->core.pic_setup_off);
+    setup->stream_len  = ctx->core.bitstream_len + sizeof(bitstream_end_sequence);
+    setup->slice_count = ctx->core.num_slices;
+
+    err = nvtegra_mpeg4_prepare_cmdbuf(&ctx->core.cmdbuf, s, ctx, frame,
+                                       ctx->prev_frame, ctx->next_frame);
+    if (err < 0)
+        return err;
+
+    return ff_nvtegra_end_frame(avctx, frame, &ctx->core, bitstream_end_sequence,
+                                sizeof(bitstream_end_sequence));
+}
+
+static int nvtegra_mpeg4_decode_slice(AVCodecContext *avctx, const uint8_t *buf,
+                                      uint32_t buf_size)
+{
+    Mpeg4DecContext *m = avctx->priv_data;
+    AVFrame     *frame = m->m.current_picture.f;
+
+    /* Rewind the bitstream looking for the VOP start marker */
+    while (AV_RB32(buf) != VOP_STARTCODE)
+        buf -= 1, buf_size += 1;
+
+    return ff_nvtegra_decode_slice(avctx, frame, buf, buf_size, false);
+}
+
+#if CONFIG_MPEG4_NVTEGRA_HWACCEL
+const FFHWAccel ff_mpeg4_nvtegra_hwaccel = {
+    .p.name         = "mpeg4_nvtegra",
+    .p.type         = AVMEDIA_TYPE_VIDEO,
+    .p.id           = AV_CODEC_ID_MPEG4,
+    .p.pix_fmt      = AV_PIX_FMT_NVTEGRA,
+    .start_frame    = &nvtegra_mpeg4_start_frame,
+    .end_frame      = &nvtegra_mpeg4_end_frame,
+    .decode_slice   = &nvtegra_mpeg4_decode_slice,
+    .init           = &nvtegra_mpeg4_decode_init,
+    .uninit         = &nvtegra_mpeg4_decode_uninit,
+    .frame_params   = &ff_nvtegra_frame_params,
+    .priv_data_size = sizeof(NVTegraMPEG4DecodeContext),
+    .caps_internal  = HWACCEL_CAP_ASYNC_SAFE,
+};
+#endif
-- 
2.45.1


* [FFmpeg-devel] [PATCH 11/16] nvtegra: add vc1 hardware decoding
From: averne @ 2024-05-30 19:43 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: averne

Since L4T does not hook up the VC1 code to a user-facing library, this
backend was written solely on the basis of static reverse engineering.

Signed-off-by: averne <averne381@gmail.com>
---
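Note (not part of the commit message): the layout of the shared "common" map
built in nvtegra_vc1_decode_init() below can be sketched standalone as
follows. ALIGN_UP stands in for FFALIGN, map_align stands in for
AV_NVTEGRA_MAP_ALIGN (defined elsewhere in the series; its value is not shown
in this patch), and Vc1CommonLayout and vc1_common_layout() are hypothetical
names used only for illustration. The size formulas are copied from the
patch.

```c
#include <stdint.h>

/* Round x up to a multiple of a (a must be a power of two), like FFALIGN. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((uint32_t)(a) - 1))

#define MB_SIZE 16 /* macroblock width/height in pixels, as in the patch */

/* Hypothetical container for the offsets and sizes of the common map. */
typedef struct Vc1CommonLayout {
    uint32_t coloc_off, history_off, scratch_off;
    uint32_t coloc_size, history_size, scratch_size;
    uint32_t map_size;
} Vc1CommonLayout;

static Vc1CommonLayout vc1_common_layout(uint32_t coded_width, uint32_t coded_height,
                                         uint32_t map_align)
{
    uint32_t width_in_mbs  = ALIGN_UP(coded_width,  MB_SIZE) / MB_SIZE;
    uint32_t height_in_mbs = ALIGN_UP(coded_height, MB_SIZE) / MB_SIZE;
    Vc1CommonLayout l;

    /* Three colocated-data planes, one history buffer, one fixed scratch area */
    l.coloc_size   = 3 * ALIGN_UP(width_in_mbs * ALIGN_UP(height_in_mbs, 2) * 64 - 63, map_align);
    l.history_size = ALIGN_UP(width_in_mbs, 2) * 0x300;
    l.scratch_size = 0x400;

    /* Sub-buffers are packed back to back, each starting on a map_align boundary */
    l.coloc_off    = 0;
    l.history_off  = ALIGN_UP(l.coloc_off   + l.coloc_size,   map_align);
    l.scratch_off  = ALIGN_UP(l.history_off + l.history_size, map_align);
    l.map_size     = ALIGN_UP(l.scratch_off + l.scratch_size, 0x1000);

    return l;
}
```

Assuming map_align is 0x100, a 1920x1080 stream would need 1566720 bytes of
colocated data and 92160 bytes of history, for a page-aligned map of 1662976
bytes.
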
 configure                |   3 +
 libavcodec/Makefile      |   1 +
 libavcodec/hwaccels.h    |   2 +
 libavcodec/nvtegra_vc1.c | 455 +++++++++++++++++++++++++++++++++++++++
 libavcodec/vc1dec.c      |   9 +
 5 files changed, 470 insertions(+)
 create mode 100644 libavcodec/nvtegra_vc1.c

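Note: the per-frame input map set up in nvtegra_vc1_decode_init() below
packs the picture setup structure, the status structure, the command buffer,
the slice offset table and the bitstream into one allocation, then derives
the capacity limits from the gaps between offsets. A standalone sketch,
where the structure sizes, bitstream size and map_align are plain
parameters because sizeof(nvdec_vc1_pic_s), sizeof(nvdec_status_s),
ff_nvtegra_decode_pick_bitstream_buffer_size() and AV_NVTEGRA_MAP_ALIGN are
defined elsewhere in the series; Vc1InputLayout and vc1_input_layout() are
hypothetical names:

```c
#include <stdint.h>

/* Round x up to a multiple of a (a must be a power of two), like FFALIGN. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((uint32_t)(a) - 1))

/* Hypothetical container mirroring the FFNVTegraDecodeContext fields used. */
typedef struct Vc1InputLayout {
    uint32_t pic_setup_off, status_off, cmdbuf_off, slice_offsets_off, bitstream_off;
    uint32_t map_size;
    uint32_t max_cmdbuf_size, max_num_slices, max_bitstream_size;
} Vc1InputLayout;

static Vc1InputLayout vc1_input_layout(uint32_t pic_setup_size, uint32_t status_size,
                                       uint32_t num_slices, uint32_t bitstream_size,
                                       uint32_t map_align)
{
    Vc1InputLayout l;

    l.pic_setup_off     = 0;
    l.status_off        = ALIGN_UP(l.pic_setup_off + pic_setup_size, map_align);
    l.cmdbuf_off        = ALIGN_UP(l.status_off    + status_size,    map_align);
    /* The command buffer gets one alignment unit of space */
    l.slice_offsets_off = ALIGN_UP(l.cmdbuf_off    + map_align,      map_align);
    /* One 32-bit offset per slice */
    l.bitstream_off     = ALIGN_UP(l.slice_offsets_off + num_slices * (uint32_t)sizeof(uint32_t),
                                   map_align);
    l.map_size          = ALIGN_UP(l.bitstream_off + bitstream_size, 0x1000);

    /* Capacities follow from the distance to the next sub-buffer */
    l.max_cmdbuf_size    =  l.slice_offsets_off - l.cmdbuf_off;
    l.max_num_slices     = (l.bitstream_off     - l.slice_offsets_off) / (uint32_t)sizeof(uint32_t);
    l.max_bitstream_size =  l.map_size          - l.bitstream_off;

    return l;
}
```

Because every offset is rounded up, the derived limits can exceed what was
requested: with made-up sizes of 600 and 100 bytes for the two structures,
8160 slices, a 1 MiB bitstream and map_align of 0x100, the table has room
for 8192 slice offsets.
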
diff --git a/configure b/configure
index 0795f44a1e..952e3aef7d 100755
--- a/configure
+++ b/configure
@@ -3267,6 +3267,8 @@ vc1_vaapi_hwaccel_deps="vaapi"
 vc1_vaapi_hwaccel_select="vc1_decoder"
 vc1_vdpau_hwaccel_deps="vdpau"
 vc1_vdpau_hwaccel_select="vc1_decoder"
+vc1_nvtegra_hwaccel_deps="nvtegra"
+vc1_nvtegra_hwaccel_select="vc1_decoder"
 vp8_nvdec_hwaccel_deps="nvdec"
 vp8_nvdec_hwaccel_select="vp8_decoder"
 vp8_vaapi_hwaccel_deps="vaapi"
@@ -3294,6 +3296,7 @@ wmv3_dxva2_hwaccel_select="vc1_dxva2_hwaccel"
 wmv3_nvdec_hwaccel_select="vc1_nvdec_hwaccel"
 wmv3_vaapi_hwaccel_select="vc1_vaapi_hwaccel"
 wmv3_vdpau_hwaccel_select="vc1_vdpau_hwaccel"
+wmv3_nvtegra_hwaccel_select="vc1_nvtegra_hwaccel"
 
 # hardware-accelerated codecs
 mediafoundation_deps="mftransform_h MFCreateAlignedMemoryBuffer"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 1ea9984876..e102d03e7d 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1048,6 +1048,7 @@ OBJS-$(CONFIG_VC1_NVDEC_HWACCEL)          += nvdec_vc1.o
 OBJS-$(CONFIG_VC1_QSV_HWACCEL)            += qsvdec.o
 OBJS-$(CONFIG_VC1_VAAPI_HWACCEL)          += vaapi_vc1.o
 OBJS-$(CONFIG_VC1_VDPAU_HWACCEL)          += vdpau_vc1.o
+OBJS-$(CONFIG_VC1_NVTEGRA_HWACCEL)        += nvtegra_vc1.o
 OBJS-$(CONFIG_VP8_NVDEC_HWACCEL)          += nvdec_vp8.o
 OBJS-$(CONFIG_VP8_VAAPI_HWACCEL)          += vaapi_vp8.o
 OBJS-$(CONFIG_VP9_D3D11VA_HWACCEL)        += dxva2_vp9.o
diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
index da2b4ae10e..a69e6a1977 100644
--- a/libavcodec/hwaccels.h
+++ b/libavcodec/hwaccels.h
@@ -75,6 +75,7 @@ extern const struct FFHWAccel ff_vc1_dxva2_hwaccel;
 extern const struct FFHWAccel ff_vc1_nvdec_hwaccel;
 extern const struct FFHWAccel ff_vc1_vaapi_hwaccel;
 extern const struct FFHWAccel ff_vc1_vdpau_hwaccel;
+extern const struct FFHWAccel ff_vc1_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_vp8_nvdec_hwaccel;
 extern const struct FFHWAccel ff_vp8_vaapi_hwaccel;
 extern const struct FFHWAccel ff_vp9_d3d11va_hwaccel;
@@ -92,5 +93,6 @@ extern const struct FFHWAccel ff_wmv3_dxva2_hwaccel;
 extern const struct FFHWAccel ff_wmv3_nvdec_hwaccel;
 extern const struct FFHWAccel ff_wmv3_vaapi_hwaccel;
 extern const struct FFHWAccel ff_wmv3_vdpau_hwaccel;
+extern const struct FFHWAccel ff_wmv3_nvtegra_hwaccel;
 
 #endif /* AVCODEC_HWACCELS_H */
diff --git a/libavcodec/nvtegra_vc1.c b/libavcodec/nvtegra_vc1.c
new file mode 100644
index 0000000000..b5ee85c9d4
--- /dev/null
+++ b/libavcodec/nvtegra_vc1.c
@@ -0,0 +1,455 @@
+/*
+ * Copyright (c) 2024 averne <averne381@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <stdbool.h>
+
+#include "config_components.h"
+
+#include "avcodec.h"
+#include "hwaccel_internal.h"
+#include "internal.h"
+#include "hwconfig.h"
+#include "vc1.h"
+#include "decode.h"
+#include "nvtegra_decode.h"
+
+#include "libavutil/pixdesc.h"
+#include "libavutil/nvtegra_host1x.h"
+
+typedef struct NVTegraVC1DecodeContext {
+    FFNVTegraDecodeContext core;
+
+    AVNVTegraMap common_map;
+    uint32_t coloc_off, history_off, scratch_off;
+    uint32_t history_size, scratch_size;
+
+    bool is_first_slice;
+
+    AVFrame *prev_frame, *next_frame;
+} NVTegraVC1DecodeContext;
+
+/* Size (width, height) of a macroblock */
+#define MB_SIZE 16
+
+static const uint8_t bitstream_end_sequence[] = {
+    0x00, 0x00, 0x01, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x0a, 0x00, 0x00, 0x00, 0x00,
+};
+
+static int nvtegra_vc1_decode_uninit(AVCodecContext *avctx) {
+    NVTegraVC1DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA VC1 decoder\n");
+
+    err = av_nvtegra_map_destroy(&ctx->common_map);
+    if (err < 0)
+        return err;
+
+    err = ff_nvtegra_decode_uninit(avctx, &ctx->core);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static int nvtegra_vc1_decode_init(AVCodecContext *avctx) {
+    NVTegraVC1DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    AVHWDeviceContext      *hw_device_ctx;
+    AVNVTegraDeviceContext *device_hwctx;
+    uint32_t width_in_mbs, height_in_mbs, num_slices,
+             coloc_size, history_size, scratch_size, common_map_size;
+    uint8_t *mem;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Initializing NVTEGRA VC1 decoder\n");
+
+    width_in_mbs  = FFALIGN(avctx->coded_width,  MB_SIZE) / MB_SIZE;
+    height_in_mbs = FFALIGN(avctx->coded_height, MB_SIZE) / MB_SIZE;
+
+    num_slices = width_in_mbs * height_in_mbs;
+
+    /* Ignored: histogram map, size 0x400 */
+    ctx->core.pic_setup_off     = 0;
+    ctx->core.status_off        = FFALIGN(ctx->core.pic_setup_off     + sizeof(nvdec_vc1_pic_s),
+                                          AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.cmdbuf_off        = FFALIGN(ctx->core.status_off        + sizeof(nvdec_status_s),
+                                          AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.slice_offsets_off = FFALIGN(ctx->core.cmdbuf_off        + AV_NVTEGRA_MAP_ALIGN,
+                                          AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.bitstream_off     = FFALIGN(ctx->core.slice_offsets_off + num_slices * sizeof(uint32_t),
+                                          AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.input_map_size    = FFALIGN(ctx->core.bitstream_off     + ff_nvtegra_decode_pick_bitstream_buffer_size(avctx),
+                                          0x1000);
+
+    ctx->core.max_cmdbuf_size    =  ctx->core.slice_offsets_off - ctx->core.cmdbuf_off;
+    ctx->core.max_num_slices     = (ctx->core.bitstream_off     - ctx->core.slice_offsets_off) / sizeof(uint32_t);
+    ctx->core.max_bitstream_size =  ctx->core.input_map_size    - ctx->core.bitstream_off;
+
+    err = ff_nvtegra_decode_init(avctx, &ctx->core);
+    if (err < 0)
+        goto fail;
+
+    hw_device_ctx = (AVHWDeviceContext *)ctx->core.hw_device_ref->data;
+    device_hwctx  = hw_device_ctx->hwctx;
+
+    coloc_size   = 3 * FFALIGN(width_in_mbs * FFALIGN(height_in_mbs, 2) * 64 - 63, AV_NVTEGRA_MAP_ALIGN);
+    history_size = FFALIGN(width_in_mbs, 2) * 0x300;
+    scratch_size = 0x400;
+
+    ctx->coloc_off   = 0;
+    ctx->history_off = FFALIGN(ctx->coloc_off   + coloc_size,   AV_NVTEGRA_MAP_ALIGN);
+    ctx->scratch_off = FFALIGN(ctx->history_off + history_size, AV_NVTEGRA_MAP_ALIGN);
+    common_map_size  = FFALIGN(ctx->scratch_off + scratch_size, 0x1000);
+
+    err = av_nvtegra_map_create(&ctx->common_map, &device_hwctx->nvdec_channel, common_map_size, 0x100,
+                                NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE);
+    if (err < 0)
+        goto fail;
+
+    mem = av_nvtegra_map_get_addr(&ctx->common_map);
+
+    memset(mem + ctx->coloc_off,   0, coloc_size);
+    memset(mem + ctx->history_off, 0, history_size);
+    memset(mem + ctx->scratch_off, 0, scratch_size);
+
+    ctx->history_size = history_size;
+    ctx->scratch_size = scratch_size;
+
+    return 0;
+
+fail:
+    nvtegra_vc1_decode_uninit(avctx);
+    return err;
+}
+
+static void nvtegra_vc1_prepare_frame_setup(nvdec_vc1_pic_s *setup, AVCodecContext *avctx,
+                                            NVTegraVC1DecodeContext *ctx)
+{
+    VC1Context     *v = avctx->priv_data;
+    MpegEncContext *s = &v->s;
+    AVFrame    *frame = s->current_picture_ptr->f;
+
+    /*
+     * Notes:
+     * - s->current_picture.f->linesize is inconsistently doubled for interlaced content
+     *   between I-frames and others, so s->current_picture_ptr is used
+     * - many fields in this structure are unused by official software;
+     *   only the fields that it does use are set here
+     */
+    *setup = (nvdec_vc1_pic_s){
+        .scratch_pic_buffer_size = ctx->scratch_size,
+
+        .gptimer_timeout_value   = 0, /* Default value */
+
+        .bitstream_offset        = 0,
+
+        .FrameStride             = {
+            frame->linesize[0],
+            frame->linesize[1],
+        },
+
+        .luma_top_offset         = 0,
+        .luma_bot_offset         = 0,
+        .luma_frame_offset       = 0,
+        .chroma_top_offset       = 0,
+        .chroma_bot_offset       = 0,
+        .chroma_frame_offset     = 0,
+
+        .CodedWidth              = FFALIGN(avctx->coded_width,
+                                           (v->profile == PROFILE_ADVANCED) ? 1 : MB_SIZE),
+        .CodedHeight             = FFALIGN(avctx->coded_height,
+                                           (v->profile == PROFILE_ADVANCED) ? 1 : MB_SIZE),
+
+        .HistBufferSize          = ctx->history_size / 256,
+
+        .loopfilter              = s->loop_filter,
+
+        .output_memory_layout    = 0, /* NV12 */
+        .ref_memory_layout       = {
+            0, 0, /* NV12 */
+        },
+
+        .fastuvmc                = v->fastuvmc,
+
+        .FrameWidth              = FFALIGN(frame->width,
+                                           (v->profile == PROFILE_ADVANCED) ? 1 : MB_SIZE),
+        .FrameHeight             = FFALIGN(frame->height,
+                                           (v->profile == PROFILE_ADVANCED) ? 1 : MB_SIZE),
+
+        .profile                 = (v->profile != PROFILE_ADVANCED) ? 1 : 2,
+
+        .postprocflag            = v->postprocflag,
+        .pulldown                = v->broadcast,
+        .interlace               = v->interlace,
+
+        .tfcntrflag              = v->tfcntrflag,
+        .finterpflag             = v->finterpflag,
+
+        .tileFormat              = 0, /* TBL */
+
+        .psf                     = v->psf,
+
+        .multires                = v->multires,
+        .syncmarker              = v->resync_marker,
+        .rangered                = v->rangered,
+        .maxbframes              = s->max_b_frames,
+        .panscan_flag            = v->panscanflag,
+        .dquant                  = v->dquant,
+        .refdist_flag            = v->refdist_flag,
+        .quantizer               = v->quantizer_mode,
+        .overlap                 = v->overlap,
+        .vstransform             = v->vstransform,
+        .extended_mv             = v->extended_mv,
+        .extended_dmv            = v->extended_dmv,
+    };
+
+    if (v->profile == PROFILE_ADVANCED) {
+        setup->displayPara.enableTFOutput = 1;
+        setup->displayPara.VC1MapYFlag    = v->range_mapy_flag;
+        setup->displayPara.MapYValue      = v->range_mapy;
+        setup->displayPara.VC1MapUVFlag   = v->range_mapuv_flag;
+        setup->displayPara.MapUVValue     = v->range_mapuv;
+    } else if (v->rangered && v->rangeredfrm) {
+        setup->displayPara.enableTFOutput = 1;
+        setup->displayPara.VC1MapYFlag    = 1;
+        setup->displayPara.MapYValue      = 7;
+        setup->displayPara.VC1MapUVFlag   = 1;
+        setup->displayPara.MapUVValue     = 7;
+    }
+
+    if (v->range_mapy_flag || v->range_mapuv_flag) {
+        setup->displayPara.OutputBottom[0] = 0;
+        setup->displayPara.OutputBottom[1] = 0;
+        setup->displayPara.OutputStructure = v->interlace & 1;
+        setup->displayPara.OutStride       = frame->linesize[0] & 0xff;
+    }
+}
+
+static int nvtegra_vc1_prepare_cmdbuf(AVNVTegraCmdbuf *cmdbuf, VC1Context *v, NVTegraVC1DecodeContext *ctx,
+                                      AVFrame *cur_frame, AVFrame *prev_frame, AVFrame *next_frame)
+{
+    FrameDecodeData     *fdd = (FrameDecodeData *)cur_frame->private_ref->data;
+    FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv;
+    AVNVTegraMap  *input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+
+    int err;
+
+    err = av_nvtegra_cmdbuf_begin(cmdbuf, HOST1X_CLASS_NVDEC);
+    if (err < 0)
+        return err;
+
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_APPLICATION_ID,
+                          AV_NVTEGRA_ENUM(NVC5B0_SET_APPLICATION_ID, ID, VC1));
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_CONTROL_PARAMS,
+                          AV_NVTEGRA_ENUM (NVC5B0_SET_CONTROL_PARAMS, CODEC_TYPE,     VC1) |
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, ERR_CONCEAL_ON, 1)   |
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, GPTIMER_ON,     1));
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_PICTURE_INDEX,
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_PICTURE_INDEX, INDEX, ctx->core.frame_idx));
+
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_DRV_PIC_SETUP_OFFSET,
+                          input_map,        ctx->core.pic_setup_off,     NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_IN_BUF_BASE_OFFSET,
+                          input_map,        ctx->core.bitstream_off,     NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_SLICE_OFFSETS_BUF_OFFSET,
+                          input_map,        ctx->core.slice_offsets_off, NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_NVDEC_STATUS_OFFSET,
+                          input_map,        ctx->core.status_off,        NVHOST_RELOC_TYPE_DEFAULT);
+
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_COLOC_DATA_OFFSET,
+                          &ctx->common_map, ctx->coloc_off,              NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_HISTORY_OFFSET,
+                          &ctx->common_map, ctx->history_off,            NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PIC_SCRATCH_BUF_OFFSET,
+                          &ctx->common_map, ctx->scratch_off,            NVHOST_RELOC_TYPE_DEFAULT);
+
+#define PUSH_FRAME(fr, offset) ({                                                           \
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_LUMA_OFFSET0   + offset * 4,           \
+                          av_nvtegra_frame_get_fbuf_map(fr), 0, NVHOST_RELOC_TYPE_DEFAULT); \
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_CHROMA_OFFSET0 + offset * 4,           \
+                          av_nvtegra_frame_get_fbuf_map(fr), fr->data[1] - fr->data[0],     \
+                          NVHOST_RELOC_TYPE_DEFAULT);                                       \
+})
+
+    PUSH_FRAME(cur_frame,  0);
+    PUSH_FRAME(prev_frame, 1);
+    PUSH_FRAME(next_frame, 2);
+
+    /*
+     * TODO: Bind a surface to the postproc output if we need range remapping
+    if (((v->profile != PROFILE_ADVANCED) && ((v->rangered != 0) || (v->rangeredfrm != 0))) ||
+            ((v->range_mapy_flag != 0) || (v->range_mapuv_flag != 0))) {
+        AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_DISPLAY_BUF_LUMA_OFFSET,
+                              &output.luma, 0, NVHOST_RELOC_TYPE_DEFAULT);
+        AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_DISPLAY_BUF_CHROMA_OFFSET,
+                              &output.chroma, 0, NVHOST_RELOC_TYPE_DEFAULT);
+    }
+     */
+
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_EXECUTE,
+                          AV_NVTEGRA_ENUM(NVC5B0_EXECUTE, AWAKEN, ENABLE));
+
+    err = av_nvtegra_cmdbuf_end(cmdbuf);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static int nvtegra_vc1_start_frame(AVCodecContext *avctx, const uint8_t *buf, uint32_t buf_size) {
+    VC1Context                *v = avctx->priv_data;
+    MpegEncContext            *s = &v->s;
+    AVFrame               *frame = s->current_picture.f;
+    FrameDecodeData         *fdd = (FrameDecodeData *)frame->private_ref->data;
+    NVTegraVC1DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    FFNVTegraDecodeFrame *tf;
+    AVNVTegraMap *input_map;
+    uint8_t *mem;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Starting VC1-NVTEGRA frame with pixel format %s\n",
+           av_get_pix_fmt_name(avctx->sw_pix_fmt));
+
+    ctx->is_first_slice = true;
+
+    err = ff_nvtegra_start_frame(avctx, frame, &ctx->core);
+    if (err < 0)
+        return err;
+
+    tf = fdd->hwaccel_priv;
+    input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+    mem = av_nvtegra_map_get_addr(input_map);
+
+    nvtegra_vc1_prepare_frame_setup((nvdec_vc1_pic_s *)(mem + ctx->core.pic_setup_off), avctx, ctx);
+
+    ctx->prev_frame = ff_nvtegra_safe_get_ref(s->last_picture.f, frame);
+    ctx->next_frame = ff_nvtegra_safe_get_ref(s->next_picture.f, frame);
+
+    return 0;
+}
+
+static int nvtegra_vc1_end_frame(AVCodecContext *avctx) {
+    VC1Context                *v = avctx->priv_data;
+    NVTegraVC1DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+    AVFrame               *frame = v->s.current_picture.f;
+    FrameDecodeData         *fdd = (FrameDecodeData *)frame->private_ref->data;
+    FFNVTegraDecodeFrame     *tf = fdd->hwaccel_priv;
+
+    nvdec_vc1_pic_s *setup;
+    uint8_t *mem;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Ending VC1-NVTEGRA frame with %u slices -> %u bytes\n",
+           ctx->core.num_slices, ctx->core.bitstream_len);
+
+    if (!tf || !ctx->core.num_slices)
+        return 0;
+
+    mem = av_nvtegra_map_get_addr((AVNVTegraMap *)tf->input_map_ref->data);
+
+    setup = (nvdec_vc1_pic_s *)(mem + ctx->core.pic_setup_off);
+    setup->stream_len  = ctx->core.bitstream_len + sizeof(bitstream_end_sequence);
+    setup->slice_count = ctx->core.num_slices;
+
+    err = nvtegra_vc1_prepare_cmdbuf(&ctx->core.cmdbuf, v, ctx, frame,
+                                     ctx->prev_frame, ctx->next_frame);
+    if (err < 0)
+        return err;
+
+    return ff_nvtegra_end_frame(avctx, frame, &ctx->core, bitstream_end_sequence,
+                                sizeof(bitstream_end_sequence));
+}
+
+static int nvtegra_vc1_decode_slice(AVCodecContext *avctx, const uint8_t *buf,
+                                    uint32_t buf_size)
+{
+    VC1Context                *v = avctx->priv_data;
+    NVTegraVC1DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+    AVFrame               *frame = v->s.current_picture.f;
+    FrameDecodeData         *fdd = (FrameDecodeData *)frame->private_ref->data;
+    FFNVTegraDecodeFrame     *tf = fdd->hwaccel_priv;
+    AVNVTegraMap      *input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+
+    nvdec_vc1_pic_s *setup;
+    uint8_t *mem;
+    enum VC1Code startcode;
+
+    mem = av_nvtegra_map_get_addr(input_map);
+
+    setup = (nvdec_vc1_pic_s *)(mem + ctx->core.pic_setup_off);
+
+    if (ctx->is_first_slice) {
+        startcode = VC1_CODE_FRAME;
+
+        if (v->profile == PROFILE_ADVANCED &&
+                v->fcm == ILACE_FIELD && v->second_field)
+            startcode = VC1_CODE_FIELD;
+
+        /*
+         * Skip a dword if the bitstream already contains the startcode.
+         * We could probably just not insert our startcode, but this is what the official code does.
+         */
+        if ((buf_size >= 4) && (AV_RB32(buf) == startcode))
+            setup->bitstream_offset = 1;
+
+        AV_WB32(mem + ctx->core.bitstream_off + ctx->core.bitstream_len, startcode);
+        ctx->core.bitstream_len += 4;
+        ctx->is_first_slice = false;
+    }
+
+    return ff_nvtegra_decode_slice(avctx, frame, buf, buf_size, false);
+}
+
+#if CONFIG_VC1_NVTEGRA_HWACCEL
+const FFHWAccel ff_vc1_nvtegra_hwaccel = {
+    .p.name         = "vc1_nvtegra",
+    .p.type         = AVMEDIA_TYPE_VIDEO,
+    .p.id           = AV_CODEC_ID_VC1,
+    .p.pix_fmt      = AV_PIX_FMT_NVTEGRA,
+    .start_frame    = &nvtegra_vc1_start_frame,
+    .end_frame      = &nvtegra_vc1_end_frame,
+    .decode_slice   = &nvtegra_vc1_decode_slice,
+    .init           = &nvtegra_vc1_decode_init,
+    .uninit         = &nvtegra_vc1_decode_uninit,
+    .frame_params   = &ff_nvtegra_frame_params,
+    .priv_data_size = sizeof(NVTegraVC1DecodeContext),
+    .caps_internal  = HWACCEL_CAP_ASYNC_SAFE,
+};
+#endif
+
+#if CONFIG_WMV3_NVTEGRA_HWACCEL
+const FFHWAccel ff_wmv3_nvtegra_hwaccel = {
+    .p.name         = "wmv3_nvtegra",
+    .p.type         = AVMEDIA_TYPE_VIDEO,
+    .p.id           = AV_CODEC_ID_WMV3,
+    .p.pix_fmt      = AV_PIX_FMT_NVTEGRA,
+    .start_frame    = &nvtegra_vc1_start_frame,
+    .end_frame      = &nvtegra_vc1_end_frame,
+    .decode_slice   = &nvtegra_vc1_decode_slice,
+    .init           = &nvtegra_vc1_decode_init,
+    .uninit         = &nvtegra_vc1_decode_uninit,
+    .frame_params   = &ff_nvtegra_frame_params,
+    .priv_data_size = sizeof(NVTegraVC1DecodeContext),
+    .caps_internal  = HWACCEL_CAP_ASYNC_SAFE,
+};
+#endif
diff --git a/libavcodec/vc1dec.c b/libavcodec/vc1dec.c
index 3b5b016cf9..e907d26e14 100644
--- a/libavcodec/vc1dec.c
+++ b/libavcodec/vc1dec.c
@@ -71,6 +71,9 @@ static const enum AVPixelFormat vc1_hwaccel_pixfmt_list_420[] = {
 #endif
 #if CONFIG_VC1_VDPAU_HWACCEL
     AV_PIX_FMT_VDPAU,
+#endif
+#if CONFIG_VC1_NVTEGRA_HWACCEL
+    AV_PIX_FMT_NVTEGRA,
 #endif
     AV_PIX_FMT_YUV420P,
     AV_PIX_FMT_NONE
@@ -1415,6 +1418,9 @@ const FFCodec ff_vc1_decoder = {
 #endif
 #if CONFIG_VC1_VDPAU_HWACCEL
                         HWACCEL_VDPAU(vc1),
+#endif
+#if CONFIG_VC1_NVTEGRA_HWACCEL
+                        HWACCEL_NVTEGRA(vc1),
 #endif
                         NULL
                     },
@@ -1454,6 +1460,9 @@ const FFCodec ff_wmv3_decoder = {
 #endif
 #if CONFIG_WMV3_VDPAU_HWACCEL
                         HWACCEL_VDPAU(wmv3),
+#endif
+#if CONFIG_WMV3_NVTEGRA_HWACCEL
+                        HWACCEL_NVTEGRA(wmv3),
 #endif
                         NULL
                     },
-- 
2.45.1

* [FFmpeg-devel] [PATCH 12/16] nvtegra: add h264 hardware decoding
  2024-05-30 19:43 [FFmpeg-devel] [PATCH 00/16] NVidia Tegra hardware decoding backend averne
                   ` (10 preceding siblings ...)
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 11/16] nvtegra: add vc1 " averne
@ 2024-05-30 19:43 ` averne
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 13/16] nvtegra: add hevc " averne
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: averne @ 2024-05-30 19:43 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: averne

Because of how the hardware operates, each DPB reference must stay in
the same slot for its entire lifetime.

Signed-off-by: averne <averne381@gmail.com>
---
 configure                 |   2 +
 libavcodec/Makefile       |   1 +
 libavcodec/h264_slice.c   |   6 +-
 libavcodec/h264dec.c      |   3 +
 libavcodec/hwaccels.h     |   1 +
 libavcodec/nvtegra_h264.c | 506 ++++++++++++++++++++++++++++++++++++++
 6 files changed, 518 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/nvtegra_h264.c
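
As a rough illustration of the fixed-slot constraint mentioned in the commit
message, here is a minimal standalone sketch of a bitmask slot allocator. It is
not code from this patch: slot_alloc(), slot_free() and NUM_SLOTS are
hypothetical names chosen for the example, and the linear search only stands in
for the ff_ctzll()-based lookup the patch uses. The point it tries to show is
that freeing one slot never moves the pictures held in the other slots.

```c
#include <stdint.h>

/* Assumption for this sketch: 16 reference pictures plus the current frame */
#define NUM_SLOTS 17

/* Hypothetical helper: claim the lowest free slot, or return -1 if none is free */
static int slot_alloc(uint64_t *mask)
{
    for (int slot = 0; slot < NUM_SLOTS; ++slot) {
        if (!(*mask & (UINT64_C(1) << slot))) {
            *mask |= UINT64_C(1) << slot;
            return slot;
        }
    }
    return -1;
}

/* Hypothetical helper: release a slot once its picture is no longer referenced */
static void slot_free(uint64_t *mask, int slot)
{
    *mask &= ~(UINT64_C(1) << slot);
}
```

With an empty mask, three successive slot_alloc() calls hand out slots 0, 1 and
2. Releasing slot 1 leaves the pictures in slots 0 and 2 where they are, and the
next slot_alloc() call reuses slot 1 instead of compacting the list.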

diff --git a/configure b/configure
index 952e3aef7d..930cd3c9bd 100755
--- a/configure
+++ b/configure
@@ -3193,6 +3193,8 @@ h264_videotoolbox_hwaccel_deps="videotoolbox"
 h264_videotoolbox_hwaccel_select="h264_decoder"
 h264_vulkan_hwaccel_deps="vulkan"
 h264_vulkan_hwaccel_select="h264_decoder"
+h264_nvtegra_hwaccel_deps="nvtegra"
+h264_nvtegra_hwaccel_select="h264_decoder"
 hevc_d3d11va_hwaccel_deps="d3d11va DXVA_PicParams_HEVC"
 hevc_d3d11va_hwaccel_select="hevc_decoder"
 hevc_d3d11va2_hwaccel_deps="d3d11va DXVA_PicParams_HEVC"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index e102d03e7d..2cb0ec21a8 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1013,6 +1013,7 @@ OBJS-$(CONFIG_H264_VAAPI_HWACCEL)         += vaapi_h264.o
 OBJS-$(CONFIG_H264_VDPAU_HWACCEL)         += vdpau_h264.o
 OBJS-$(CONFIG_H264_VIDEOTOOLBOX_HWACCEL)  += videotoolbox.o
 OBJS-$(CONFIG_H264_VULKAN_HWACCEL)        += vulkan_decode.o vulkan_h264.o
+OBJS-$(CONFIG_H264_NVTEGRA_HWACCEL)       += nvtegra_h264.o
 OBJS-$(CONFIG_HEVC_D3D11VA_HWACCEL)       += dxva2_hevc.o
 OBJS-$(CONFIG_HEVC_DXVA2_HWACCEL)         += dxva2_hevc.o
 OBJS-$(CONFIG_HEVC_D3D12VA_HWACCEL)       += dxva2_hevc.o d3d12va_hevc.o
diff --git a/libavcodec/h264_slice.c b/libavcodec/h264_slice.c
index ce2c4caca1..dc4c5545c8 100644
--- a/libavcodec/h264_slice.c
+++ b/libavcodec/h264_slice.c
@@ -784,7 +784,8 @@ static enum AVPixelFormat get_pixel_format(H264Context *h, int force_callback)
                      CONFIG_H264_VAAPI_HWACCEL + \
                      CONFIG_H264_VIDEOTOOLBOX_HWACCEL + \
                      CONFIG_H264_VDPAU_HWACCEL + \
-                     CONFIG_H264_VULKAN_HWACCEL)
+                     CONFIG_H264_VULKAN_HWACCEL + \
+                     CONFIG_H264_NVTEGRA_HWACCEL)
     enum AVPixelFormat pix_fmts[HWACCEL_MAX + 2], *fmt = pix_fmts;
 
     switch (h->ps.sps->bit_depth_luma) {
@@ -888,6 +889,9 @@ static enum AVPixelFormat get_pixel_format(H264Context *h, int force_callback)
 #endif
 #if CONFIG_H264_VAAPI_HWACCEL
             *fmt++ = AV_PIX_FMT_VAAPI;
+#endif
+#if CONFIG_H264_NVTEGRA_HWACCEL
+            *fmt++ = AV_PIX_FMT_NVTEGRA;
 #endif
             if (h->avctx->color_range == AVCOL_RANGE_JPEG)
                 *fmt++ = AV_PIX_FMT_YUVJ420P;
diff --git a/libavcodec/h264dec.c b/libavcodec/h264dec.c
index fd23e367b4..51f53f07a9 100644
--- a/libavcodec/h264dec.c
+++ b/libavcodec/h264dec.c
@@ -1160,6 +1160,9 @@ const FFCodec ff_h264_decoder = {
 #endif
 #if CONFIG_H264_VULKAN_HWACCEL
                                HWACCEL_VULKAN(h264),
+#endif
+#if CONFIG_H264_NVTEGRA_HWACCEL
+                               HWACCEL_NVTEGRA(h264),
 #endif
                                NULL
                            },
diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
index a69e6a1977..463fd333a1 100644
--- a/libavcodec/hwaccels.h
+++ b/libavcodec/hwaccels.h
@@ -37,6 +37,7 @@ extern const struct FFHWAccel ff_h264_nvdec_hwaccel;
 extern const struct FFHWAccel ff_h264_vaapi_hwaccel;
 extern const struct FFHWAccel ff_h264_vdpau_hwaccel;
 extern const struct FFHWAccel ff_h264_videotoolbox_hwaccel;
+extern const struct FFHWAccel ff_h264_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_h264_vulkan_hwaccel;
 extern const struct FFHWAccel ff_hevc_d3d11va_hwaccel;
 extern const struct FFHWAccel ff_hevc_d3d11va2_hwaccel;
diff --git a/libavcodec/nvtegra_h264.c b/libavcodec/nvtegra_h264.c
new file mode 100644
index 0000000000..63073c44a6
--- /dev/null
+++ b/libavcodec/nvtegra_h264.c
@@ -0,0 +1,506 @@
+/*
+ * Copyright (c) 2024 averne <averne381@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <stdbool.h>
+#include <string.h>
+
+#include "config_components.h"
+
+#include "avcodec.h"
+#include "hwaccel_internal.h"
+#include "internal.h"
+#include "hwconfig.h"
+#include "h264dec.h"
+#include "decode.h"
+#include "nvtegra_decode.h"
+
+#include "libavutil/pixdesc.h"
+#include "libavutil/nvtegra_host1x.h"
+
+typedef struct NVTegraH264DecodeContext {
+    FFNVTegraDecodeContext core;
+
+    AVNVTegraMap common_map;
+    uint32_t coloc_off, mbhist_off, history_off;
+    uint32_t mbhist_size, history_size;
+
+    struct NVTegraH264RefFrame {
+        AVNVTegraMap *map;
+        uint32_t chroma_off;
+        int16_t frame_num;
+        int16_t pic_id;
+    } refs[16+1];
+
+    uint8_t ordered_dpb_map[16+1],
+        pic_id_map[16+1], scratch_ref, cur_frame;
+
+    uint64_t refs_mask, ordered_dpb_mask, pic_id_mask;
+} NVTegraH264DecodeContext;
+
+/* Size (width, height) of a macroblock */
+#define MB_SIZE 16
+
+static const uint8_t bitstream_end_sequence[16] = {
+    0x00, 0x00, 0x01, 0x0b, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x0b, 0x00, 0x00, 0x00, 0x00,
+};
+
+static int nvtegra_h264_decode_uninit(AVCodecContext *avctx) {
+    NVTegraH264DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA H264 decoder\n");
+
+    err = av_nvtegra_map_destroy(&ctx->common_map);
+    if (err < 0)
+        return err;
+
+    err = ff_nvtegra_decode_uninit(avctx, &ctx->core);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static int nvtegra_h264_decode_init(AVCodecContext *avctx) {
+    H264Context                *h = avctx->priv_data;
+    const SPS                *sps = h->ps.sps;
+    NVTegraH264DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    AVHWDeviceContext      *hw_device_ctx;
+    AVNVTegraDeviceContext *device_hwctx;
+    uint32_t aligned_width, aligned_height,
+             width_in_mbs, height_in_mbs, num_slices,
+             coloc_size, mbhist_size, history_size, common_map_size;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Initializing NVTEGRA H264 decoder\n");
+
+    aligned_width  = FFALIGN(avctx->coded_width,  MB_SIZE);
+    aligned_height = FFALIGN(avctx->coded_height, MB_SIZE);
+    width_in_mbs   = aligned_width  / MB_SIZE;
+    height_in_mbs  = aligned_height / MB_SIZE;
+
+    num_slices = width_in_mbs * height_in_mbs;
+
+    /* Ignored: histogram map, size 0x400 */
+    ctx->core.pic_setup_off     = 0;
+    ctx->core.status_off        = FFALIGN(ctx->core.pic_setup_off     + sizeof(nvdec_h264_pic_s),
+                                          AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.cmdbuf_off        = FFALIGN(ctx->core.status_off        + sizeof(nvdec_status_s),
+                                          AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.slice_offsets_off = FFALIGN(ctx->core.cmdbuf_off        + 3*AV_NVTEGRA_MAP_ALIGN,
+                                          AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.bitstream_off     = FFALIGN(ctx->core.slice_offsets_off + num_slices * sizeof(uint32_t),
+                                          AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.input_map_size    = FFALIGN(ctx->core.bitstream_off     + ff_nvtegra_decode_pick_bitstream_buffer_size(avctx),
+                                          0x1000);
+
+    ctx->core.max_cmdbuf_size    =  ctx->core.slice_offsets_off - ctx->core.cmdbuf_off;
+    ctx->core.max_num_slices     = (ctx->core.bitstream_off     - ctx->core.slice_offsets_off) / sizeof(uint32_t);
+    ctx->core.max_bitstream_size =  ctx->core.input_map_size    - ctx->core.bitstream_off;
+
+    err = ff_nvtegra_decode_init(avctx, &ctx->core);
+    if (err < 0)
+        goto fail;
+
+    hw_device_ctx = (AVHWDeviceContext *)ctx->core.hw_device_ref->data;
+    device_hwctx  = hw_device_ctx->hwctx;
+
+    coloc_size   = FFALIGN(FFALIGN(height_in_mbs, 2) * (width_in_mbs * 64) - 63, 0x100);
+    coloc_size  *= sps->ref_frame_count + 1; /* Max number of reference frames, plus current frame */
+    mbhist_size  = FFALIGN(width_in_mbs * 104, 0x100);
+    history_size = FFALIGN(width_in_mbs * 0x200 + 0x1100, 0x200);
+
+    ctx->coloc_off   = 0;
+    ctx->mbhist_off  = FFALIGN(ctx->coloc_off   + coloc_size,   AV_NVTEGRA_MAP_ALIGN);
+    ctx->history_off = FFALIGN(ctx->mbhist_off  + mbhist_size,  AV_NVTEGRA_MAP_ALIGN);
+    common_map_size  = FFALIGN(ctx->history_off + history_size, 0x1000);
+
+    err = av_nvtegra_map_create(&ctx->common_map, &device_hwctx->nvdec_channel, common_map_size, 0x100,
+                                NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE);
+    if (err < 0)
+        goto fail;
+
+    ctx->mbhist_size  = mbhist_size;
+    ctx->history_size = history_size;
+
+    memset(ctx->ordered_dpb_map, -1, sizeof(ctx->ordered_dpb_map));
+    memset(ctx->pic_id_map,      -1, sizeof(ctx->pic_id_map));
+
+    return 0;
+
+fail:
+    nvtegra_h264_decode_uninit(avctx);
+    return err;
+}
+
+static inline int field_poc(int poc[2], bool top) {
+    return (poc[!top] != INT_MAX) ? poc[!top] : 0;
+}
+
+static void dpb_add(H264Context *h, nvdec_dpb_entry_s *dst,
+                    H264Picture *src, int pic_id)
+{
+    int marking;
+
+    marking = src->long_ref ? 2 : 1;
+    *dst = (nvdec_dpb_entry_s){
+        .index                = pic_id,
+        .col_idx              = pic_id,
+        .state                = src->reference,
+        .is_long_term         = src->long_ref,
+        .not_existing         = src->invalid_gap,
+        .is_field             = src->field_picture,
+        .top_field_marking    = (src->reference & PICT_TOP_FIELD)    ? marking : 0,
+        .bottom_field_marking = (src->reference & PICT_BOTTOM_FIELD) ? marking : 0,
+        .output_memory_layout = 0, /* NV12 */
+        .FieldOrderCnt        = {
+            field_poc(src->field_poc, true),
+            field_poc(src->field_poc, false),
+        },
+        .FrameIdx             = src->long_ref ? src->pic_id : src->frame_num,
+    };
+}
+
+static inline int find_slot(uint64_t *mask) {
+    int slot = ff_ctzll(~*mask);
+    *mask |= (UINT64_C(1) << slot);
+    return slot;
+}
+
+static void nvtegra_h264_prepare_frame_setup(nvdec_h264_pic_s *setup, H264Context *h,
+                                             NVTegraH264DecodeContext *ctx)
+{
+    const PPS *pps = h->ps.pps;
+    const SPS *sps = h->ps.sps;
+
+    int dpb_size, i, j, diff;
+    H264Picture *refs [16+1] = {0};
+    uint8_t dpb_to_ref[16+1] = {0};
+
+    *setup = (nvdec_h264_pic_s){
+        .mbhist_buffer_size                     = ctx->mbhist_size,
+
+        .gptimer_timeout_value                  = 0, /* Default value */
+
+        .log2_max_pic_order_cnt_lsb_minus4      = FFMAX(sps->log2_max_poc_lsb - 4, 0),
+        .delta_pic_order_always_zero_flag       = sps->delta_pic_order_always_zero_flag,
+        .frame_mbs_only_flag                    = sps->frame_mbs_only_flag,
+
+        .PicWidthInMbs                          = h->mb_width,
+        .FrameHeightInMbs                       = h->mb_height,
+
+        .tileFormat                             = 0, /* TBL */
+        .gob_height                             = 0, /* GOB_2 */
+
+        .entropy_coding_mode_flag               = pps->cabac,
+        .pic_order_present_flag                 = pps->pic_order_present,
+        .num_ref_idx_l0_active_minus1           = pps->ref_count[0] - 1,
+        .num_ref_idx_l1_active_minus1           = pps->ref_count[1] - 1,
+        .deblocking_filter_control_present_flag = pps->deblocking_filter_parameters_present,
+        .redundant_pic_cnt_present_flag         = pps->redundant_pic_cnt_present,
+        .transform_8x8_mode_flag                = pps->transform_8x8_mode,
+
+        .pitch_luma                             = h->cur_pic_ptr->f->linesize[0],
+        .pitch_chroma                           = h->cur_pic_ptr->f->linesize[1],
+
+        .luma_top_offset                        = 0,
+        .luma_bot_offset                        = 0,
+        .luma_frame_offset                      = 0,
+        .chroma_top_offset                      = 0,
+        .chroma_bot_offset                      = 0,
+        .chroma_frame_offset                    = 0,
+
+        .HistBufferSize                         = ctx->history_size / 256,
+
+        .MbaffFrameFlag                         = sps->mb_aff && !FIELD_PICTURE(h),
+        .direct_8x8_inference_flag              = sps->direct_8x8_inference_flag,
+        .weighted_pred_flag                     = pps->weighted_pred,
+        .constrained_intra_pred_flag            = pps->constrained_intra_pred,
+        .ref_pic_flag                           = h->nal_ref_idc != 0,
+        .field_pic_flag                         = FIELD_PICTURE(h),
+        .bottom_field_flag                      = h->picture_structure == PICT_BOTTOM_FIELD,
+        .second_field                           = FIELD_PICTURE(h) && !h->first_field,
+        .log2_max_frame_num_minus4              = sps->log2_max_frame_num - 4,
+        .chroma_format_idc                      = sps->chroma_format_idc,
+        .pic_order_cnt_type                     = sps->poc_type,
+        .pic_init_qp_minus26                    = pps->init_qp - 26,
+        .chroma_qp_index_offset                 = pps->chroma_qp_index_offset[0],
+        .second_chroma_qp_index_offset          = pps->chroma_qp_index_offset[1],
+
+        .weighted_bipred_idc                    = pps->weighted_bipred_idc,
+        .frame_num                              = h->cur_pic_ptr->frame_num,
+        .output_memory_layout                   = 0, /* NV12 */
+
+        .CurrFieldOrderCnt                      = {
+            field_poc(h->cur_pic_ptr->field_poc, true),
+            field_poc(h->cur_pic_ptr->field_poc, false),
+        },
+
+        .lossless_ipred8x8_filter_enable        = true,
+        .qpprime_y_zero_transform_bypass_flag   = sps->transform_bypass,
+    };
+
+    /* Build concatenated ref list for this frame */
+    dpb_size = 0;
+    for (i = 0; i < h->short_ref_count; ++i)
+        refs[dpb_size++] = h->short_ref[i];
+
+    for (i = 0; i < 16; ++i)
+        if (h->long_ref[i])
+            refs[dpb_size++] = h->long_ref[i];
+
+    /* Remove stale references from our ref list */
+    for (i = 0; i < FF_ARRAY_ELEMS(ctx->refs); ++i) {
+        if (!(ctx->refs_mask & (1 << i)))
+            continue;
+
+        for (j = 0; j < dpb_size; ++j) {
+            if (av_nvtegra_frame_get_fbuf_map(refs[j]->f) == ctx->refs[i].map)
+                break;
+        }
+
+        if (j == dpb_size) {
+            ctx->pic_id_mask &= ~(1 << ctx->refs[i].pic_id);
+            ctx->pic_id_map[ctx->refs[i].pic_id] = -1;
+
+            ctx->refs_mask &= ~(1 << i);
+            ctx->refs[i].map = NULL;
+        } else {
+            dpb_to_ref[i] = j;
+        }
+    }
+
+    /* Update the ordered DPB mask */
+    for (i = 0; i < FF_ARRAY_ELEMS(ctx->ordered_dpb_map); ++i) {
+        if (!(ctx->ordered_dpb_mask & (1 << i)))
+            continue;
+        if (!ctx->refs[ctx->ordered_dpb_map[i]].map) {
+            ctx->ordered_dpb_mask &= ~(1 << i);
+            ctx->ordered_dpb_map[i] = -1;
+        }
+    }
+
+    /* Add new frames to the ordered DPB */
+    for (i = 0; i < FF_ARRAY_ELEMS(ctx->refs); ++i) {
+        if (!(ctx->refs_mask & (1 << i)))
+            continue;
+
+        for (j = 0; j < FF_ARRAY_ELEMS(ctx->ordered_dpb_map); ++j) {
+            if (ctx->ordered_dpb_map[j] == i)
+                break;
+        }
+
+        if (j == FF_ARRAY_ELEMS(ctx->ordered_dpb_map))
+            ctx->ordered_dpb_map[find_slot(&ctx->ordered_dpb_mask)] = i;
+    }
+
+    /*
+     * Add the current frame to our ref list
+     * In the case of interlaced video, the new frame can be the same as the last
+     */
+    if (ctx->refs[ctx->cur_frame].map != av_nvtegra_frame_get_fbuf_map(h->cur_pic_ptr->f)) {
+        /* Allocate a pic id for the current frame */
+        i = find_slot(&ctx->pic_id_mask);
+
+        /* Insert it in our ref list */
+        ctx->cur_frame = find_slot(&ctx->refs_mask);
+        ctx->pic_id_map[i] = ctx->cur_frame;
+        ctx->refs[ctx->cur_frame] = (struct NVTegraH264RefFrame){
+            .map        = av_nvtegra_frame_get_fbuf_map(h->cur_pic_ptr->f),
+            .chroma_off = h->cur_pic_ptr->f->data[1] - h->cur_pic_ptr->f->data[0],
+            .frame_num  = h->cur_pic_ptr->frame_num,
+            .pic_id     = i,
+        };
+    }
+
+    setup->CurrPicIdx = setup->CurrColIdx = ctx->refs[ctx->cur_frame].pic_id;
+
+    /* Find the temporally closest frame to be used as a scratch ref, or use the current one */
+    diff = INT_MAX;
+    ctx->scratch_ref = ctx->cur_frame;
+    for (i = 0; i < FF_ARRAY_ELEMS(ctx->ordered_dpb_map); ++i) {
+        j = ctx->ordered_dpb_map[i];
+        if ((ctx->ordered_dpb_mask & (1 << i)) &&
+                FFABS(h->cur_pic_ptr->frame_num - refs[dpb_to_ref[j]]->frame_num) < diff)
+            ctx->scratch_ref = j;
+    }
+
+    /* Build the NVDEC DPB */
+    for (i = 0; i < FF_ARRAY_ELEMS(setup->dpb); ++i) {
+        if (ctx->ordered_dpb_mask & (1 << i)) {
+            j = ctx->ordered_dpb_map[i];
+            dpb_add(h, &setup->dpb[i], refs[dpb_to_ref[j]], ctx->refs[j].pic_id);
+        }
+    }
+
+    memcpy(setup->WeightScale,       pps->scaling_matrix4,    sizeof(setup->WeightScale));
+    memcpy(setup->WeightScale8x8[0], pps->scaling_matrix8[0], sizeof(setup->WeightScale8x8[0]));
+    memcpy(setup->WeightScale8x8[1], pps->scaling_matrix8[3], sizeof(setup->WeightScale8x8[1]));
+}
+
+static int nvtegra_h264_prepare_cmdbuf(AVNVTegraCmdbuf *cmdbuf, H264Context *h,
+                                       AVFrame *cur_frame, NVTegraH264DecodeContext *ctx)
+{
+    FrameDecodeData     *fdd = (FrameDecodeData *)cur_frame->private_ref->data;
+    FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv;
+    AVNVTegraMap  *input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+
+    int err, i;
+
+    err = av_nvtegra_cmdbuf_begin(cmdbuf, HOST1X_CLASS_NVDEC);
+    if (err < 0)
+        return err;
+
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_APPLICATION_ID,
+                          AV_NVTEGRA_ENUM(NVC5B0_SET_APPLICATION_ID, ID, H264));
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_CONTROL_PARAMS,
+                          AV_NVTEGRA_ENUM (NVC5B0_SET_CONTROL_PARAMS, CODEC_TYPE,     H264) |
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, ERR_CONCEAL_ON, 1)    |
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, GPTIMER_ON,     1));
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_PICTURE_INDEX,
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_PICTURE_INDEX, INDEX, ctx->core.frame_idx));
+
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_DRV_PIC_SETUP_OFFSET,
+                          input_map,        ctx->core.pic_setup_off,     NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_IN_BUF_BASE_OFFSET,
+                          input_map,        ctx->core.bitstream_off,     NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_SLICE_OFFSETS_BUF_OFFSET,
+                          input_map,        ctx->core.slice_offsets_off, NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_NVDEC_STATUS_OFFSET,
+                          input_map,        ctx->core.status_off,        NVHOST_RELOC_TYPE_DEFAULT);
+
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_COLOC_DATA_OFFSET,
+                          &ctx->common_map, ctx->coloc_off,              NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_H264_SET_MBHIST_BUF_OFFSET,
+                          &ctx->common_map, ctx->mbhist_off,             NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_HISTORY_OFFSET,
+                          &ctx->common_map, ctx->history_off,            NVHOST_RELOC_TYPE_DEFAULT);
+
+#define PUSH_FRAME(ref, offset) ({                                                \
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_LUMA_OFFSET0   + offset * 4, \
+                          ref.map, 0, NVHOST_RELOC_TYPE_DEFAULT);                 \
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_CHROMA_OFFSET0 + offset * 4, \
+                          ref.map, ref.chroma_off, NVHOST_RELOC_TYPE_DEFAULT);    \
+})
+
+    for (i = 0; i < 16 + 1; ++i) {
+        if (i == ctx->cur_frame)
+            PUSH_FRAME(ctx->refs[i], i);
+        else if (ctx->pic_id_mask & (1 << i))
+            PUSH_FRAME(ctx->refs[ctx->pic_id_map[i]], i);
+        else
+            PUSH_FRAME(ctx->refs[ctx->scratch_ref], i);
+    }
+
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_EXECUTE,
+                          AV_NVTEGRA_ENUM(NVC5B0_EXECUTE, AWAKEN, ENABLE));
+
+    err = av_nvtegra_cmdbuf_end(cmdbuf);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static int nvtegra_h264_start_frame(AVCodecContext *avctx, const uint8_t *buf, uint32_t buf_size) {
+    H264Context                *h = avctx->priv_data;
+    AVFrame                *frame = h->cur_pic_ptr->f;
+    FrameDecodeData          *fdd = (FrameDecodeData *)frame->private_ref->data;
+    NVTegraH264DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    FFNVTegraDecodeFrame *tf;
+    AVNVTegraMap *input_map;
+    uint8_t *mem;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Starting H264-NVTEGRA frame with pixel format %s\n",
+           av_get_pix_fmt_name(avctx->sw_pix_fmt));
+
+    err = ff_nvtegra_start_frame(avctx, frame, &ctx->core);
+    if (err < 0)
+        return err;
+
+    tf = fdd->hwaccel_priv;
+    input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+    mem = av_nvtegra_map_get_addr(input_map);
+
+    nvtegra_h264_prepare_frame_setup((nvdec_h264_pic_s *)(mem + ctx->core.pic_setup_off), h, ctx);
+
+    return 0;
+}
+
+static int nvtegra_h264_end_frame(AVCodecContext *avctx) {
+    H264Context                *h = avctx->priv_data;
+    NVTegraH264DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+    AVFrame                *frame = h->cur_pic_ptr->f;
+    FrameDecodeData          *fdd = (FrameDecodeData *)frame->private_ref->data;
+    FFNVTegraDecodeFrame      *tf = fdd->hwaccel_priv;
+
+    nvdec_h264_pic_s *setup;
+    uint8_t *mem;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Ending H264-NVTEGRA frame with %u slices -> %u bytes\n",
+           ctx->core.num_slices, ctx->core.bitstream_len);
+
+    if (!tf || !ctx->core.num_slices)
+        return 0;
+
+    mem = av_nvtegra_map_get_addr((AVNVTegraMap *)tf->input_map_ref->data);
+
+    setup = (nvdec_h264_pic_s *)(mem + ctx->core.pic_setup_off);
+    setup->stream_len  = ctx->core.bitstream_len + sizeof(bitstream_end_sequence);
+    setup->slice_count = ctx->core.num_slices;
+
+    err = nvtegra_h264_prepare_cmdbuf(&ctx->core.cmdbuf, h, frame, ctx);
+    if (err < 0)
+        return err;
+
+    return ff_nvtegra_end_frame(avctx, frame, &ctx->core, bitstream_end_sequence,
+                                sizeof(bitstream_end_sequence));
+}
+
+static int nvtegra_h264_decode_slice(AVCodecContext *avctx, const uint8_t *buf,
+                                     uint32_t buf_size)
+{
+    H264Context *h = avctx->priv_data;
+    AVFrame *frame = h->cur_pic_ptr->f;
+
+    return ff_nvtegra_decode_slice(avctx, frame, buf, buf_size, true);
+}
+
+#if CONFIG_H264_NVTEGRA_HWACCEL
+const FFHWAccel ff_h264_nvtegra_hwaccel = {
+    .p.name         = "h264_nvtegra",
+    .p.type         = AVMEDIA_TYPE_VIDEO,
+    .p.id           = AV_CODEC_ID_H264,
+    .p.pix_fmt      = AV_PIX_FMT_NVTEGRA,
+    .start_frame    = &nvtegra_h264_start_frame,
+    .end_frame      = &nvtegra_h264_end_frame,
+    .decode_slice   = &nvtegra_h264_decode_slice,
+    .init           = &nvtegra_h264_decode_init,
+    .uninit         = &nvtegra_h264_decode_uninit,
+    .frame_params   = &ff_nvtegra_frame_params,
+    .priv_data_size = sizeof(NVTegraH264DecodeContext),
+    .caps_internal  = HWACCEL_CAP_ASYNC_SAFE,
+};
+#endif
-- 
2.45.1


* [FFmpeg-devel] [PATCH 13/16] nvtegra: add hevc hardware decoding
  2024-05-30 19:43 [FFmpeg-devel] [PATCH 00/16] NVidia Tegra hardware decoding backend averne
                   ` (11 preceding siblings ...)
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 12/16] nvtegra: add h264 " averne
@ 2024-05-30 19:43 ` averne
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 14/16] nvtegra: add vp8 " averne
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 37+ messages in thread
From: averne @ 2024-05-30 19:43 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: averne

Same remark as for h264. In addition, the hardware needs the number of
slice header bits to skip (nvidia_skip_length). This is computed in the
main header parsing routine, hls_slice_header(), instead of being
reimplemented in the hwaccel backend.
On the Tegra 210 (T210), this is the only hardware codec that can output
10-bit data.

Signed-off-by: averne <averne381@gmail.com>
---
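
Note on the skip length: the following is a minimal sketch of the bit
accounting described in the commit message, not the code in this patch.
BitReader, read_bits(), bits_left() and measure_skipped_bits() are
hypothetical stand-ins for the decoder's GetBitContext helpers, and the
two field widths are arbitrary. The idea is to record the bits remaining
before the span of interest, parse as usual, and take the difference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal MSB-first bit reader (illustration only). */
typedef struct BitReader {
    const uint8_t *buf;
    size_t size_bits; /* total number of readable bits */
    size_t pos_bits;  /* current read position, in bits */
} BitReader;

/* Number of bits not yet consumed. */
static int bits_left(const BitReader *br)
{
    return (int)(br->size_bits - br->pos_bits);
}

/* Read up to n bits, stopping early at the end of the buffer. */
static unsigned read_bits(BitReader *br, int n)
{
    unsigned v = 0;

    assert(n >= 0 && n <= 32);
    while (n-- > 0 && br->pos_bits < br->size_bits) {
        unsigned bit = (br->buf[br->pos_bits >> 3] >> (7 - (br->pos_bits & 7))) & 1;
        v = (v << 1) | bit;
        br->pos_bits++;
    }
    return v;
}

/* Parse two fields and report how many bits they occupied. */
static int measure_skipped_bits(BitReader *br, int field_a_bits, int field_b_bits)
{
    int start = bits_left(br);        /* snapshot before the span */

    (void)read_bits(br, field_a_bits);
    (void)read_bits(br, field_b_bits);

    return start - bits_left(br);     /* bits consumed by the span */
}
```

Measuring the span while the header is parsed avoids a second parser in
the backend, at the cost of one extra field in the slice header struct.
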
 configure                 |   2 +
 libavcodec/Makefile       |   1 +
 libavcodec/hevcdec.c      |  17 +-
 libavcodec/hevcdec.h      |   2 +
 libavcodec/hwaccels.h     |   1 +
 libavcodec/nvtegra_hevc.c | 633 ++++++++++++++++++++++++++++++++++++++
 6 files changed, 655 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/nvtegra_hevc.c

diff --git a/configure b/configure
index 930cd3c9bd..ba4c5287e3 100755
--- a/configure
+++ b/configure
@@ -3213,6 +3213,8 @@ hevc_videotoolbox_hwaccel_deps="videotoolbox"
 hevc_videotoolbox_hwaccel_select="hevc_decoder"
 hevc_vulkan_hwaccel_deps="vulkan"
 hevc_vulkan_hwaccel_select="hevc_decoder"
+hevc_nvtegra_hwaccel_deps="nvtegra"
+hevc_nvtegra_hwaccel_select="hevc_decoder"
 mjpeg_nvdec_hwaccel_deps="nvdec"
 mjpeg_nvdec_hwaccel_select="mjpeg_decoder"
 mjpeg_vaapi_hwaccel_deps="vaapi"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 2cb0ec21a8..de667b8a4b 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1022,6 +1022,7 @@ OBJS-$(CONFIG_HEVC_QSV_HWACCEL)           += qsvdec.o
 OBJS-$(CONFIG_HEVC_VAAPI_HWACCEL)         += vaapi_hevc.o h265_profile_level.o
 OBJS-$(CONFIG_HEVC_VDPAU_HWACCEL)         += vdpau_hevc.o h265_profile_level.o
 OBJS-$(CONFIG_HEVC_VULKAN_HWACCEL)        += vulkan_decode.o vulkan_hevc.o
+OBJS-$(CONFIG_HEVC_NVTEGRA_HWACCEL)       += nvtegra_hevc.o
 OBJS-$(CONFIG_MJPEG_NVDEC_HWACCEL)        += nvdec_mjpeg.o
 OBJS-$(CONFIG_MJPEG_VAAPI_HWACCEL)        += vaapi_mjpeg.o
 OBJS-$(CONFIG_MPEG1_NVDEC_HWACCEL)        += nvdec_mpeg12.o
diff --git a/libavcodec/hevcdec.c b/libavcodec/hevcdec.c
index b41dc46053..41bde57920 100644
--- a/libavcodec/hevcdec.c
+++ b/libavcodec/hevcdec.c
@@ -406,7 +406,8 @@ static enum AVPixelFormat get_format(HEVCContext *s, const HEVCSPS *sps)
                      CONFIG_HEVC_VAAPI_HWACCEL + \
                      CONFIG_HEVC_VIDEOTOOLBOX_HWACCEL + \
                      CONFIG_HEVC_VDPAU_HWACCEL + \
-                     CONFIG_HEVC_VULKAN_HWACCEL)
+                     CONFIG_HEVC_VULKAN_HWACCEL + \
+                     CONFIG_HEVC_NVTEGRA_HWACCEL)
     enum AVPixelFormat pix_fmts[HWACCEL_MAX + 2], *fmt = pix_fmts;
 
     switch (sps->pix_fmt) {
@@ -436,6 +437,9 @@ static enum AVPixelFormat get_format(HEVCContext *s, const HEVCSPS *sps)
 #endif
 #if CONFIG_HEVC_VULKAN_HWACCEL
         *fmt++ = AV_PIX_FMT_VULKAN;
+#endif
+#if CONFIG_HEVC_NVTEGRA_HWACCEL
+        *fmt++ = AV_PIX_FMT_NVTEGRA;
 #endif
         break;
     case AV_PIX_FMT_YUV420P10:
@@ -463,6 +467,9 @@ static enum AVPixelFormat get_format(HEVCContext *s, const HEVCSPS *sps)
 #endif
 #if CONFIG_HEVC_NVDEC_HWACCEL
         *fmt++ = AV_PIX_FMT_CUDA;
+#endif
+#if CONFIG_HEVC_NVTEGRA_HWACCEL
+        *fmt++ = AV_PIX_FMT_NVTEGRA;
 #endif
         break;
     case AV_PIX_FMT_YUV444P:
@@ -598,6 +605,7 @@ static int hls_slice_header(HEVCContext *s)
     GetBitContext *gb = &s->HEVClc->gb;
     SliceHeader *sh   = &s->sh;
     int i, ret;
+    int nvidia_skip_len_start;
 
     // Coded parameters
     sh->first_slice_in_pic_flag = get_bits1(gb);
@@ -700,6 +708,8 @@ static int hls_slice_header(HEVCContext *s)
             return AVERROR_INVALIDDATA;
         }
 
+        nvidia_skip_len_start = get_bits_left(gb);
+
         // when flag is not present, picture is inferred to be output
         sh->pic_output_flag = 1;
         if (s->ps.pps->output_flag_present_flag)
@@ -753,6 +763,7 @@ static int hls_slice_header(HEVCContext *s)
             }
             sh->long_term_ref_pic_set_size = pos - get_bits_left(gb);
 
+            sh->nvidia_skip_length = nvidia_skip_len_start - get_bits_left(gb);
             if (s->ps.sps->sps_temporal_mvp_enabled_flag)
                 sh->slice_temporal_mvp_enabled_flag = get_bits1(gb);
             else
@@ -765,6 +776,7 @@ static int hls_slice_header(HEVCContext *s)
             sh->short_term_rps                  = NULL;
             sh->long_term_ref_pic_set_size      = 0;
             sh->slice_temporal_mvp_enabled_flag = 0;
+            sh->nvidia_skip_length              = nvidia_skip_len_start - get_bits_left(gb);
         }
 
         /* 8.3.1 */
@@ -3743,6 +3755,9 @@ const FFCodec ff_hevc_decoder = {
 #endif
 #if CONFIG_HEVC_VULKAN_HWACCEL
                                HWACCEL_VULKAN(hevc),
+#endif
+#if CONFIG_HEVC_NVTEGRA_HWACCEL
+                               HWACCEL_NVTEGRA(hevc),
 #endif
                                NULL
                            },
diff --git a/libavcodec/hevcdec.h b/libavcodec/hevcdec.h
index e82daf6679..2df96ed629 100644
--- a/libavcodec/hevcdec.h
+++ b/libavcodec/hevcdec.h
@@ -277,6 +277,8 @@ typedef struct SliceHeader {
     int16_t chroma_offset_l1[16][2];
 
     int slice_ctb_addr_rs;
+
+    int nvidia_skip_length;
 } SliceHeader;
 
 typedef struct CodingUnit {
diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
index 463fd333a1..77892dc2b2 100644
--- a/libavcodec/hwaccels.h
+++ b/libavcodec/hwaccels.h
@@ -47,6 +47,7 @@ extern const struct FFHWAccel ff_hevc_nvdec_hwaccel;
 extern const struct FFHWAccel ff_hevc_vaapi_hwaccel;
 extern const struct FFHWAccel ff_hevc_vdpau_hwaccel;
 extern const struct FFHWAccel ff_hevc_videotoolbox_hwaccel;
+extern const struct FFHWAccel ff_hevc_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_hevc_vulkan_hwaccel;
 extern const struct FFHWAccel ff_mjpeg_nvdec_hwaccel;
 extern const struct FFHWAccel ff_mjpeg_vaapi_hwaccel;
diff --git a/libavcodec/nvtegra_hevc.c b/libavcodec/nvtegra_hevc.c
new file mode 100644
index 0000000000..97c585d755
--- /dev/null
+++ b/libavcodec/nvtegra_hevc.c
@@ -0,0 +1,633 @@
+/*
+ * Copyright (c) 2024 averne <averne381@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include "config_components.h"
+
+#include "avcodec.h"
+#include "hwaccel_internal.h"
+#include "internal.h"
+#include "hwconfig.h"
+#include "hevcdec.h"
+#include "hevc_data.h"
+#include "decode.h"
+#include "nvtegra_decode.h"
+
+#include "libavutil/pixdesc.h"
+#include "libavutil/nvtegra_host1x.h"
+
+typedef struct NVTegraHEVCDecodeContext {
+    FFNVTegraDecodeContext core;
+
+    AVNVTegraMap common_map;
+    uint32_t tile_sizes_off, scaling_list_off,
+             coloc_off, filter_off;
+
+    unsigned int colmv_size, sao_offset, bsd_offset;
+    uint8_t pattern_id;
+
+    struct NVTegraHEVCRefFrame {
+        AVNVTegraMap *map;
+        uint32_t chroma_off;
+        int poc;
+    } refs[16+1];
+
+    uint64_t refs_mask;
+    int8_t scratch_ref;
+} NVTegraHEVCDecodeContext;
+
+/* Size (width, height) of a macroblock */
+#define MB_SIZE 16
+
+/* Maximum size (width, height) of a coding tree unit */
+#define CTU_SIZE 64
+
+#define FILTER_SIZE 480
+#define SAO_SIZE    3840
+#define BSD_SIZE    60
+
+static int nvtegra_hevc_decode_uninit(AVCodecContext *avctx) {
+    NVTegraHEVCDecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA HEVC decoder\n");
+
+    err = av_nvtegra_map_destroy(&ctx->common_map);
+    if (err < 0)
+        return err;
+
+    err = ff_nvtegra_decode_uninit(avctx, &ctx->core);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static int nvtegra_hevc_decode_init(AVCodecContext *avctx) {
+    NVTegraHEVCDecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    AVHWDeviceContext      *hw_device_ctx;
+    AVNVTegraDeviceContext *device_hwctx;
+    uint32_t aligned_width, aligned_height,
+             coloc_size, filter_buffer_size, common_map_size;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Initializing NVTEGRA HEVC decoder\n");
+
+    ctx->core.pic_setup_off  = 0;
+    ctx->core.status_off     = FFALIGN(ctx->core.pic_setup_off + sizeof(nvdec_hevc_pic_s),
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.cmdbuf_off     = FFALIGN(ctx->core.status_off    + sizeof(nvdec_status_s),
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->tile_sizes_off      = FFALIGN(ctx->core.cmdbuf_off    + 3*AV_NVTEGRA_MAP_ALIGN,
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->scaling_list_off    = FFALIGN(ctx->tile_sizes_off     + 0x900,
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.bitstream_off  = FFALIGN(ctx->scaling_list_off   + 0x400,
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.input_map_size = FFALIGN(ctx->core.bitstream_off + ff_nvtegra_decode_pick_bitstream_buffer_size(avctx),
+                                       0x1000);
+
+    ctx->core.max_cmdbuf_size    = ctx->tile_sizes_off      - ctx->core.cmdbuf_off;
+    ctx->core.max_bitstream_size = ctx->core.input_map_size - ctx->core.bitstream_off;
+
+    err = ff_nvtegra_decode_init(avctx, &ctx->core);
+    if (err < 0)
+        goto fail;
+
+    hw_device_ctx = (AVHWDeviceContext *)ctx->core.hw_device_ref->data;
+    device_hwctx  = hw_device_ctx->hwctx;
+
+    aligned_width      = FFALIGN(avctx->coded_width,  CTU_SIZE);
+    aligned_height     = FFALIGN(avctx->coded_height, CTU_SIZE);
+    coloc_size         = (aligned_width * aligned_height) + (aligned_width * aligned_height / MB_SIZE);
+    filter_buffer_size = (FILTER_SIZE + SAO_SIZE + BSD_SIZE) * aligned_height;
+
+    ctx->coloc_off  = 0;
+    ctx->filter_off = FFALIGN(ctx->coloc_off  + coloc_size,         AV_NVTEGRA_MAP_ALIGN);
+    common_map_size = FFALIGN(ctx->filter_off + filter_buffer_size, 0x1000);
+
+    err = av_nvtegra_map_create(&ctx->common_map, &device_hwctx->nvdec_channel, common_map_size, 0x100,
+                                NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE);
+    if (err < 0)
+        goto fail;
+
+    ctx->colmv_size = aligned_width * aligned_height / 16;
+    ctx->sao_offset =  FILTER_SIZE             * aligned_height;
+    ctx->bsd_offset = (FILTER_SIZE + SAO_SIZE) * aligned_height;
+
+    return 0;
+
+fail:
+    nvtegra_hevc_decode_uninit(avctx);
+    return err;
+}
+
+static void nvtegra_hevc_set_scaling_list(nvdec_hevc_scaling_list_s *list, HEVCContext *s) {
+    const ScalingList *sl = s->ps.pps->scaling_list_data_present_flag ?
+                            &s->ps.pps->scaling_list : &s->ps.sps->scaling_list;
+
+    int i;
+
+    for (i = 0; i < FF_ARRAY_ELEMS(list->ScalingListDCCoeff16x16); ++i)
+        list->ScalingListDCCoeff16x16[i] = sl->sl_dc[0][i];
+    for (i = 0; i < FF_ARRAY_ELEMS(list->ScalingListDCCoeff32x32); ++i)
+        list->ScalingListDCCoeff32x32[i] = sl->sl_dc[1][i * 3];
+
+    for (i = 0; i < 6; ++i)
+        memcpy(list->ScalingList4x4[i],   sl->sl[0][i], 16);
+    for (i = 0; i < 6; ++i)
+        memcpy(list->ScalingList8x8[i],   sl->sl[1][i], 64);
+    for (i = 0; i < 6; ++i)
+        memcpy(list->ScalingList16x16[i], sl->sl[2][i], 64);
+    memcpy(list->ScalingList32x32[0], sl->sl[3][0], 64);
+    memcpy(list->ScalingList32x32[1], sl->sl[3][3], 64);
+}
+
+static void nvtegra_hevc_set_tile_sizes(uint16_t *sizes, HEVCContext *s) {
+    const HEVCPPS *pps = s->ps.pps;
+    const HEVCSPS *sps = s->ps.sps;
+
+    int i, j, sum;
+
+    uint16_t *tile_thing = sizes + 0x380;
+    if (pps->uniform_spacing_flag) {
+        for (i = 0; i < pps->num_tile_columns; ++i)
+            *tile_thing++ = (i + 1) * sps->ctb_width  / pps->num_tile_columns <<
+                (sps->log2_diff_max_min_coding_block_size + sps->log2_min_cb_size - 4);
+        for (i = 0; i < pps->num_tile_rows; ++i)
+            *tile_thing++ = (i + 1) * sps->ctb_height / pps->num_tile_rows    <<
+                (sps->log2_diff_max_min_coding_block_size + sps->log2_min_cb_size - 4);
+    } else {
+        sum = 0;
+        for (i = 0; i < pps->num_tile_columns; ++i)
+            *tile_thing++ = (sum += pps->column_width[i]) <<
+                (sps->log2_diff_max_min_coding_block_size + sps->log2_min_cb_size - 4);
+        sum = 0;
+        for (i = 0; i < pps->num_tile_rows; ++i)
+            *tile_thing++ = (sum += pps->row_height[i])   <<
+                (sps->log2_diff_max_min_coding_block_size + sps->log2_min_cb_size - 4);
+    }
+
+    for (i = 0; i < pps->num_tile_rows; ++i) {
+        for (j = 0; j < pps->num_tile_columns; ++j) {
+            sizes[0] = pps->column_width[j];
+            sizes[1] = pps->row_height  [i];
+            sizes += 2;
+        }
+    }
+}
+
+static enum RPSType find_ref_rps_type(HEVCContext *s, HEVCFrame *f) {
+    int i;
+
+#define CHECK_SET(set) ({                       \
+    for (i = 0; i < s->rps[set].nb_refs; ++i) { \
+        if (s->rps[set].ref[i] == f)            \
+            return set;                         \
+    }                                           \
+})
+
+    CHECK_SET(ST_CURR_BEF);
+    CHECK_SET(ST_CURR_AFT);
+    CHECK_SET(ST_FOLL);
+    CHECK_SET(LT_CURR);
+    CHECK_SET(LT_FOLL);
+
+    return -1;
+}
+
+static inline int find_slot(uint64_t *mask) {
+    int slot = ff_ctzll(~*mask);
+    *mask |= (UINT64_C(1) << slot);
+    return slot;
+}
+
+static void nvtegra_hevc_prepare_frame_setup(nvdec_hevc_pic_s *setup, AVCodecContext *avctx,
+                                             AVFrame *frame, NVTegraHEVCDecodeContext *ctx)
+{
+    FrameDecodeData          *fdd = (FrameDecodeData *)frame->private_ref->data;
+    FFNVTegraDecodeFrame      *tf = fdd->hwaccel_priv;
+    AVNVTegraMap       *input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+    AVHWFramesContext *frames_ctx = (AVHWFramesContext *)avctx->hw_frames_ctx->data;
+    HEVCContext                *s = avctx->priv_data;
+    SliceHeader               *sh = &s->sh;
+    const HEVCPPS            *pps = s->ps.pps;
+    const HEVCSPS            *sps = s->ps.sps;
+
+    HEVCFrame *fr;
+    enum RPSType st;
+    uint8_t *mem;
+    uint16_t *tile_sizes;
+    int output_mode, cur_frame, scratch_ref_diff_poc, i, j;
+    int8_t dpb_to_ref[16+1] = {0}, ref_to_dpb[16+1] = {0};
+    int8_t rps_stcurrbef[8], rps_stcurraft[8], rps_ltcurr[8];
+
+    mem = av_nvtegra_map_get_addr(input_map);
+
+    /* Match source color depth regardless of colorspace */
+    /* TODO: Dithered down 8-bit post-processing (needs DISPLAY_BUF mappings) */
+    if (frames_ctx->sw_format == AV_PIX_FMT_P010 && sps->bit_depth == 10) {
+        output_mode = 1;                /* 10-bit bt709 */
+    } else {
+        if (sps->bit_depth == 8) {
+            output_mode = 0;            /* 8-bit bt709 */
+        } else {
+            switch (avctx->colorspace) {
+                default:
+                case AVCOL_SPC_BT709:
+                    output_mode = 2;    /* 10-bit bt709 truncated to 8-bit */
+                    break;
+                case AVCOL_SPC_BT2020_CL:
+                case AVCOL_SPC_BT2020_NCL:
+                    output_mode = 3;    /* 10-bit bt2020 truncated to 8-bit */
+                    break;
+            }
+        }
+    }
+
+    *setup = (nvdec_hevc_pic_s){
+        .gptimer_timeout_value                       = 0, /* Default value */
+
+        .tileformat                                  = 0, /* TBL */
+        .gob_height                                  = 0, /* GOB_2 */
+
+        .sw_start_code_e                             = 1,
+        .disp_output_mode                            = output_mode,
+
+        /* Divide by two if we are decoding to a surface with 2 bytes per sample (P010) */
+        .framestride                                 = {
+            s->frame->linesize[0] / ((output_mode == 1) ? 2 : 1),
+            s->frame->linesize[1] / ((output_mode == 1) ? 2 : 1),
+        },
+
+        .colMvBuffersize                             = ctx->colmv_size / 256,
+        .HevcSaoBufferOffset                         = ctx->sao_offset / 256,
+        .HevcBsdCtrlOffset                           = ctx->bsd_offset / 256,
+
+        .pic_width_in_luma_samples                   = sps->width,
+        .pic_height_in_luma_samples                  = sps->height,
+
+        .chroma_format_idc                           = 1, /* 4:2:0 */
+        .bit_depth_luma                              = sps->bit_depth,
+        .bit_depth_chroma                            = sps->bit_depth,
+        .log2_min_luma_coding_block_size             = sps->log2_min_cb_size,
+        .log2_max_luma_coding_block_size             = sps->log2_diff_max_min_coding_block_size + sps->log2_min_cb_size,
+        .log2_min_transform_block_size               = sps->log2_min_tb_size,
+        .log2_max_transform_block_size               = sps->log2_max_trafo_size,
+
+        .max_transform_hierarchy_depth_inter         = sps->max_transform_hierarchy_depth_inter,
+        .max_transform_hierarchy_depth_intra         = sps->max_transform_hierarchy_depth_intra,
+        .scalingListEnable                           = sps->scaling_list_enable_flag,
+        .amp_enable_flag                             = sps->amp_enabled_flag,
+        .sample_adaptive_offset_enabled_flag         = sps->sao_enabled,
+        .pcm_enabled_flag                            = sps->pcm_enabled_flag,
+        .pcm_sample_bit_depth_luma                   = sps->pcm_enabled_flag ? sps->pcm.bit_depth                : 0,
+        .pcm_sample_bit_depth_chroma                 = sps->pcm_enabled_flag ? sps->pcm.bit_depth_chroma         : 0,
+        .log2_min_pcm_luma_coding_block_size         = sps->pcm_enabled_flag ? sps->pcm.log2_min_pcm_cb_size     : 0,
+        .log2_max_pcm_luma_coding_block_size         = sps->pcm_enabled_flag ? sps->pcm.log2_max_pcm_cb_size     : 0,
+        .pcm_loop_filter_disabled_flag               = sps->pcm_enabled_flag ? sps->pcm.loop_filter_disable_flag : 0,
+        .sps_temporal_mvp_enabled_flag               = sps->sps_temporal_mvp_enabled_flag,
+        .strong_intra_smoothing_enabled_flag         = sps->sps_strong_intra_smoothing_enable_flag,
+
+        .dependent_slice_segments_enabled_flag       = pps->dependent_slice_segments_enabled_flag,
+        .output_flag_present_flag                    = pps->output_flag_present_flag,
+        .num_extra_slice_header_bits                 = pps->num_extra_slice_header_bits,
+        .sign_data_hiding_enabled_flag               = pps->sign_data_hiding_flag,
+        .cabac_init_present_flag                     = pps->cabac_init_present_flag,
+        .num_ref_idx_l0_default_active               = pps->num_ref_idx_l0_default_active,
+        .num_ref_idx_l1_default_active               = pps->num_ref_idx_l1_default_active,
+        .init_qp                                     = pps->pic_init_qp_minus26 + 26 + (sps->bit_depth - 8) * 6,
+        .constrained_intra_pred_flag                 = pps->constrained_intra_pred_flag,
+        .transform_skip_enabled_flag                 = pps->transform_skip_enabled_flag,
+        .cu_qp_delta_enabled_flag                    = pps->cu_qp_delta_enabled_flag,
+        .diff_cu_qp_delta_depth                      = pps->diff_cu_qp_delta_depth,
+
+        .pps_cb_qp_offset                            = pps->cb_qp_offset,
+        .pps_cr_qp_offset                            = pps->cr_qp_offset,
+        .pps_beta_offset                             = pps->beta_offset,
+        .pps_tc_offset                               = pps->tc_offset,
+        .pps_slice_chroma_qp_offsets_present_flag    = pps->pic_slice_level_chroma_qp_offsets_present_flag,
+        .weighted_pred_flag                          = pps->weighted_pred_flag,
+        .weighted_bipred_flag                        = pps->weighted_bipred_flag,
+        .transquant_bypass_enabled_flag              = pps->transquant_bypass_enable_flag,
+        .tiles_enabled_flag                          = pps->tiles_enabled_flag,
+        .entropy_coding_sync_enabled_flag            = pps->entropy_coding_sync_enabled_flag,
+        .num_tile_columns                            = pps->tiles_enabled_flag ? pps->num_tile_columns : 0,
+        .num_tile_rows                               = pps->tiles_enabled_flag ? pps->num_tile_rows    : 0,
+        .loop_filter_across_tiles_enabled_flag       = pps->tiles_enabled_flag ? pps->loop_filter_across_tiles_enabled_flag : 0,
+        .loop_filter_across_slices_enabled_flag      = pps->seq_loop_filter_across_slices_enabled_flag,
+        .deblocking_filter_control_present_flag      = pps->deblocking_filter_control_present_flag,
+        .deblocking_filter_override_enabled_flag     = pps->deblocking_filter_override_enabled_flag,
+        .pps_deblocking_filter_disabled_flag         = pps->disable_dbf,
+        .lists_modification_present_flag             = pps->lists_modification_present_flag,
+        .log2_parallel_merge_level                   = pps->log2_parallel_merge_level,
+        .slice_segment_header_extension_present_flag = pps->slice_header_extension_present_flag,
+
+        .num_ref_frames                              = ff_hevc_frame_nb_refs(s),
+
+        .IDR_picture_flag                            = IS_IDR(s),
+        .RAP_picture_flag                            = IS_IRAP(s),
+        .pattern_id                                  = ((output_mode == 0) || (output_mode == 1)) ? 2 : ctx->pattern_id, /* Disable/enable dithering */
+        .sw_hdr_skip_length                          = sh->nvidia_skip_length,
+
+        /*
+         * Ignored in official code
+        .separate_colour_plane_flag                  = sps->separate_colour_plane_flag,
+        .log2_max_pic_order_cnt_lsb_minus4           = sps->log2_max_poc_lsb - 4,
+        .num_short_term_ref_pic_sets                 = sps->nb_st_rps,
+        .num_long_term_ref_pics_sps                  = sps->num_long_term_ref_pics_sps,
+        .num_delta_pocs_of_rps_idx                   = s->sh.short_term_rps ? s->sh.short_term_rps->rps_idx_num_delta_pocs : 0,
+        .long_term_ref_pics_present_flag             = sps->long_term_ref_pics_present_flag,
+        .num_bits_short_term_ref_pics_in_slice       = sh->short_term_ref_pic_set_size,
+        */
+    };
+
+    /* Remove stale references from our ref list */
+    for (i = 0; i < FF_ARRAY_ELEMS(ctx->refs); ++i) {
+        if (!(ctx->refs_mask & (1 << i)))
+            continue;
+
+        for (j = 0; j < FF_ARRAY_ELEMS(s->DPB); ++j) {
+            if (s->DPB[j].frame && s->DPB[j].poc == ctx->refs[i].poc)
+                break;
+        }
+
+        if (j == FF_ARRAY_ELEMS(s->DPB) || s->DPB[j].poc == s->ref->poc ||
+                !(s->DPB[j].flags & (HEVC_FRAME_FLAG_SHORT_REF | HEVC_FRAME_FLAG_LONG_REF)))
+            ctx->refs_mask &= ~(1 << i), ctx->refs[i].map = NULL;
+        else
+            dpb_to_ref[i] = j, ref_to_dpb[j] = i;
+    }
+
+    /* Try to find a valid reference */
+    ctx->scratch_ref = -1, scratch_ref_diff_poc = 0;
+    for (i = 0; i < FF_ARRAY_ELEMS(ctx->refs); ++i) {
+        if (!(ctx->refs_mask & (1 << i)) ||
+                (ctx->refs[i].map == av_nvtegra_frame_get_fbuf_map(s->frame)))
+            continue;
+
+        st = find_ref_rps_type(s, &s->DPB[dpb_to_ref[i]]);
+        if ((st != ST_CURR_BEF) && (st != ST_CURR_AFT) && (st != LT_CURR))
+            continue;
+
+        ctx->scratch_ref     = i;
+        scratch_ref_diff_poc = av_clip_int8(s->ref->poc - s->DPB[dpb_to_ref[i]].poc);
+        break;
+    }
+
+    /* Add the current frame to our ref list */
+    setup->curr_pic_idx = cur_frame = find_slot(&ctx->refs_mask);
+    ctx->refs[cur_frame] = (struct NVTegraHEVCRefFrame){
+        .map        = av_nvtegra_frame_get_fbuf_map(s->frame),
+        .chroma_off = s->frame->data[1] - s->frame->data[0],
+        .poc        = s->ref->poc,
+    };
+
+    /* If there were no valid references, use the current frame */
+    if (ctx->scratch_ref == -1)
+        ctx->scratch_ref = cur_frame;
+
+    /* Fill the POC metadata */
+    for (i = 0; i < FF_ARRAY_ELEMS(setup->RefDiffPicOrderCnts); ++i) {
+        if (i == cur_frame)
+            continue;
+
+        if (ctx->refs_mask & (1 << i)) {
+            fr = &s->DPB[dpb_to_ref[i]];
+            setup->RefDiffPicOrderCnts[i] = av_clip_int8(s->ref->poc - fr->poc);
+            setup->longtermflag |= !!(fr->flags & HEVC_FRAME_FLAG_LONG_REF) << (15 - i);
+        } else {
+            setup->RefDiffPicOrderCnts[i] = scratch_ref_diff_poc;
+        }
+    }
+
+#define RPS_TO_DPB_IDX(set, array) ({                  \
+    for (i = 0; i < s->rps[set].nb_refs; ++i) {        \
+        for (j = 0; j < FF_ARRAY_ELEMS(s->DPB); ++j) { \
+            if (s->rps[set].ref[i] == &s->DPB[j]) {    \
+                array[i] = ref_to_dpb[j];              \
+                break;                                 \
+            }                                          \
+        }                                              \
+    }                                                  \
+})
+
+    RPS_TO_DPB_IDX(ST_CURR_BEF, rps_stcurrbef);
+    RPS_TO_DPB_IDX(ST_CURR_AFT, rps_stcurraft);
+    RPS_TO_DPB_IDX(LT_CURR,     rps_ltcurr);
+
+#define FILL_REFLIST(list, set, array) ({         \
+    int len = FFMIN(s->rps[set].nb_refs, 16 - i); \
+    memcpy(&setup->list[i], array, len);          \
+    i += len;                                     \
+})
+
+    /* Fill the RPS metadata */
+    if (s->rps[ST_CURR_BEF].nb_refs + s->rps[ST_CURR_AFT].nb_refs + s->rps[LT_CURR].nb_refs) {
+        for (i = 0; i < 16;) {
+            FILL_REFLIST(initreflistidxl0, ST_CURR_BEF, rps_stcurrbef);
+            FILL_REFLIST(initreflistidxl0, ST_CURR_AFT, rps_stcurraft);
+            FILL_REFLIST(initreflistidxl0, LT_CURR,     rps_ltcurr);
+        }
+
+        for (i = 0; i < 16;) {
+            FILL_REFLIST(initreflistidxl1, ST_CURR_AFT, rps_stcurraft);
+            FILL_REFLIST(initreflistidxl1, ST_CURR_BEF, rps_stcurrbef);
+            FILL_REFLIST(initreflistidxl1, LT_CURR,     rps_ltcurr);
+        }
+    }
+
+    ctx->pattern_id ^= 1;
+
+    if (sps->scaling_list_enable_flag)
+        nvtegra_hevc_set_scaling_list((nvdec_hevc_scaling_list_s *)(mem + ctx->scaling_list_off), s);
+
+    tile_sizes = (uint16_t *)(mem + ctx->tile_sizes_off);
+    if (pps->tiles_enabled_flag) {
+        nvtegra_hevc_set_tile_sizes(tile_sizes, s);
+    } else {
+        tile_sizes[0] = pps->column_width[0];
+        tile_sizes[1] = pps->row_height  [0];
+    }
+}
+
+static int nvtegra_hevc_prepare_cmdbuf(AVNVTegraCmdbuf *cmdbuf, HEVCContext *s,
+                                       NVTegraHEVCDecodeContext *ctx, AVFrame *cur_frame)
+{
+    FrameDecodeData     *fdd = (FrameDecodeData *)cur_frame->private_ref->data;
+    FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv;
+    AVNVTegraMap  *input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+
+    int i;
+    int err;
+
+    err = av_nvtegra_cmdbuf_begin(cmdbuf, HOST1X_CLASS_NVDEC);
+    if (err < 0)
+        return err;
+
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_APPLICATION_ID,
+                          AV_NVTEGRA_ENUM(NVC5B0_SET_APPLICATION_ID, ID, HEVC));
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_CONTROL_PARAMS,
+                          AV_NVTEGRA_ENUM (NVC5B0_SET_CONTROL_PARAMS, CODEC_TYPE,     HEVC) |
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, ERR_CONCEAL_ON, 1)    |
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, GPTIMER_ON,     1));
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_PICTURE_INDEX,
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_PICTURE_INDEX, INDEX, ctx->core.frame_idx));
+
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_DRV_PIC_SETUP_OFFSET,
+                          input_map,        ctx->core.pic_setup_off, NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_IN_BUF_BASE_OFFSET,
+                          input_map,        ctx->core.bitstream_off, NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_NVDEC_STATUS_OFFSET,
+                          input_map,        ctx->core.status_off,    NVHOST_RELOC_TYPE_DEFAULT);
+
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_HEVC_SET_SCALING_LIST_OFFSET,
+                          input_map,        ctx->scaling_list_off,   NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_HEVC_SET_TILE_SIZES_OFFSET,
+                          input_map,        ctx->tile_sizes_off,     NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_HEVC_SET_FILTER_BUFFER_OFFSET,
+                          &ctx->common_map, ctx->filter_off,         NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_COLOC_DATA_OFFSET,
+                          &ctx->common_map, ctx->coloc_off,          NVHOST_RELOC_TYPE_DEFAULT);
+
+#define PUSH_FRAME(ref, offset) ({                                                \
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_LUMA_OFFSET0   + offset * 4, \
+                          ref.map, 0, NVHOST_RELOC_TYPE_DEFAULT);                 \
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_CHROMA_OFFSET0 + offset * 4, \
+                          ref.map, ref.chroma_off, NVHOST_RELOC_TYPE_DEFAULT);    \
+})
+
+    for (i = 0; i < FF_ARRAY_ELEMS(ctx->refs); ++i) {
+        if (ctx->refs_mask & (1 << i))
+            PUSH_FRAME(ctx->refs[i],                i);
+        else
+            PUSH_FRAME(ctx->refs[ctx->scratch_ref], i);
+    }
+
+    /* TODO: Dithered 8-bit post-processing, binding to DISPLAY_BUF */
+
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_EXECUTE,
+                          AV_NVTEGRA_ENUM(NVC5B0_EXECUTE, AWAKEN, ENABLE));
+
+    err = av_nvtegra_cmdbuf_end(cmdbuf);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static int nvtegra_hevc_start_frame(AVCodecContext *avctx, const uint8_t *buf, uint32_t buf_size) {
+    HEVCContext                *s = avctx->priv_data;
+    AVFrame                *frame = s->frame;
+    FrameDecodeData          *fdd = (FrameDecodeData *)frame->private_ref->data;
+    NVTegraHEVCDecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    FFNVTegraDecodeFrame *tf;
+    AVNVTegraMap *input_map;
+    uint8_t *mem;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Starting HEVC-NVTEGRA frame with pixel format %s\n",
+           av_get_pix_fmt_name(avctx->sw_pix_fmt));
+
+    err = ff_nvtegra_start_frame(avctx, frame, &ctx->core);
+    if (err < 0)
+        return err;
+
+    tf = fdd->hwaccel_priv;
+    input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+    mem = av_nvtegra_map_get_addr(input_map);
+
+    nvtegra_hevc_prepare_frame_setup((nvdec_hevc_pic_s *)(mem + ctx->core.pic_setup_off),
+                                     avctx, frame, ctx);
+
+    return 0;
+}
+
+static int nvtegra_hevc_end_frame(AVCodecContext *avctx) {
+    HEVCContext                *s = avctx->priv_data;
+    NVTegraHEVCDecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+    AVFrame                *frame = s->ref->frame;
+    FrameDecodeData          *fdd = (FrameDecodeData *)frame->private_ref->data;
+    FFNVTegraDecodeFrame      *tf = fdd->hwaccel_priv;
+
+    nvdec_hevc_pic_s *setup;
+    uint8_t *mem;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Ending HEVC-NVTEGRA frame with %u slices -> %u bytes\n",
+           ctx->core.num_slices, ctx->core.bitstream_len);
+
+    if (!tf || !ctx->core.num_slices)
+        return 0;
+
+    mem = av_nvtegra_map_get_addr((AVNVTegraMap *)tf->input_map_ref->data);
+
+    setup = (nvdec_hevc_pic_s *)(mem + ctx->core.pic_setup_off);
+    setup->stream_len = ctx->core.bitstream_len;
+
+    err = nvtegra_hevc_prepare_cmdbuf(&ctx->core.cmdbuf, s, ctx, frame);
+    if (err < 0)
+        return err;
+
+    return ff_nvtegra_end_frame(avctx, frame, &ctx->core, NULL, 0);
+}
+
+static int nvtegra_hevc_decode_slice(AVCodecContext *avctx, const uint8_t *buf,
+                                     uint32_t buf_size)
+{
+    HEVCContext                *s = avctx->priv_data;
+    AVFrame                *frame = s->ref->frame;
+    FrameDecodeData          *fdd = (FrameDecodeData *)frame->private_ref->data;
+    FFNVTegraDecodeFrame      *tf = fdd->hwaccel_priv;
+    AVNVTegraMap       *input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+    NVTegraHEVCDecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    uint8_t *mem;
+
+    mem = av_nvtegra_map_get_addr(input_map);
+
+    /*
+     * Official code adds a 4-byte 00000001 startcode,
+     * though decoding was observed to work without it
+     */
+    AV_WB8(mem + ctx->core.bitstream_off + ctx->core.bitstream_len, 0);
+    ctx->core.bitstream_len += 1;
+
+    return ff_nvtegra_decode_slice(avctx, frame, buf, buf_size, AV_RB24(buf) != 1);
+}
+
+#if CONFIG_HEVC_NVTEGRA_HWACCEL
+const FFHWAccel ff_hevc_nvtegra_hwaccel = {
+    .p.name         = "hevc_nvtegra",
+    .p.type         = AVMEDIA_TYPE_VIDEO,
+    .p.id           = AV_CODEC_ID_HEVC,
+    .p.pix_fmt      = AV_PIX_FMT_NVTEGRA,
+    .start_frame    = &nvtegra_hevc_start_frame,
+    .end_frame      = &nvtegra_hevc_end_frame,
+    .decode_slice   = &nvtegra_hevc_decode_slice,
+    .init           = &nvtegra_hevc_decode_init,
+    .uninit         = &nvtegra_hevc_decode_uninit,
+    .frame_params   = &ff_nvtegra_frame_params,
+    .priv_data_size = sizeof(NVTegraHEVCDecodeContext),
+    .caps_internal  = HWACCEL_CAP_ASYNC_SAFE,
+};
+#endif
-- 
2.45.1
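
A note on the slice handling in nvtegra_hevc_decode_slice above: it writes
a single zero byte and then passes AV_RB24(buf) != 1 to
ff_nvtegra_decode_slice. The sketch below illustrates that idea, assuming
the flag asks the helper to prepend a 3-byte 000001 start code when the
slice does not already begin with one. The helper's body is not part of
this message, so append_slice and rb24 are hypothetical stand-ins and not
the patch's code; the bounds and short-buffer checks are also additions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 24-bit big-endian value, in the spirit of FFmpeg's AV_RB24. */
static uint32_t rb24(const uint8_t *p)
{
    return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | p[2];
}

/*
 * Hypothetical helper: append one slice to dst at offset *len.
 * A leading zero byte is always written; a 3-byte start code follows
 * only if the slice does not already start with 00 00 01.
 * Returns 0 on success, -1 if the data would not fit in cap bytes.
 */
static int append_slice(uint8_t *dst, size_t cap, size_t *len,
                        const uint8_t *buf, size_t buf_size)
{
    static const uint8_t startcode[3] = { 0, 0, 1 };
    int need_startcode = buf_size < 3 || rb24(buf) != 1;
    size_t total = 1 + (need_startcode ? sizeof(startcode) : 0) + buf_size;

    if (*len > cap || total > cap - *len)
        return -1;

    dst[(*len)++] = 0; /* together with 00 00 01 this forms 00 00 00 01 */

    if (need_startcode) {
        memcpy(dst + *len, startcode, sizeof(startcode));
        *len += sizeof(startcode);
    }

    memcpy(dst + *len, buf, buf_size);
    *len += buf_size;
    return 0;
}
```

Either way the hardware sees each slice preceded by the 4-byte sequence
that the comment in the patch attributes to the official code.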

* [FFmpeg-devel] [PATCH 14/16] nvtegra: add vp8 hardware decoding
  2024-05-30 19:43 [FFmpeg-devel] [PATCH 00/16] NVidia Tegra hardware decoding backend averne
                   ` (12 preceding siblings ...)
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 13/16] nvtegra: add hevc " averne
@ 2024-05-30 19:43 ` averne
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 15/16] nvtegra: add vp9 " averne
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 16/16] nvtegra: add mjpeg " averne
  15 siblings, 0 replies; 37+ messages in thread
From: averne @ 2024-05-30 19:43 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: averne

Signed-off-by: averne <averne381@gmail.com>
---
 configure                |   2 +
 libavcodec/Makefile      |   1 +
 libavcodec/hwaccels.h    |   1 +
 libavcodec/nvtegra_vp8.c | 334 +++++++++++++++++++++++++++++++++++++++
 libavcodec/vp8.c         |   6 +
 5 files changed, 344 insertions(+)
 create mode 100644 libavcodec/nvtegra_vp8.c
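
nvtegra_vp8_decode_init in this patch packs the picture setup, status,
command buffer and bitstream regions into one input map by aligning each
offset in turn. The following is a minimal sketch of that arithmetic. The
structure sizes and the alignment value are parameters here because
sizeof(nvdec_vp8_pic_s), sizeof(nvdec_status_s) and AV_NVTEGRA_MAP_ALIGN
are defined elsewhere in the series; InputLayout and layout_input_map are
hypothetical names.

```c
#include <stdint.h>

/* Round x up to a multiple of a (a must be a power of two), like FFALIGN. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((uint32_t)(a) - 1))

typedef struct InputLayout {
    uint32_t pic_setup_off, status_off, cmdbuf_off, bitstream_off;
    uint32_t map_size;
    uint32_t max_cmdbuf_size, max_bitstream_size;
} InputLayout;

static InputLayout layout_input_map(uint32_t setup_size, uint32_t status_size,
                                    uint32_t bitstream_size, uint32_t align)
{
    InputLayout l;

    l.pic_setup_off = 0;
    l.status_off    = ALIGN_UP(l.pic_setup_off + setup_size,     align);
    l.cmdbuf_off    = ALIGN_UP(l.status_off    + status_size,    align);
    /* The command buffer is given one alignment unit of space. */
    l.bitstream_off = ALIGN_UP(l.cmdbuf_off    + align,          align);
    /* The whole map is rounded up to a 4 KiB page. */
    l.map_size      = ALIGN_UP(l.bitstream_off + bitstream_size, 0x1000);

    /* Capacities fall out of the gaps between consecutive offsets. */
    l.max_cmdbuf_size    = l.bitstream_off - l.cmdbuf_off;
    l.max_bitstream_size = l.map_size      - l.bitstream_off;
    return l;
}
```

Deriving the maximum sizes from the offsets, as the patch does, keeps the
limits in sync with the layout if a region is later resized.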

diff --git a/configure b/configure
index ba4c5287e3..a347337dd4 100755
--- a/configure
+++ b/configure
@@ -3277,6 +3277,8 @@ vp8_nvdec_hwaccel_deps="nvdec"
 vp8_nvdec_hwaccel_select="vp8_decoder"
 vp8_vaapi_hwaccel_deps="vaapi"
 vp8_vaapi_hwaccel_select="vp8_decoder"
+vp8_nvtegra_hwaccel_deps="nvtegra"
+vp8_nvtegra_hwaccel_select="vp8_decoder"
 vp9_d3d11va_hwaccel_deps="d3d11va DXVA_PicParams_VP9"
 vp9_d3d11va_hwaccel_select="vp9_decoder"
 vp9_d3d11va2_hwaccel_deps="d3d11va DXVA_PicParams_VP9"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index de667b8a4b..89c5986aab 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1053,6 +1053,7 @@ OBJS-$(CONFIG_VC1_VDPAU_HWACCEL)          += vdpau_vc1.o
 OBJS-$(CONFIG_VC1_NVTEGRA_HWACCEL)        += nvtegra_vc1.o
 OBJS-$(CONFIG_VP8_NVDEC_HWACCEL)          += nvdec_vp8.o
 OBJS-$(CONFIG_VP8_VAAPI_HWACCEL)          += vaapi_vp8.o
+OBJS-$(CONFIG_VP8_NVTEGRA_HWACCEL)        += nvtegra_vp8.o
 OBJS-$(CONFIG_VP9_D3D11VA_HWACCEL)        += dxva2_vp9.o
 OBJS-$(CONFIG_VP9_DXVA2_HWACCEL)          += dxva2_vp9.o
 OBJS-$(CONFIG_VP9_D3D12VA_HWACCEL)        += dxva2_vp9.o d3d12va_vp9.o
diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
index 77892dc2b2..7d43aeccec 100644
--- a/libavcodec/hwaccels.h
+++ b/libavcodec/hwaccels.h
@@ -80,6 +80,7 @@ extern const struct FFHWAccel ff_vc1_vdpau_hwaccel;
 extern const struct FFHWAccel ff_vc1_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_vp8_nvdec_hwaccel;
 extern const struct FFHWAccel ff_vp8_vaapi_hwaccel;
+extern const struct FFHWAccel ff_vp8_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_vp9_d3d11va_hwaccel;
 extern const struct FFHWAccel ff_vp9_d3d11va2_hwaccel;
 extern const struct FFHWAccel ff_vp9_d3d12va_hwaccel;
diff --git a/libavcodec/nvtegra_vp8.c b/libavcodec/nvtegra_vp8.c
new file mode 100644
index 0000000000..a3aa69fe62
--- /dev/null
+++ b/libavcodec/nvtegra_vp8.c
@@ -0,0 +1,334 @@
+/*
+ * Copyright (c) 2024 averne <averne381@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include "config_components.h"
+
+#include "avcodec.h"
+#include "hwaccel_internal.h"
+#include "internal.h"
+#include "hwconfig.h"
+#include "vp8.h"
+#include "vp8data.h"
+#include "decode.h"
+#include "nvtegra_decode.h"
+
+#include "libavutil/pixdesc.h"
+#include "libavutil/nvtegra_host1x.h"
+
+typedef struct NVTegraVP8DecodeContext {
+    FFNVTegraDecodeContext core;
+
+    AVNVTegraMap common_map;
+    uint32_t prob_data_off, history_off;
+    uint32_t history_size;
+
+    AVFrame *golden_frame, *altref_frame,
+            *previous_frame;
+} NVTegraVP8DecodeContext;
+
+/* Size (width, height) of a macroblock */
+#define MB_SIZE 16
+
+static int nvtegra_vp8_decode_uninit(AVCodecContext *avctx) {
+    NVTegraVP8DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA VP8 decoder\n");
+
+    err = av_nvtegra_map_destroy(&ctx->common_map);
+    if (err < 0)
+        return err;
+
+    err = ff_nvtegra_decode_uninit(avctx, &ctx->core);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static void nvtegra_vp8_init_probs(void *p) {
+    int i, j, k;
+    uint8_t *ptr = p;
+
+    memset(p, 0, 0x4cc);
+
+    for (i = 0; i < 4; ++i) {
+        for (j = 0; j < 8; ++j) {
+            for (k = 0; k < 3; ++k) {
+                memcpy(ptr, vp8_token_default_probs[i][j][k], NUM_DCT_TOKENS - 1);
+                ptr += NUM_DCT_TOKENS;
+            }
+        }
+    }
+
+    memcpy(ptr, vp8_pred16x16_prob_inter, sizeof(vp8_pred16x16_prob_inter));
+    ptr += 4;
+
+    memcpy(ptr, vp8_pred8x8c_prob_inter, sizeof(vp8_pred8x8c_prob_inter));
+    ptr += 4;
+
+    for (i = 0; i < 2; ++i) {
+        memcpy(ptr, vp8_mv_default_prob[i], 19);
+        ptr += 20;
+    }
+}
+
+static int nvtegra_vp8_decode_init(AVCodecContext *avctx) {
+    NVTegraVP8DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    AVHWDeviceContext      *hw_device_ctx;
+    AVNVTegraDeviceContext *device_hwctx;
+    uint32_t width_in_mbs, common_map_size;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Initializing NVTEGRA VP8 decoder\n");
+
+    /* Ignored: histogram map, size 0x400 */
+    ctx->core.pic_setup_off  = 0;
+    ctx->core.status_off     = FFALIGN(ctx->core.pic_setup_off + sizeof(nvdec_vp8_pic_s),
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.cmdbuf_off     = FFALIGN(ctx->core.status_off    + sizeof(nvdec_status_s),
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.bitstream_off  = FFALIGN(ctx->core.cmdbuf_off    + AV_NVTEGRA_MAP_ALIGN,
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.input_map_size = FFALIGN(ctx->core.bitstream_off + ff_nvtegra_decode_pick_bitstream_buffer_size(avctx),
+                                       0x1000);
+
+    ctx->core.max_cmdbuf_size    = ctx->core.bitstream_off  - ctx->core.cmdbuf_off;
+    ctx->core.max_bitstream_size = ctx->core.input_map_size - ctx->core.bitstream_off;
+
+    err = ff_nvtegra_decode_init(avctx, &ctx->core);
+    if (err < 0)
+        goto fail;
+
+    hw_device_ctx = (AVHWDeviceContext *)ctx->core.hw_device_ref->data;
+    device_hwctx  = hw_device_ctx->hwctx;
+
+    width_in_mbs = FFALIGN(avctx->coded_width, MB_SIZE) / MB_SIZE;
+    ctx->history_size = width_in_mbs * 0x200;
+
+    ctx->prob_data_off = 0;
+    ctx->history_off   = FFALIGN(ctx->prob_data_off + 0x4b00,            AV_NVTEGRA_MAP_ALIGN);
+    common_map_size    = FFALIGN(ctx->history_off   + ctx->history_size, 0x1000);
+
+    err = av_nvtegra_map_create(&ctx->common_map, &device_hwctx->nvdec_channel, common_map_size, 0x100,
+                                NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE);
+    if (err < 0)
+        goto fail;
+
+    nvtegra_vp8_init_probs((uint8_t *)av_nvtegra_map_get_addr(&ctx->common_map) + ctx->prob_data_off);
+
+    return 0;
+
+fail:
+    nvtegra_vp8_decode_uninit(avctx);
+    return err;
+}
+
+static void nvtegra_vp8_prepare_frame_setup(nvdec_vp8_pic_s *setup, VP8Context *h,
+                                            NVTegraVP8DecodeContext *ctx)
+{
+    *setup = (nvdec_vp8_pic_s){
+        .gptimer_timeout_value            = 0, /* Default value */
+
+        .FrameWidth                       = FFALIGN(h->framep[VP8_FRAME_CURRENT]->tf.f->width,  MB_SIZE),
+        .FrameHeight                      = FFALIGN(h->framep[VP8_FRAME_CURRENT]->tf.f->height, MB_SIZE),
+
+        .keyFrame                         = h->keyframe,
+        .version                          = h->profile,
+
+        .tileFormat                       = 0, /* TBL */
+        .gob_height                       = 0, /* GOB_2 */
+
+        .errorConcealOn                   = 1,
+
+        .firstPartSize                    = h->header_partition_size,
+
+        .HistBufferSize                   = ctx->history_size / 256,
+
+        .FrameStride                      = {
+            h->framep[VP8_FRAME_CURRENT]->tf.f->linesize[0] / MB_SIZE,
+            h->framep[VP8_FRAME_CURRENT]->tf.f->linesize[1] / MB_SIZE,
+        },
+
+        .luma_top_offset                  = 0,
+        .luma_bot_offset                  = 0,
+        .luma_frame_offset                = 0,
+        .chroma_top_offset                = 0,
+        .chroma_bot_offset                = 0,
+        .chroma_frame_offset              = 0,
+
+        .current_output_memory_layout     = 0,           /* NV12 */
+        .output_memory_layout             = { 0, 0, 0 }, /* NV12 */
+
+        /* ???: Official code sets this value at 0x8d (reserved1[0]), so just set both */
+        .segmentation_feature_data_update = h->segmentation.enabled ? h->segmentation.update_feature_data : 0,
+        .reserved1[0]                     = h->segmentation.enabled ? h->segmentation.update_feature_data : 0,
+
+        .resultValue                      = 0,
+    };
+}
+
+static int nvtegra_vp8_prepare_cmdbuf(AVNVTegraCmdbuf *cmdbuf, VP8Context *h,
+                                      NVTegraVP8DecodeContext *ctx, AVFrame *cur_frame)
+{
+    FrameDecodeData     *fdd = (FrameDecodeData *)cur_frame->private_ref->data;
+    FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv;
+    AVNVTegraMap  *input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+
+    int err;
+
+    err = av_nvtegra_cmdbuf_begin(cmdbuf, HOST1X_CLASS_NVDEC);
+    if (err < 0)
+        return err;
+
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_APPLICATION_ID,
+                          AV_NVTEGRA_ENUM(NVC5B0_SET_APPLICATION_ID, ID, VP8));
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_CONTROL_PARAMS,
+                          AV_NVTEGRA_ENUM(NVC5B0_SET_CONTROL_PARAMS,  CODEC_TYPE,     VP8) |
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, ERR_CONCEAL_ON, 1)   |
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, GPTIMER_ON,     1));
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_PICTURE_INDEX,
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_PICTURE_INDEX, INDEX, ctx->core.frame_idx));
+
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_DRV_PIC_SETUP_OFFSET,
+                          input_map,        ctx->core.pic_setup_off, NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_IN_BUF_BASE_OFFSET,
+                          input_map,        ctx->core.bitstream_off, NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_NVDEC_STATUS_OFFSET,
+                          input_map,        ctx->core.status_off,    NVHOST_RELOC_TYPE_DEFAULT);
+
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_VP8_SET_PROB_DATA_OFFSET,
+                          &ctx->common_map, ctx->prob_data_off,      NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_HISTORY_OFFSET,
+                          &ctx->common_map, ctx->history_off,        NVHOST_RELOC_TYPE_DEFAULT);
+
+#define PUSH_FRAME(fr, offset) ({                                                           \
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_LUMA_OFFSET0   + offset * 4,           \
+                          av_nvtegra_frame_get_fbuf_map(fr), 0, NVHOST_RELOC_TYPE_DEFAULT); \
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_CHROMA_OFFSET0 + offset * 4,           \
+                          av_nvtegra_frame_get_fbuf_map(fr), fr->data[1] - fr->data[0],     \
+                          NVHOST_RELOC_TYPE_DEFAULT);                                       \
+})
+
+    PUSH_FRAME(ctx->golden_frame,   0);
+    PUSH_FRAME(ctx->altref_frame,   1);
+    PUSH_FRAME(ctx->previous_frame, 2);
+    PUSH_FRAME(cur_frame,           3);
+
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_EXECUTE,
+                          AV_NVTEGRA_ENUM(NVC5B0_EXECUTE, AWAKEN, ENABLE));
+
+    err = av_nvtegra_cmdbuf_end(cmdbuf);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static int nvtegra_vp8_start_frame(AVCodecContext *avctx, const uint8_t *buf, uint32_t buf_size) {
+    VP8Context                *h = avctx->priv_data;
+    AVFrame               *frame = h->framep[VP8_FRAME_CURRENT]->tf.f;
+    FrameDecodeData         *fdd = (FrameDecodeData *)frame->private_ref->data;
+    NVTegraVP8DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    FFNVTegraDecodeFrame *tf;
+    AVNVTegraMap *input_map;
+    uint8_t *mem;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Starting VP8-NVTEGRA frame with pixel format %s\n",
+           av_get_pix_fmt_name(avctx->sw_pix_fmt));
+
+    err = ff_nvtegra_start_frame(avctx, frame, &ctx->core);
+    if (err < 0)
+        return err;
+
+    tf = fdd->hwaccel_priv;
+    input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+    mem = av_nvtegra_map_get_addr(input_map);
+
+    nvtegra_vp8_prepare_frame_setup((nvdec_vp8_pic_s *)(mem + ctx->core.pic_setup_off), h, ctx);
+
+#define SAFE_REF(type) (h->framep[(type)] ?: h->framep[VP8_FRAME_CURRENT])
+    ctx->golden_frame   = ff_nvtegra_safe_get_ref(SAFE_REF(VP8_FRAME_GOLDEN)  ->tf.f, frame);
+    ctx->altref_frame   = ff_nvtegra_safe_get_ref(SAFE_REF(VP8_FRAME_ALTREF)  ->tf.f, frame);
+    ctx->previous_frame = ff_nvtegra_safe_get_ref(SAFE_REF(VP8_FRAME_PREVIOUS)->tf.f, frame);
+
+    return 0;
+}
+
+static int nvtegra_vp8_end_frame(AVCodecContext *avctx) {
+    VP8Context                *h = avctx->priv_data;
+    NVTegraVP8DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+    AVFrame               *frame = h->framep[VP8_FRAME_CURRENT]->tf.f;
+    FrameDecodeData         *fdd = (FrameDecodeData *)frame->private_ref->data;
+    FFNVTegraDecodeFrame     *tf = fdd->hwaccel_priv;
+
+    nvdec_vp8_pic_s *setup;
+    uint8_t *mem;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Ending VP8-NVTEGRA frame with %u slices -> %u bytes\n",
+           ctx->core.num_slices, ctx->core.bitstream_len);
+
+    if (!tf || !ctx->core.num_slices)
+        return 0;
+
+    mem = av_nvtegra_map_get_addr((AVNVTegraMap *)tf->input_map_ref->data);
+
+    setup = (nvdec_vp8_pic_s *)(mem + ctx->core.pic_setup_off);
+    setup->VLDBufferSize = ctx->core.bitstream_len;
+
+    err = nvtegra_vp8_prepare_cmdbuf(&ctx->core.cmdbuf, h, ctx, frame);
+    if (err < 0)
+        return err;
+
+    return ff_nvtegra_end_frame(avctx, frame, &ctx->core, NULL, 0);
+}
+
+static int nvtegra_vp8_decode_slice(AVCodecContext *avctx, const uint8_t *buf,
+                                    uint32_t buf_size)
+{
+    VP8Context  *h = avctx->priv_data;
+    AVFrame *frame = h->framep[VP8_FRAME_CURRENT]->tf.f;
+
+    int offset = h->keyframe ? 10 : 3;
+
+    return ff_nvtegra_decode_slice(avctx, frame, buf + offset, buf_size - offset, false);
+}
+
+#if CONFIG_VP8_NVTEGRA_HWACCEL
+const FFHWAccel ff_vp8_nvtegra_hwaccel = {
+    .p.name         = "vp8_nvtegra",
+    .p.type         = AVMEDIA_TYPE_VIDEO,
+    .p.id           = AV_CODEC_ID_VP8,
+    .p.pix_fmt      = AV_PIX_FMT_NVTEGRA,
+    .start_frame    = &nvtegra_vp8_start_frame,
+    .end_frame      = &nvtegra_vp8_end_frame,
+    .decode_slice   = &nvtegra_vp8_decode_slice,
+    .init           = &nvtegra_vp8_decode_init,
+    .uninit         = &nvtegra_vp8_decode_uninit,
+    .frame_params   = &ff_nvtegra_frame_params,
+    .priv_data_size = sizeof(NVTegraVP8DecodeContext),
+    .caps_internal  = HWACCEL_CAP_ASYNC_SAFE,
+};
+#endif
diff --git a/libavcodec/vp8.c b/libavcodec/vp8.c
index 8e91613068..8b4676e3ff 100644
--- a/libavcodec/vp8.c
+++ b/libavcodec/vp8.c
@@ -184,6 +184,9 @@ static enum AVPixelFormat get_pixel_format(VP8Context *s)
 #endif
 #if CONFIG_VP8_NVDEC_HWACCEL
         AV_PIX_FMT_CUDA,
+#endif
+#if CONFIG_VP8_NVTEGRA_HWACCEL
+        AV_PIX_FMT_NVTEGRA,
 #endif
         AV_PIX_FMT_YUV420P,
         AV_PIX_FMT_NONE,
@@ -2972,6 +2975,9 @@ const FFCodec ff_vp8_decoder = {
 #endif
 #if CONFIG_VP8_NVDEC_HWACCEL
                                HWACCEL_NVDEC(vp8),
+#endif
+#if CONFIG_VP8_NVTEGRA_HWACCEL
+                               HWACCEL_NVTEGRA(vp8),
 #endif
                                NULL
                            },
-- 
2.45.1
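
A note on nvtegra_vp8_decode_slice above, which skips 10 bytes on
keyframes and 3 bytes otherwise. This matches the uncompressed VP8 frame
header described in RFC 6386: a 3-byte frame tag, followed on keyframes
by the 3-byte start code 9d 01 2a and two 16-bit size fields. The sketch
below parses that tag; Vp8FrameTag and vp8_parse_frame_tag are
hypothetical names, and the validation is an addition that the patch
does not perform.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct Vp8FrameTag {
    int      keyframe;        /* 1 if this is a keyframe */
    int      version;         /* 3-bit version/profile field */
    int      show_frame;
    uint32_t first_part_size; /* size of the first partition, in bytes */
    size_t   payload_off;     /* offset of the first compressed byte */
} Vp8FrameTag;

/* Returns 0 on success, -1 on a truncated or malformed header. */
static int vp8_parse_frame_tag(const uint8_t *buf, size_t size, Vp8FrameTag *tag)
{
    uint32_t bits;

    if (size < 3)
        return -1;

    /* The frame tag is a 24-bit little-endian value. */
    bits = buf[0] | ((uint32_t)buf[1] << 8) | ((uint32_t)buf[2] << 16);

    tag->keyframe        = !(bits & 1); /* the bit is 0 for keyframes */
    tag->version         = (bits >> 1) & 7;
    tag->show_frame      = (bits >> 4) & 1;
    tag->first_part_size = bits >> 5;
    tag->payload_off     = tag->keyframe ? 10 : 3;

    if (tag->keyframe) {
        /* Keyframes carry a start code plus width and height fields. */
        if (size < 10 || buf[3] != 0x9d || buf[4] != 0x01 || buf[5] != 0x2a)
            return -1;
    }
    return 0;
}
```

A caller would pass buf + payload_off and size - payload_off on to the
hardware, which is what the 10 : 3 offset in the patch achieves.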

* [FFmpeg-devel] [PATCH 15/16] nvtegra: add vp9 hardware decoding
  2024-05-30 19:43 [FFmpeg-devel] [PATCH 00/16] NVidia Tegra hardware decoding backend averne
                   ` (13 preceding siblings ...)
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 14/16] nvtegra: add vp8 " averne
@ 2024-05-30 19:43 ` averne
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 16/16] nvtegra: add mjpeg " averne
  15 siblings, 0 replies; 37+ messages in thread
From: averne @ 2024-05-30 19:43 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: averne

This hardware block is based on, or licensed from, the Hantro
implementation, as evidenced by the identical structures. The relevant
V4L2 kernel code was used as a reference when implementing backward
entropy updates.

Signed-off-by: averne <averne381@gmail.com>
---
 configure                |   2 +
 libavcodec/Makefile      |   1 +
 libavcodec/hwaccels.h    |   1 +
 libavcodec/nvtegra_vp9.c | 665 +++++++++++++++++++++++++++++++++++++++
 libavcodec/vp9.c         |  10 +-
 5 files changed, 678 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/nvtegra_vp9.c
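
nvtegra_vp9_decode_init in this patch sizes its scratch buffers from the
frame dimensions, counted in 64x64 superblocks. The following sketch
reproduces that arithmetic in isolation. The per-superblock and per-row
constants (32, 1024 and 988 bytes) are copied from the patch as-is; their
hardware meaning is not documented here. Vp9BufSizes and vp9_buf_sizes
are hypothetical names.

```c
#include <stdint.h>

/* Round x up to a multiple of a (a must be a power of two), like FFALIGN. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((uint32_t)(a) - 1))
#define CEILDIV(a, b)  (((a) + (b) - 1) / (b))

typedef struct Vp9BufSizes {
    uint32_t max_sb;     /* number of 64x64 superblocks covering the frame */
    uint32_t segment_rw; /* size of each segmentation map buffer */
    uint32_t filter;     /* size of the loop filter buffer */
    uint32_t col_mvrw;   /* size of each co-located motion vector buffer */
} Vp9BufSizes;

static Vp9BufSizes vp9_buf_sizes(uint32_t coded_width, uint32_t coded_height,
                                 uint32_t height)
{
    Vp9BufSizes s;
    /* Dimensions are first padded to whole 16x16 macroblocks. */
    uint32_t aligned_width  = ALIGN_UP(coded_width,  16);
    uint32_t aligned_height = ALIGN_UP(coded_height, 16);

    s.max_sb     = CEILDIV(aligned_width, 64) * CEILDIV(aligned_height, 64);
    s.segment_rw = ALIGN_UP(s.max_sb * 32, 0x100);
    /* The patch uses the display height here, not the coded height. */
    s.filter     = ALIGN_UP(height, 64) * 988;
    s.col_mvrw   = s.max_sb * 1024;
    return s;
}
```

For a 1920x1080 stream this gives 30 x 17 = 510 superblocks, so each
motion vector buffer takes 510 KiB.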

diff --git a/configure b/configure
index a347337dd4..3fe948d9ab 100755
--- a/configure
+++ b/configure
@@ -3295,6 +3295,8 @@ vp9_vdpau_hwaccel_deps="vdpau VdpPictureInfoVP9"
 vp9_vdpau_hwaccel_select="vp9_decoder"
 vp9_videotoolbox_hwaccel_deps="videotoolbox"
 vp9_videotoolbox_hwaccel_select="vp9_decoder"
+vp9_nvtegra_hwaccel_deps="nvtegra"
+vp9_nvtegra_hwaccel_select="vp9_decoder"
 wmv3_d3d11va_hwaccel_select="vc1_d3d11va_hwaccel"
 wmv3_d3d11va2_hwaccel_select="vc1_d3d11va2_hwaccel"
 wmv3_d3d12va_hwaccel_select="vc1_d3d12va_hwaccel"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 89c5986aab..914995558e 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1061,6 +1061,7 @@ OBJS-$(CONFIG_VP9_NVDEC_HWACCEL)          += nvdec_vp9.o
 OBJS-$(CONFIG_VP9_VAAPI_HWACCEL)          += vaapi_vp9.o
 OBJS-$(CONFIG_VP9_VDPAU_HWACCEL)          += vdpau_vp9.o
 OBJS-$(CONFIG_VP9_VIDEOTOOLBOX_HWACCEL)   += videotoolbox_vp9.o
+OBJS-$(CONFIG_VP9_NVTEGRA_HWACCEL)        += nvtegra_vp9.o
 OBJS-$(CONFIG_VP8_QSV_HWACCEL)            += qsvdec.o
 
 # Objects duplicated from other libraries for shared builds
diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
index 7d43aeccec..a3babfc309 100644
--- a/libavcodec/hwaccels.h
+++ b/libavcodec/hwaccels.h
@@ -89,6 +89,7 @@ extern const struct FFHWAccel ff_vp9_nvdec_hwaccel;
 extern const struct FFHWAccel ff_vp9_vaapi_hwaccel;
 extern const struct FFHWAccel ff_vp9_vdpau_hwaccel;
 extern const struct FFHWAccel ff_vp9_videotoolbox_hwaccel;
+extern const struct FFHWAccel ff_vp9_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_wmv3_d3d11va_hwaccel;
 extern const struct FFHWAccel ff_wmv3_d3d11va2_hwaccel;
 extern const struct FFHWAccel ff_wmv3_d3d12va_hwaccel;
diff --git a/libavcodec/nvtegra_vp9.c b/libavcodec/nvtegra_vp9.c
new file mode 100644
index 0000000000..a0cca1a5a4
--- /dev/null
+++ b/libavcodec/nvtegra_vp9.c
@@ -0,0 +1,665 @@
+/*
+ * Copyright (c) 2024 averne <averne381@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <stdbool.h>
+
+#include "config_components.h"
+
+#include "avcodec.h"
+#include "hwaccel_internal.h"
+#include "internal.h"
+#include "hwconfig.h"
+#include "vp9data.h"
+#include "vp9dec.h"
+#include "decode.h"
+#include "nvtegra_decode.h"
+
+#include "libavutil/pixdesc.h"
+#include "libavutil/nvtegra_host1x.h"
+
+typedef struct NVTegraVP9DecodeContext {
+    FFNVTegraDecodeContext core;
+
+    uint32_t prob_tab_off;
+
+    AVNVTegraMap common_map;
+    uint32_t segment_rw1_off, segment_rw2_off, tile_sizes_off, filter_off,
+             col_mvrw1_off, col_mvrw2_off, ctx_counter_off;
+
+    bool prev_show_frame;
+
+    AVFrame *refs[3];
+} NVTegraVP9DecodeContext;
+
+/* Size (width, height) of a macroblock */
+#define MB_SIZE 16
+
+/* Maximum size (width, height) of a superblock */
+#define SB_SIZE 64
+
+#define CEILDIV(a, b) (((a) + (b) - 1) / (b))
+
+/* Prediction modes aren't laid out in the same order in FFmpeg's defaults as in hardware */
+static const uint8_t pmconv[] = { 2, 0, 1, 3, 4, 5, 6, 8, 7, 9 };
+
+static int nvtegra_vp9_decode_uninit(AVCodecContext *avctx) {
+    NVTegraVP9DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA VP9 decoder\n");
+
+    err = av_nvtegra_map_destroy(&ctx->common_map);
+    if (err < 0)
+        return err;
+
+    err = ff_nvtegra_decode_uninit(avctx, &ctx->core);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static int nvtegra_vp9_decode_init(AVCodecContext *avctx) {
+    NVTegraVP9DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    AVHWDeviceContext      *hw_device_ctx;
+    AVNVTegraDeviceContext *device_hwctx;
+    uint32_t aligned_width, aligned_height, max_sb_size,
+             segment_rw_size, filter_size, col_mvrw_size, ctx_counter_size,
+             common_map_size;
+    uint8_t *mem;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Initializing NVTEGRA VP9 decoder\n");
+
+    ctx->core.pic_setup_off  = 0;
+    ctx->core.status_off     = FFALIGN(ctx->core.pic_setup_off + sizeof(nvdec_vp9_pic_s),
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.cmdbuf_off     = FFALIGN(ctx->core.status_off    + sizeof(nvdec_status_s),
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->prob_tab_off        = FFALIGN(ctx->core.cmdbuf_off    + 2*AV_NVTEGRA_MAP_ALIGN,
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.bitstream_off  = FFALIGN(ctx->prob_tab_off       + sizeof(nvdec_vp9EntropyProbs_t),
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.input_map_size = FFALIGN(ctx->core.bitstream_off + ff_nvtegra_decode_pick_bitstream_buffer_size(avctx),
+                                       0x1000);
+
+    ctx->core.max_cmdbuf_size    = ctx->prob_tab_off        - ctx->core.cmdbuf_off;
+    ctx->core.max_bitstream_size = ctx->core.input_map_size - ctx->core.bitstream_off;
+
+    err = ff_nvtegra_decode_init(avctx, &ctx->core);
+    if (err < 0)
+        goto fail;
+
+    hw_device_ctx = (AVHWDeviceContext *)ctx->core.hw_device_ref->data;
+    device_hwctx  = hw_device_ctx->hwctx;
+
+    aligned_width    = FFALIGN(avctx->coded_width,  MB_SIZE);
+    aligned_height   = FFALIGN(avctx->coded_height, MB_SIZE);
+    max_sb_size      = CEILDIV(aligned_width, 64) * CEILDIV(aligned_height, 64);
+    segment_rw_size  = FFALIGN(max_sb_size * 32, 0x100);
+    filter_size      = FFALIGN(avctx->height, 64) * 988;
+    col_mvrw_size    = max_sb_size * 1024;
+    ctx_counter_size = FFALIGN(sizeof(nvdec_vp9EntropyCounts_t), 0x100);
+
+    ctx->segment_rw1_off = 0;
+    ctx->segment_rw2_off = FFALIGN(ctx->segment_rw1_off + segment_rw_size,  AV_NVTEGRA_MAP_ALIGN);
+    ctx->tile_sizes_off  = FFALIGN(ctx->segment_rw2_off + segment_rw_size,  AV_NVTEGRA_MAP_ALIGN);
+    ctx->filter_off      = FFALIGN(ctx->tile_sizes_off  + 0x700,            AV_NVTEGRA_MAP_ALIGN);
+    ctx->col_mvrw1_off   = FFALIGN(ctx->filter_off      + filter_size,      AV_NVTEGRA_MAP_ALIGN);
+    ctx->col_mvrw2_off   = FFALIGN(ctx->col_mvrw1_off   + col_mvrw_size,    AV_NVTEGRA_MAP_ALIGN);
+    ctx->ctx_counter_off = FFALIGN(ctx->col_mvrw2_off   + col_mvrw_size,    AV_NVTEGRA_MAP_ALIGN);
+    common_map_size      = FFALIGN(ctx->ctx_counter_off + ctx_counter_size, 0x1000);
+
+    err = av_nvtegra_map_create(&ctx->common_map, &device_hwctx->nvdec_channel, common_map_size, 0x100,
+                                NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE);
+    if (err < 0)
+        goto fail;
+
+    mem = av_nvtegra_map_get_addr(&ctx->common_map);
+
+    memset(mem + ctx->segment_rw1_off, 0, segment_rw_size);
+    memset(mem + ctx->segment_rw2_off, 0, segment_rw_size);
+
+    memset(mem + ctx->tile_sizes_off, 0, 0x700);
+    ((uint16_t *)(mem + ctx->tile_sizes_off))[0x37a] = 9;
+    ((uint16_t *)(mem + ctx->tile_sizes_off))[0x37b] = 1;
+
+    memset(mem + ctx->col_mvrw1_off, 0, col_mvrw_size);
+    memset(mem + ctx->col_mvrw2_off, 0, col_mvrw_size);
+
+    memset(mem + ctx->ctx_counter_off, 0, sizeof(nvdec_vp9EntropyCounts_t));
+
+    return 0;
+
+fail:
+    nvtegra_vp9_decode_uninit(avctx);
+    return err;
+}
+
+static void nvtegra_vp9_init_probs(nvdec_vp9EntropyProbs_t *probs) {
+    int i, j;
+
+    for (i = 0; i < FF_ARRAY_ELEMS(probs->kf_bmode_prob); ++i) {
+        for (j = 0; j < FF_ARRAY_ELEMS(probs->kf_bmode_prob[0]); ++j) {
+            memcpy(probs->kf_bmode_prob[i][j], ff_vp9_default_kf_ymode_probs[pmconv[i]][pmconv[j]], 8);
+            probs->kf_bmode_probB[i][j][0]   = ff_vp9_default_kf_ymode_probs[pmconv[i]][pmconv[j]][8];
+        }
+        memcpy(probs->kf_uv_mode_prob[i], ff_vp9_default_kf_uvmode_probs[pmconv[i]], 8);
+        probs->kf_uv_mode_probB[i][0]   = ff_vp9_default_kf_uvmode_probs[pmconv[i]][8];
+    }
+}
+
+static void nvtegra_vp9_update_probs(nvdec_vp9EntropyProbs_t *probs,
+                                     VP9Context *s, bool init)
+{
+    ProbContext *p = &s->prob.p;
+
+    int i, j, k, l;
+
+    if (init) {
+        memset(probs, 0, sizeof(nvdec_vp9EntropyProbs_t));
+        nvtegra_vp9_init_probs(probs);
+    }
+
+    for (i = 0; i < FF_ARRAY_ELEMS(probs->ref_pred_probs); ++i)
+        probs->ref_pred_probs[i] = *s->intra_pred_data[i];
+
+    memcpy(probs->mb_segment_tree_probs,  s->s.h.segmentation.prob,      sizeof(probs->mb_segment_tree_probs));
+    if (s->s.h.segmentation.temporal)
+        memcpy(probs->segment_pred_probs, s->s.h.segmentation.pred_prob, sizeof(probs->segment_pred_probs));
+    else
+        memset(probs->segment_pred_probs, 0xff, sizeof(probs->segment_pred_probs));
+
+    /* Ignored by official software: ref_scores, prob_comppred */
+
+    for (i = 0; i < FF_ARRAY_ELEMS(probs->a.inter_mode_prob); ++i)
+        memcpy(probs->a.inter_mode_prob[i], p->mv_mode[i], 3);
+
+    memcpy(probs->a.intra_inter_prob, p->intra, sizeof(probs->a.intra_inter_prob));
+
+    for (i = 0; i < FF_ARRAY_ELEMS(probs->a.uv_mode_prob); ++i) {
+        memcpy(probs->a.uv_mode_prob[i], p->uv_mode[pmconv[i]], 8);
+        probs->a.uv_mode_probB[i][0]   = p->uv_mode[pmconv[i]][8];
+    }
+
+    for (i = 0; i < FF_ARRAY_ELEMS(probs->a.tx8x8_prob); ++i) {
+        memcpy(probs->a.tx8x8_prob  [i], &p->tx8p [i], 1);
+        memcpy(probs->a.tx16x16_prob[i],  p->tx16p[i], 2);
+        memcpy(probs->a.tx32x32_prob[i],  p->tx32p[i], 3);
+    }
+
+    for (i = 0; i < FF_ARRAY_ELEMS(probs->a.sb_ymode_prob); ++i) {
+        memcpy(probs->a.sb_ymode_prob[i], p->y_mode[i], 8);
+        probs->a.sb_ymode_probB[i][0]   = p->y_mode[i][8];
+    }
+
+    for (i = 0; i < 4; ++i) {
+        for (j = 0; j < 4; ++j) {
+            memcpy(probs->a.partition_prob[0][4*(3-i)+j],
+                   &ff_vp9_default_kf_partition_probs[i][j], 3);
+            memcpy(probs->a.partition_prob[1][4*(3-i)+j], &p->partition[i][j], 3);
+        }
+    }
+
+    memcpy(probs->a.switchable_interp_prob, p->filter, sizeof(probs->a.switchable_interp_prob));
+    memcpy(probs->a.comp_inter_prob,        p->comp,   sizeof(probs->a.comp_inter_prob));
+    memcpy(probs->a.mbskip_probs,           p->skip,   sizeof(probs->a.mbskip_probs));
+
+    memcpy(probs->a.nmvc.joints, p->mv_joint, 3);
+    for (i = 0; i < FF_ARRAY_ELEMS(p->mv_comp); ++i) {
+        probs->a.nmvc.sign     [i]       = p->mv_comp[i].sign;
+        probs->a.nmvc.class0   [i][0]    = p->mv_comp[i].class0;
+        probs->a.nmvc.class0_hp[i]       = p->mv_comp[i].class0_hp;
+        probs->a.nmvc.hp       [i]       = p->mv_comp[i].hp;
+        memcpy(probs->a.nmvc.fp       [i], p->mv_comp[i].fp,        3);
+        memcpy(probs->a.nmvc.classes  [i], p->mv_comp[i].classes,   10);
+        memcpy(probs->a.nmvc.class0_fp[i], p->mv_comp[i].class0_fp, 2 * 3);
+        memcpy(probs->a.nmvc.bits     [i], p->mv_comp[i].bits,      10);
+    }
+
+    memcpy(probs->a.single_ref_prob, p->single_ref, sizeof(probs->a.single_ref_prob));
+    memcpy(probs->a.comp_ref_prob,   p->comp_ref,   sizeof(probs->a.comp_ref_prob));
+
+    for (i = 0; i < FF_ARRAY_ELEMS(probs->a.probCoeffs); ++i) {
+        for (j = 0; j < FF_ARRAY_ELEMS(probs->a.probCoeffs[0]); ++j) {
+            for (k = 0; k < FF_ARRAY_ELEMS(probs->a.probCoeffs[0][0]); ++k) {
+                for (l = 0; l < FF_ARRAY_ELEMS(probs->a.probCoeffs[0][0][0]); ++l) {
+                    memcpy(probs->a.probCoeffs     [i][j][k][l], s->prob.coef[0][i][j][k][l], 3);
+                    memcpy(probs->a.probCoeffs8x8  [i][j][k][l], s->prob.coef[1][i][j][k][l], 3);
+                    memcpy(probs->a.probCoeffs16x16[i][j][k][l], s->prob.coef[2][i][j][k][l], 3);
+                    memcpy(probs->a.probCoeffs32x32[i][j][k][l], s->prob.coef[3][i][j][k][l], 3);
+                }
+            }
+        }
+    }
+}
+
+static void nvtegra_vp9_set_tile_sizes(uint16_t *sizes, VP9Context *s) {
+    int i, j;
+
+    for (i = 0; i < s->s.h.tiling.tile_rows; ++i) {
+        for (j = 0; j < s->s.h.tiling.tile_cols; ++j) {
+            sizes[0] = (s->sb_cols * (j + 1) >> s->s.h.tiling.log2_tile_cols) -
+                       (s->sb_cols *  j      >> s->s.h.tiling.log2_tile_cols);
+            sizes[1] = (s->sb_rows * (i + 1) >> s->s.h.tiling.log2_tile_rows) -
+                       (s->sb_rows *  i      >> s->s.h.tiling.log2_tile_rows);
+            sizes += 2;
+        }
+    }
+}
+
+static void nvtegra_vp9_update_counts(nvdec_vp9EntropyCounts_t *cts,
+                                      VP9TileData *td)
+{
+    int i, j, k, l;
+
+    for (i = 0; i < FF_ARRAY_ELEMS(td->counts.y_mode); ++i) {
+        for (j = 0; j < FF_ARRAY_ELEMS(td->counts.y_mode[0]); ++j) {
+            td->counts.y_mode[i][pmconv[j]] = cts->sb_ymode_counts[i][j];
+        }
+    }
+
+    for (i = 0; i < FF_ARRAY_ELEMS(td->counts.uv_mode); ++i) {
+        for (j = 0; j < FF_ARRAY_ELEMS(td->counts.uv_mode[0]); ++j) {
+            td->counts.uv_mode[pmconv[i]][pmconv[j]] = cts->uv_mode_counts[i][j];
+        }
+    }
+
+    memcpy(td->counts.filter,     cts->switchable_interp_counts, sizeof(td->counts.filter));
+    memcpy(td->counts.intra,      cts->intra_inter_count,        sizeof(td->counts.intra));
+    memcpy(td->counts.comp,       cts->comp_inter_count,         sizeof(td->counts.comp));
+    memcpy(td->counts.single_ref, cts->single_ref_count,         sizeof(td->counts.single_ref));
+    memcpy(td->counts.tx32p,      cts->tx32x32_count,            sizeof(td->counts.tx32p));
+    memcpy(td->counts.tx16p,      cts->tx16x16_count,            sizeof(td->counts.tx16p));
+    memcpy(td->counts.tx8p,       cts->tx8x8_count,              sizeof(td->counts.tx8p));
+    memcpy(td->counts.skip,       cts->mbskip_count,             sizeof(td->counts.skip));
+
+    for (i = 0; i < FF_ARRAY_ELEMS(td->counts.mv_mode); ++i) {
+        td->counts.mv_mode[i][0] = cts->inter_mode_counts[i][1][0];
+        td->counts.mv_mode[i][1] = cts->inter_mode_counts[i][2][0];
+        td->counts.mv_mode[i][2] = cts->inter_mode_counts[i][0][0];
+        td->counts.mv_mode[i][3] = cts->inter_mode_counts[i][2][1];
+    }
+
+    memcpy(td->counts.mv_joint,                 cts->nmvcount.joints,       sizeof(td->counts.mv_joint));
+    for (i = 0; i < FF_ARRAY_ELEMS(td->counts.mv_comp); ++i) {
+        memcpy(td->counts.mv_comp[i].sign,      cts->nmvcount.sign     [i], sizeof(td->counts.mv_comp[i].sign));
+        memcpy(td->counts.mv_comp[i].classes,   cts->nmvcount.classes  [i], sizeof(td->counts.mv_comp[i].classes));
+        memcpy(td->counts.mv_comp[i].class0,    cts->nmvcount.class0   [i], sizeof(td->counts.mv_comp[i].class0));
+        memcpy(td->counts.mv_comp[i].bits,      cts->nmvcount.bits     [i], sizeof(td->counts.mv_comp[i].bits));
+        memcpy(td->counts.mv_comp[i].class0_fp, cts->nmvcount.class0_fp[i], sizeof(td->counts.mv_comp[i].class0_fp));
+        memcpy(td->counts.mv_comp[i].fp,        cts->nmvcount.fp       [i], sizeof(td->counts.mv_comp[i].fp));
+        memcpy(td->counts.mv_comp[i].class0_hp, cts->nmvcount.class0_hp[i], sizeof(td->counts.mv_comp[i].class0_hp));
+        memcpy(td->counts.mv_comp[i].hp,        cts->nmvcount.hp       [i], sizeof(td->counts.mv_comp[i].hp));
+    }
+
+    memcpy(td->counts.partition[0], cts->partition_counts[12], sizeof(td->counts.partition[0]));
+    memcpy(td->counts.partition[1], cts->partition_counts[ 8], sizeof(td->counts.partition[1]));
+    memcpy(td->counts.partition[2], cts->partition_counts[ 4], sizeof(td->counts.partition[2]));
+    memcpy(td->counts.partition[3], cts->partition_counts[ 0], sizeof(td->counts.partition[3]));
+
+    for (i = 0; i < FF_ARRAY_ELEMS(td->counts.coef[0]); ++i) {
+        for (j = 0; j < FF_ARRAY_ELEMS(td->counts.coef[0][0]); ++j) {
+            for (k = 0; k < FF_ARRAY_ELEMS(td->counts.coef[0][0][0]); ++k) {
+                for (l = 0; l < FF_ARRAY_ELEMS(td->counts.coef[0][0][0][0]); ++l) {
+                    memcpy(td->counts.coef[0][i][j][k][l], cts->countCoeffs     [i][j][k][l],
+                        sizeof(td->counts.coef[0][i][j][k][l]));
+                    memcpy(td->counts.coef[1][i][j][k][l], cts->countCoeffs8x8  [i][j][k][l],
+                        sizeof(td->counts.coef[1][i][j][k][l]));
+                    memcpy(td->counts.coef[2][i][j][k][l], cts->countCoeffs16x16[i][j][k][l],
+                        sizeof(td->counts.coef[2][i][j][k][l]));
+                    memcpy(td->counts.coef[3][i][j][k][l], cts->countCoeffs32x32[i][j][k][l],
+                        sizeof(td->counts.coef[3][i][j][k][l]));
+                    td->counts.eob[0][i][j][k][l][0] = cts->countCoeffs     [i][j][k][l][3];
+                    td->counts.eob[0][i][j][k][l][1] = cts->countEobs[0][i][j][k][l] - td->counts.eob[0][i][j][k][l][0];
+                    td->counts.eob[1][i][j][k][l][0] = cts->countCoeffs8x8  [i][j][k][l][3];
+                    td->counts.eob[1][i][j][k][l][1] = cts->countEobs[1][i][j][k][l] - td->counts.eob[1][i][j][k][l][0];
+                    td->counts.eob[2][i][j][k][l][0] = cts->countCoeffs16x16[i][j][k][l][3];
+                    td->counts.eob[2][i][j][k][l][1] = cts->countEobs[2][i][j][k][l] - td->counts.eob[2][i][j][k][l][0];
+                    td->counts.eob[3][i][j][k][l][0] = cts->countCoeffs32x32[i][j][k][l][3];
+                    td->counts.eob[3][i][j][k][l][1] = cts->countEobs[3][i][j][k][l] - td->counts.eob[3][i][j][k][l][0];
+                }
+            }
+        }
+    }
+}
+
+static void nvtegra_vp9_prepare_frame_setup(nvdec_vp9_pic_s *setup, AVCodecContext *avctx,
+                                            NVTegraVP9DecodeContext *ctx)
+{
+    VP9Context       *s = avctx->priv_data;
+    VP9SharedContext *h = &s->s;
+
+    int i;
+
+    /* Note: the stride is divided by 2 when the depth is > 8 (not supported on T210) */
+#define FWIDTH(f)      ((f && f->private_ref) ? f->width       : 0)
+#define FHEIGHT(f)     ((f && f->private_ref) ? f->height      : 0)
+#define FSTRIDE(f, c)  ((f && f->private_ref) ? f->linesize[c] : 0)
+
+    /* Note: the v1 substructure isn't filled out on T210 */
+    *setup = (nvdec_vp9_pic_s){
+        .gptimer_timeout_value    = 0, /* Default value */
+
+        .tileformat               = 0, /* TBL */
+        .gob_height               = 0, /* GOB_2 */
+
+        .Vp9BsdCtrlOffset         = FFALIGN(avctx->height, 64) * 912 / 256,
+
+        .ref0_width               = FWIDTH (h->refs[h->h.refidx[0]].f),
+        .ref0_height              = FHEIGHT(h->refs[h->h.refidx[0]].f),
+        .ref0_stride              = {
+            FSTRIDE(h->refs[h->h.refidx[0]].f, 0),
+            FSTRIDE(h->refs[h->h.refidx[0]].f, 1),
+        },
+
+        .ref1_width               = FWIDTH (h->refs[h->h.refidx[1]].f),
+        .ref1_height              = FHEIGHT(h->refs[h->h.refidx[1]].f),
+        .ref1_stride              = {
+            FSTRIDE(h->refs[h->h.refidx[1]].f, 0),
+            FSTRIDE(h->refs[h->h.refidx[1]].f, 1),
+        },
+
+        .ref2_width               = FWIDTH (h->refs[h->h.refidx[2]].f),
+        .ref2_height              = FHEIGHT(h->refs[h->h.refidx[2]].f),
+        .ref2_stride              = {
+            FSTRIDE(h->refs[h->h.refidx[2]].f, 0),
+            FSTRIDE(h->refs[h->h.refidx[2]].f, 1),
+        },
+
+        .width                    = FWIDTH (h->frames[CUR_FRAME].tf.f),
+        .height                   = FHEIGHT(h->frames[CUR_FRAME].tf.f),
+        .framestride              = {
+            FSTRIDE(h->frames[CUR_FRAME].tf.f, 0),
+            FSTRIDE(h->frames[CUR_FRAME].tf.f, 1),
+        },
+
+        .keyFrame                 = h->h.keyframe,
+        .prevIsKeyFrame           = s->last_keyframe,
+        .errorResilient           = h->h.errorres,
+        .prevShowFrame            = ctx->prev_show_frame,
+        .intraOnly                = h->h.intraonly,
+
+        .refFrameSignBias         = {
+            0,
+            h->h.signbias[0], h->h.signbias[1], h->h.signbias[2],
+        },
+
+        .loopFilterLevel          = h->h.filter.level,
+        .loopFilterSharpness      = h->h.filter.sharpness,
+
+        .qpYAc                    = h->h.yac_qi,
+        .qpYDc                    = h->h.ydc_qdelta,
+        .qpChAc                   = h->h.uvac_qdelta,
+        .qpChDc                   = h->h.uvdc_qdelta,
+
+        .lossless                 = h->h.lossless,
+        .transform_mode           = h->h.txfmmode,
+        .allow_high_precision_mv  = h->h.keyframe ? 0 : h->h.highprecisionmvs,
+        .mcomp_filter_type        = h->h.filtermode,
+        .comp_pred_mode           = h->h.comppredmode,
+        .comp_fixed_ref           = h->h.allowcompinter ? h->h.fixcompref + 1 : 0,
+        .comp_var_ref             = {
+            h->h.allowcompinter ? h->h.varcompref[0] + 1 : 0,
+            h->h.allowcompinter ? h->h.varcompref[1] + 1 : 0,
+        },
+
+        .log2_tile_columns        = h->h.tiling.log2_tile_cols,
+        .log2_tile_rows           = h->h.tiling.log2_tile_rows,
+
+        .segmentEnabled           = h->h.segmentation.enabled,
+        .segmentMapUpdate         = h->h.segmentation.update_map,
+        .segmentMapTemporalUpdate = h->h.segmentation.temporal,
+        .segmentFeatureMode       = h->h.segmentation.absolute_vals,
+        .modeRefLfEnabled         = h->h.lf_delta.enabled,
+        .mbRefLfDelta             = {
+            h->h.lf_delta.ref[0],  h->h.lf_delta.ref[1],
+            h->h.lf_delta.ref[2],  h->h.lf_delta.ref[3],
+        },
+        .mbModeLfDelta            = {
+            h->h.lf_delta.mode[0], h->h.lf_delta.mode[1],
+        },
+    };
+
+    for (i = 0; i < 8; ++i) {
+        setup->segmentFeatureEnable[i][0] = h->h.segmentation.feat[i].q_enabled;
+        setup->segmentFeatureEnable[i][1] = h->h.segmentation.feat[i].lf_enabled;
+        setup->segmentFeatureEnable[i][2] = h->h.segmentation.feat[i].ref_enabled;
+        setup->segmentFeatureEnable[i][3] = h->h.segmentation.feat[i].skip_enabled;
+
+        setup->segmentFeatureData[i][0]   = h->h.segmentation.feat[i].q_val;
+        setup->segmentFeatureData[i][1]   = h->h.segmentation.feat[i].lf_val;
+        setup->segmentFeatureData[i][2]   = h->h.segmentation.feat[i].ref_val;
+        setup->segmentFeatureData[i][3]   = 0;
+    }
+
+    ctx->prev_show_frame = !h->h.invisible;
+}
+
+static int nvtegra_vp9_prepare_cmdbuf(AVNVTegraCmdbuf *cmdbuf, VP9SharedContext *h,
+                                      NVTegraVP9DecodeContext *ctx, AVFrame *cur_frame)
+{
+    FrameDecodeData     *fdd = (FrameDecodeData *)cur_frame->private_ref->data;
+    FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv;
+    AVNVTegraMap  *input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+
+    uint32_t col_mvwrite_off, col_mvread_off;
+    int err;
+
+    if (ctx->core.frame_idx % 2 == 0)
+        col_mvwrite_off = ctx->col_mvrw1_off, col_mvread_off = ctx->col_mvrw2_off;
+    else
+        col_mvwrite_off = ctx->col_mvrw2_off, col_mvread_off = ctx->col_mvrw1_off;
+
+    err = av_nvtegra_cmdbuf_begin(cmdbuf, HOST1X_CLASS_NVDEC);
+    if (err < 0)
+        return err;
+
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_APPLICATION_ID,
+                          AV_NVTEGRA_ENUM(NVC5B0_SET_APPLICATION_ID, ID, VP9));
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_CONTROL_PARAMS,
+                          AV_NVTEGRA_ENUM (NVC5B0_SET_CONTROL_PARAMS, CODEC_TYPE,     VP9) |
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, ERR_CONCEAL_ON, 1)   |
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_CONTROL_PARAMS, GPTIMER_ON,     1));
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_SET_PICTURE_INDEX,
+                          AV_NVTEGRA_VALUE(NVC5B0_SET_PICTURE_INDEX, INDEX, ctx->core.frame_idx));
+
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_DRV_PIC_SETUP_OFFSET,
+                          input_map,        ctx->core.pic_setup_off,     NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_IN_BUF_BASE_OFFSET,
+                          input_map,        ctx->core.bitstream_off,     NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_NVDEC_STATUS_OFFSET,
+                          input_map,        ctx->core.status_off,        NVHOST_RELOC_TYPE_DEFAULT);
+
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_VP9_SET_PROB_TAB_BUF_OFFSET,
+                          input_map,        ctx->prob_tab_off,           NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_VP9_SET_CTX_COUNTER_BUF_OFFSET,
+                          &ctx->common_map, ctx->ctx_counter_off,        NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_VP9_SET_TILE_SIZE_BUF_OFFSET,
+                          &ctx->common_map, ctx->tile_sizes_off,         NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_VP9_SET_COL_MVWRITE_BUF_OFFSET,
+                          &ctx->common_map, col_mvwrite_off,             NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_VP9_SET_COL_MVREAD_BUF_OFFSET,
+                          &ctx->common_map, col_mvread_off,              NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_VP9_SET_SEGMENT_READ_BUF_OFFSET,
+                          &ctx->common_map, ctx->segment_rw1_off,        NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_VP9_SET_SEGMENT_WRITE_BUF_OFFSET,
+                          &ctx->common_map, ctx->segment_rw2_off,        NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_VP9_SET_FILTER_BUFFER_OFFSET,
+                          &ctx->common_map, ctx->filter_off,             NVHOST_RELOC_TYPE_DEFAULT);
+
+#define PUSH_FRAME(fr, offset) ({                                                           \
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_LUMA_OFFSET0   + offset * 4,           \
+                          av_nvtegra_frame_get_fbuf_map(fr), 0, NVHOST_RELOC_TYPE_DEFAULT); \
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVC5B0_SET_PICTURE_CHROMA_OFFSET0 + offset * 4,           \
+                          av_nvtegra_frame_get_fbuf_map(fr), fr->data[1] - fr->data[0],     \
+                          NVHOST_RELOC_TYPE_DEFAULT);                                       \
+})
+
+    PUSH_FRAME(ctx->refs[0], 0);
+    PUSH_FRAME(ctx->refs[1], 1);
+    PUSH_FRAME(ctx->refs[2], 2);
+    PUSH_FRAME(cur_frame,    3);
+
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVC5B0_EXECUTE,
+                          AV_NVTEGRA_ENUM(NVC5B0_EXECUTE, AWAKEN, ENABLE));
+
+    err = av_nvtegra_cmdbuf_end(cmdbuf);
+    if (err < 0)
+        return err;
+
+    if (h->h.segmentation.update_map)
+        FFSWAP(uint32_t, ctx->segment_rw1_off, ctx->segment_rw2_off);
+
+    return 0;
+}
+
+static int nvtegra_vp9_start_frame(AVCodecContext *avctx, const uint8_t *buf, uint32_t buf_size) {
+    VP9Context                *s = avctx->priv_data;
+    VP9SharedContext          *h = &s->s;
+    AVFrame               *frame = h->frames[CUR_FRAME].tf.f;
+    FrameDecodeData         *fdd = (FrameDecodeData *)frame->private_ref->data;
+    NVTegraVP9DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    FFNVTegraDecodeFrame *tf;
+    AVNVTegraMap *input_map;
+    uint8_t *mem, *common_mem;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Starting VP9-NVTEGRA frame with pixel format %s\n",
+           av_get_pix_fmt_name(avctx->sw_pix_fmt));
+
+    if (s->s.h.refreshctx && s->s.h.parallelmode) {
+        int i, j, k, l, m;
+
+        for (i = 0; i < FF_ARRAY_ELEMS(s->prob_ctx[s->s.h.framectxid].coef); i++) {
+            for (j = 0; j < FF_ARRAY_ELEMS(s->prob_ctx[s->s.h.framectxid].coef[0]); j++)
+                for (k = 0; k < FF_ARRAY_ELEMS(s->prob_ctx[s->s.h.framectxid].coef[0][0]); k++)
+                    for (l = 0; l < FF_ARRAY_ELEMS(s->prob_ctx[s->s.h.framectxid].coef[0][0][0]); l++)
+                        for (m = 0; m < FF_ARRAY_ELEMS(s->prob_ctx[s->s.h.framectxid].coef[0][0][0][0]); m++)
+                            memcpy(s->prob_ctx[s->s.h.framectxid].coef[i][j][k][l][m],
+                                   s->prob.coef[i][j][k][l][m],
+                                   FF_ARRAY_ELEMS(s->prob_ctx[s->s.h.framectxid].coef[0][0][0][0][0]));
+            if (s->s.h.txfmmode == i)
+                break;
+        }
+
+        s->prob_ctx[s->s.h.framectxid].p = s->prob.p;
+    }
+
+    err = ff_nvtegra_start_frame(avctx, frame, &ctx->core);
+    if (err < 0)
+        return err;
+
+    tf = fdd->hwaccel_priv;
+    input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+    mem = av_nvtegra_map_get_addr(input_map), common_mem = av_nvtegra_map_get_addr(&ctx->common_map);
+
+    nvtegra_vp9_prepare_frame_setup((nvdec_vp9_pic_s *)(mem + ctx->core.pic_setup_off), avctx, ctx);
+    nvtegra_vp9_set_tile_sizes((uint16_t *)(common_mem + ctx->tile_sizes_off), s);
+    nvtegra_vp9_update_probs((nvdec_vp9EntropyProbs_t *)(mem + ctx->prob_tab_off), s, ctx->core.new_input_buffer);
+
+    ctx->refs[0] = ff_nvtegra_safe_get_ref(h->refs[h->h.refidx[0]].f, h->frames[CUR_FRAME].tf.f);
+    ctx->refs[1] = ff_nvtegra_safe_get_ref(h->refs[h->h.refidx[1]].f, h->frames[CUR_FRAME].tf.f);
+    ctx->refs[2] = ff_nvtegra_safe_get_ref(h->refs[h->h.refidx[2]].f, h->frames[CUR_FRAME].tf.f);
+
+    return 0;
+}
+
+static int nvtegra_vp9_end_frame(AVCodecContext *avctx) {
+    VP9Context                *s = avctx->priv_data;
+    VP9SharedContext          *h = avctx->priv_data;
+    NVTegraVP9DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+    AVFrame               *frame = h->frames[CUR_FRAME].tf.f;
+    FrameDecodeData         *fdd = (FrameDecodeData *)frame->private_ref->data;
+    FFNVTegraDecodeFrame     *tf = fdd->hwaccel_priv;
+
+    nvdec_vp9_pic_s *setup;
+    uint8_t *mem, *common_mem;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Ending VP9-NVTEGRA frame with %u slices -> %u bytes\n",
+           ctx->core.num_slices, ctx->core.bitstream_len);
+
+    if (!tf || !ctx->core.num_slices)
+        return 0;
+
+    mem = av_nvtegra_map_get_addr((AVNVTegraMap *)tf->input_map_ref->data);
+
+    setup = (nvdec_vp9_pic_s *)(mem + ctx->core.pic_setup_off);
+    setup->stream_len = ctx->core.bitstream_len;
+
+    err = nvtegra_vp9_prepare_cmdbuf(&ctx->core.cmdbuf, h, ctx, frame);
+    if (err < 0)
+        return err;
+
+    err = ff_nvtegra_end_frame(avctx, frame, &ctx->core, NULL, 0);
+    if (err < 0)
+        return err;
+
+    /*
+     * Perform backward probability updates if necessary.
+     * Since it depends on entropy counts calculated by the hardware,
+     * we need to wait for the decode operation to complete.
+     */
+    if (!s->s.h.errorres && !s->s.h.parallelmode) {
+        err = ff_nvtegra_wait_decode(avctx, frame);
+        if (err < 0)
+            return err;
+
+        common_mem = av_nvtegra_map_get_addr(&ctx->common_map);
+
+        nvtegra_vp9_update_counts((nvdec_vp9EntropyCounts_t *)(common_mem + ctx->ctx_counter_off),
+                                  s->td);
+        ff_vp9_adapt_probs(s);
+    }
+
+    return 0;
+}
+
+static int nvtegra_vp9_decode_slice(AVCodecContext *avctx, const uint8_t *buf,
+                                    uint32_t buf_size)
+{
+    VP9SharedContext *h = avctx->priv_data;
+    AVFrame      *frame = h->frames[CUR_FRAME].tf.f;
+
+    int offset = h->h.uncompressed_header_size + h->h.compressed_header_size;
+
+    return ff_nvtegra_decode_slice(avctx, frame, buf + offset, buf_size - offset, false);
+}
+
+#if CONFIG_VP9_NVTEGRA_HWACCEL
+const FFHWAccel ff_vp9_nvtegra_hwaccel = {
+    .p.name         = "vp9_nvtegra",
+    .p.type         = AVMEDIA_TYPE_VIDEO,
+    .p.id           = AV_CODEC_ID_VP9,
+    .p.pix_fmt      = AV_PIX_FMT_NVTEGRA,
+    .start_frame    = &nvtegra_vp9_start_frame,
+    .end_frame      = &nvtegra_vp9_end_frame,
+    .decode_slice   = &nvtegra_vp9_decode_slice,
+    .init           = &nvtegra_vp9_decode_init,
+    .uninit         = &nvtegra_vp9_decode_uninit,
+    .frame_params   = &ff_nvtegra_frame_params,
+    .priv_data_size = sizeof(NVTegraVP9DecodeContext),
+    .caps_internal  = HWACCEL_CAP_ASYNC_SAFE,
+};
+#endif
diff --git a/libavcodec/vp9.c b/libavcodec/vp9.c
index 8ede2e2eb3..6f2b6f5241 100644
--- a/libavcodec/vp9.c
+++ b/libavcodec/vp9.c
@@ -165,7 +165,8 @@ static int update_size(AVCodecContext *avctx, int w, int h)
                      CONFIG_VP9_NVDEC_HWACCEL + \
                      CONFIG_VP9_VAAPI_HWACCEL + \
                      CONFIG_VP9_VDPAU_HWACCEL + \
-                     CONFIG_VP9_VIDEOTOOLBOX_HWACCEL)
+                     CONFIG_VP9_VIDEOTOOLBOX_HWACCEL + \
+                     CONFIG_VP9_NVTEGRA_HWACCEL)
     enum AVPixelFormat pix_fmts[HWACCEL_MAX + 2], *fmtp = pix_fmts;
     VP9Context *s = avctx->priv_data;
     uint8_t *p;
@@ -180,6 +181,10 @@ static int update_size(AVCodecContext *avctx, int w, int h)
 
         switch (s->pix_fmt) {
         case AV_PIX_FMT_YUV420P:
+#if CONFIG_VP9_NVTEGRA_HWACCEL
+            *fmtp++ = AV_PIX_FMT_NVTEGRA;
+#endif
+        /* fallthrough */
         case AV_PIX_FMT_YUV420P10:
 #if CONFIG_VP9_DXVA2_HWACCEL
             *fmtp++ = AV_PIX_FMT_DXVA2_VLD;
@@ -1870,6 +1875,9 @@ const FFCodec ff_vp9_decoder = {
 #endif
 #if CONFIG_VP9_VIDEOTOOLBOX_HWACCEL
                                HWACCEL_VIDEOTOOLBOX(vp9),
+#endif
+#if CONFIG_VP9_NVTEGRA_HWACCEL
+                               HWACCEL_NVTEGRA(vp9),
 #endif
                                NULL
                            },
-- 
2.45.1


* [FFmpeg-devel] [PATCH 16/16] nvtegra: add mjpeg hardware decoding
  2024-05-30 19:43 [FFmpeg-devel] [PATCH 00/16] NVidia Tegra hardware decoding backend averne
                   ` (14 preceding siblings ...)
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 15/16] nvtegra: add vp9 " averne
@ 2024-05-30 19:43 ` averne
  15 siblings, 0 replies; 37+ messages in thread
From: averne @ 2024-05-30 19:43 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: averne

This uses NVJPG, a hardware engine separate from NVDEC. On the Tegra 210
(and possibly later hardware), it cannot decode to tiled surfaces, and it
has some quirks that have been observed to hang the hardware. Streams
known to trigger these issues are rejected at init time.

Signed-off-by: averne <averne381@gmail.com>
---
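Note on the input map layout used in nvtegra_mjpeg_decode_init(): the
regions are placed in the order picture setup, status, command buffer,
bitstream. The following is a minimal standalone sketch of that offset
arithmetic, not the code from the patch. ALIGN_UP mirrors FFALIGN,
MAP_ALIGN is a hypothetical stand-in for AV_NVTEGRA_MAP_ALIGN (its real
value is defined elsewhere in the series), and InputMapLayout and
layout_input_map() are illustrative names that do not exist in the patch.

```c
#include <stdint.h>

/* Round x up to a multiple of a (a must be a power of two), like FFALIGN. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((uint32_t)(a) - 1))

/* Assumption: placeholder value, not the real AV_NVTEGRA_MAP_ALIGN. */
#define MAP_ALIGN 0x100u

typedef struct InputMapLayout {
    uint32_t pic_setup_off;      /* picture setup structure        */
    uint32_t status_off;         /* decoder status structure       */
    uint32_t cmdbuf_off;         /* command buffer (one MAP_ALIGN) */
    uint32_t bitstream_off;      /* compressed JPEG data           */
    uint32_t map_size;           /* total size, page-aligned       */
    uint32_t max_cmdbuf_size;    /* room between cmdbuf and bitstream */
    uint32_t max_bitstream_size; /* room between bitstream and map end */
} InputMapLayout;

static InputMapLayout layout_input_map(uint32_t setup_size,
                                       uint32_t status_size,
                                       uint32_t bitstream_size)
{
    InputMapLayout l;

    l.pic_setup_off = 0;
    l.status_off    = ALIGN_UP(l.pic_setup_off + setup_size,     MAP_ALIGN);
    l.cmdbuf_off    = ALIGN_UP(l.status_off    + status_size,    MAP_ALIGN);
    l.bitstream_off = ALIGN_UP(l.cmdbuf_off    + MAP_ALIGN,      MAP_ALIGN);
    l.map_size      = ALIGN_UP(l.bitstream_off + bitstream_size, 0x1000u);

    /* Each region's capacity is the distance to the start of the next one. */
    l.max_cmdbuf_size    = l.bitstream_off - l.cmdbuf_off;
    l.max_bitstream_size = l.map_size      - l.bitstream_off;

    return l;
}
```

With setup_size = 300, status_size = 16 and bitstream_size = 5000 (made-up
sizes), the regions start at 0, 512, 768 and 1024, and the map is rounded
up to 8192 bytes.
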
 configure                  |   2 +
 libavcodec/Makefile        |   1 +
 libavcodec/hwaccels.h      |   1 +
 libavcodec/mjpegdec.c      |   6 +
 libavcodec/nvtegra_mjpeg.c | 336 +++++++++++++++++++++++++++++++++++++
 5 files changed, 346 insertions(+)
 create mode 100644 libavcodec/nvtegra_mjpeg.c

diff --git a/configure b/configure
index 3fe948d9ab..1d885ed655 100755
--- a/configure
+++ b/configure
@@ -3219,6 +3219,8 @@ mjpeg_nvdec_hwaccel_deps="nvdec"
 mjpeg_nvdec_hwaccel_select="mjpeg_decoder"
 mjpeg_vaapi_hwaccel_deps="vaapi"
 mjpeg_vaapi_hwaccel_select="mjpeg_decoder"
+mjpeg_nvtegra_hwaccel_deps="nvtegra"
+mjpeg_nvtegra_hwaccel_select="mjpeg_decoder"
 mpeg1_nvdec_hwaccel_deps="nvdec"
 mpeg1_nvdec_hwaccel_select="mpeg1video_decoder"
 mpeg1_vdpau_hwaccel_deps="vdpau"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 914995558e..6a773f8d3e 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1025,6 +1025,7 @@ OBJS-$(CONFIG_HEVC_VULKAN_HWACCEL)        += vulkan_decode.o vulkan_hevc.o
 OBJS-$(CONFIG_HEVC_NVTEGRA_HWACCEL)       += nvtegra_hevc.o
 OBJS-$(CONFIG_MJPEG_NVDEC_HWACCEL)        += nvdec_mjpeg.o
 OBJS-$(CONFIG_MJPEG_VAAPI_HWACCEL)        += vaapi_mjpeg.o
+OBJS-$(CONFIG_MJPEG_NVTEGRA_HWACCEL)      += nvtegra_mjpeg.o
 OBJS-$(CONFIG_MPEG1_NVDEC_HWACCEL)        += nvdec_mpeg12.o
 OBJS-$(CONFIG_MPEG1_VDPAU_HWACCEL)        += vdpau_mpeg12.o
 OBJS-$(CONFIG_MPEG1_VIDEOTOOLBOX_HWACCEL) += videotoolbox.o
diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
index a3babfc309..f5a121d23f 100644
--- a/libavcodec/hwaccels.h
+++ b/libavcodec/hwaccels.h
@@ -51,6 +51,7 @@ extern const struct FFHWAccel ff_hevc_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_hevc_vulkan_hwaccel;
 extern const struct FFHWAccel ff_mjpeg_nvdec_hwaccel;
 extern const struct FFHWAccel ff_mjpeg_vaapi_hwaccel;
+extern const struct FFHWAccel ff_mjpeg_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_mpeg1_nvdec_hwaccel;
 extern const struct FFHWAccel ff_mpeg1_vdpau_hwaccel;
 extern const struct FFHWAccel ff_mpeg1_videotoolbox_hwaccel;
diff --git a/libavcodec/mjpegdec.c b/libavcodec/mjpegdec.c
index 1481a7f285..f8b00a92d6 100644
--- a/libavcodec/mjpegdec.c
+++ b/libavcodec/mjpegdec.c
@@ -733,6 +733,9 @@ int ff_mjpeg_decode_sof(MJpegDecodeContext *s)
 #endif
 #if CONFIG_MJPEG_VAAPI_HWACCEL
                 AV_PIX_FMT_VAAPI,
+#endif
+#if CONFIG_MJPEG_NVTEGRA_HWACCEL
+                AV_PIX_FMT_NVTEGRA,
 #endif
                 s->avctx->pix_fmt,
                 AV_PIX_FMT_NONE,
@@ -3021,6 +3024,9 @@ const FFCodec ff_mjpeg_decoder = {
 #endif
 #if CONFIG_MJPEG_VAAPI_HWACCEL
                         HWACCEL_VAAPI(mjpeg),
+#endif
+#if CONFIG_MJPEG_NVTEGRA_HWACCEL
+                        HWACCEL_NVTEGRA(mjpeg),
 #endif
                         NULL
                     },
diff --git a/libavcodec/nvtegra_mjpeg.c b/libavcodec/nvtegra_mjpeg.c
new file mode 100644
index 0000000000..9139116159
--- /dev/null
+++ b/libavcodec/nvtegra_mjpeg.c
@@ -0,0 +1,336 @@
+/*
+ * Copyright (c) 2024 averne <averne381@gmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include "config_components.h"
+
+#include "avcodec.h"
+#include "hwaccel_internal.h"
+#include "internal.h"
+#include "hwconfig.h"
+#include "mjpegdec.h"
+#include "decode.h"
+#include "nvtegra_decode.h"
+
+#include "libavutil/pixdesc.h"
+#include "libavutil/nvtegra_host1x.h"
+
+typedef struct NVTegraMJPEGDecodeContext {
+    FFNVTegraDecodeContext core;
+} NVTegraMJPEGDecodeContext;
+
+static int nvtegra_mjpeg_decode_uninit(AVCodecContext *avctx) {
+    NVTegraMJPEGDecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA MJPEG decoder\n");
+
+    err = ff_nvtegra_decode_uninit(avctx, &ctx->core);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static int nvtegra_mjpeg_decode_init(AVCodecContext *avctx) {
+    MJpegDecodeContext          *s = avctx->priv_data;
+    NVTegraMJPEGDecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+
+    enum AVPixelFormat fmt;
+    int luma, err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Initializing NVTEGRA MJPEG decoder\n");
+
+    /* Reject encodes with known hardware issues */
+    if (avctx->profile != AV_PROFILE_MJPEG_HUFFMAN_BASELINE_DCT) {
+        av_log(avctx, AV_LOG_ERROR, "Non-baseline JPEGs are not supported by NVJPG\n");
+        return AVERROR(EINVAL);
+    }
+
+    fmt = s->avctx->pix_fmt, luma = s->comp_index[0];
+    if ((fmt == AV_PIX_FMT_YUV444P || fmt == AV_PIX_FMT_YUVJ444P)
+            && (s->h_count[luma] != 1 || s->v_count[luma] != 1)) {
+        av_log(avctx, AV_LOG_ERROR, "YUV444 with luma sampling factors other than 1x1 is not supported by NVJPG\n");
+        return AVERROR(EINVAL);
+    }
+
+    ctx->core.pic_setup_off  = 0;
+    ctx->core.status_off     = FFALIGN(ctx->core.pic_setup_off + sizeof(nvjpg_dec_drv_pic_setup_s),
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.cmdbuf_off     = FFALIGN(ctx->core.status_off    + sizeof(nvjpg_dec_status),
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.bitstream_off  = FFALIGN(ctx->core.cmdbuf_off    + AV_NVTEGRA_MAP_ALIGN,
+                                       AV_NVTEGRA_MAP_ALIGN);
+    ctx->core.input_map_size = FFALIGN(ctx->core.bitstream_off + ff_nvtegra_decode_pick_bitstream_buffer_size(avctx),
+                                       0x1000);
+
+    ctx->core.max_cmdbuf_size    =  ctx->core.bitstream_off     - ctx->core.cmdbuf_off;
+    ctx->core.max_bitstream_size =  ctx->core.input_map_size    - ctx->core.bitstream_off;
+
+    ctx->core.is_nvjpg = true;
+
+    err = ff_nvtegra_decode_init(avctx, &ctx->core);
+    if (err < 0)
+        goto fail;
+
+    return 0;
+
+fail:
+    nvtegra_mjpeg_decode_uninit(avctx);
+    return err;
+}
+
+static void nvtegra_mjpeg_prepare_frame_setup(nvjpg_dec_drv_pic_setup_s *setup, MJpegDecodeContext *s,
+                                              NVTegraMJPEGDecodeContext *ctx)
+{
+    int input_chroma_mode, output_chroma_mode, memory_mode;
+    int i, j;
+
+    switch (s->hwaccel_sw_pix_fmt) {
+        case AV_PIX_FMT_GRAY8:
+            input_chroma_mode  = 0; /* Monochrome */
+            output_chroma_mode = 0; /* Monochrome */
+            memory_mode        = 3; /* YUV420, for some reason decoding fails with NV12 */
+            break;
+        default:
+        case AV_PIX_FMT_YUV420P:
+        case AV_PIX_FMT_YUVJ420P:
+            input_chroma_mode  = 1; /* YUV420 */
+            output_chroma_mode = 1; /* YUV420 */
+            memory_mode        = 0; /* NV12 */
+            break;
+        case AV_PIX_FMT_YUV422P:
+        case AV_PIX_FMT_YUVJ422P:
+            input_chroma_mode  = 2; /* YUV422H (not sure what nvidia means by that) */
+            output_chroma_mode = 1; /* YUV420 */
+            memory_mode        = 0; /* NV12 */
+            break;
+        case AV_PIX_FMT_YUV440P:
+        case AV_PIX_FMT_YUVJ440P:
+            input_chroma_mode  = 3; /* YUV422V (ditto) */
+            output_chroma_mode = 1; /* YUV420 */
+            memory_mode        = 0; /* NV12 */
+            break;
+        case AV_PIX_FMT_YUV444P:
+        case AV_PIX_FMT_YUVJ444P:
+            input_chroma_mode  = 4; /* YUV444 */
+            output_chroma_mode = 1; /* YUV420 */
+            memory_mode        = 0; /* NV12 */
+            break;
+    }
+
+    *setup = (nvjpg_dec_drv_pic_setup_s){
+        .restart_interval     = s->restart_interval,
+        .frame_width          = s->width,
+        .frame_height         = s->height,
+        .mcu_width            = s->mb_width,
+        .mcu_height           = s->mb_height,
+        .comp                 = s->nb_components,
+
+        .stream_chroma_mode   = input_chroma_mode,
+        .output_chroma_mode   = output_chroma_mode,
+        .output_pixel_format  = 0,  /* YUV */
+        .output_stride_luma   = s->picture->linesize[0],
+        .output_stride_chroma = s->picture->linesize[1],
+
+        .tile_mode            = 0,  /* Pitch linear (tiled formats are unsupported by the T210) */
+        .memory_mode          = memory_mode,
+        .power2_downscale     = 0,
+        .motion_jpeg_type     = 0,  /* Type A */
+
+        .start_mcu_x          = 0,
+        .start_mcu_y          = 0,
+    };
+
+    for (i = 0; i < 4; ++i) {
+        for (j = 0; j < 16; ++j) {
+            setup->huffTab[0][i].codeNum[j] = s->raw_huffman_lengths[0][i][j];
+            setup->huffTab[1][i].codeNum[j] = s->raw_huffman_lengths[1][i][j];
+        }
+
+        memcpy(setup->huffTab[0][i].symbol, s->raw_huffman_values[0][i], sizeof(setup->huffTab[0][i].symbol));
+        memcpy(setup->huffTab[1][i].symbol, s->raw_huffman_values[1][i], sizeof(setup->huffTab[1][i].symbol));
+    }
+
+    for (i = 0; i < s->nb_components; ++i) {
+        j = s->comp_index[i];
+        setup->blkPar[j].ac     = s->ac_index   [i];
+        setup->blkPar[j].dc     = s->dc_index   [i];
+        setup->blkPar[j].hblock = s->h_count    [i];
+        setup->blkPar[j].vblock = s->v_count    [i];
+        setup->blkPar[j].quant  = s->quant_index[i];
+    }
+
+    for (i = 0; i < 4; ++i) {
+        for (j = 0; j < 64; ++j)
+            setup->quant[i][j] = s->quant_matrixes[i][j];
+    }
+}
+
+static int nvtegra_mjpeg_prepare_cmdbuf(AVNVTegraCmdbuf *cmdbuf, MJpegDecodeContext *s,
+                                        NVTegraMJPEGDecodeContext *ctx, AVFrame *current_frame)
+{
+    FrameDecodeData     *fdd = (FrameDecodeData *)current_frame->private_ref->data;
+    FFNVTegraDecodeFrame *tf = fdd->hwaccel_priv;
+    AVNVTegraMap  *input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+
+    int err;
+
+    err = av_nvtegra_cmdbuf_begin(cmdbuf, HOST1X_CLASS_NVJPG);
+    if (err < 0)
+        return err;
+
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVE7D0_SET_APPLICATION_ID,
+                          AV_NVTEGRA_ENUM(NVE7D0_SET_APPLICATION_ID, ID, NVJPG_DECODER));
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVE7D0_SET_CONTROL_PARAMS,
+                          AV_NVTEGRA_VALUE(NVE7D0_SET_CONTROL_PARAMS, DUMP_CYCLE_COUNT, 1) |
+                          AV_NVTEGRA_VALUE(NVE7D0_SET_CONTROL_PARAMS, GPTIMER_ON,       1));
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVE7D0_SET_PICTURE_INDEX,
+                          AV_NVTEGRA_VALUE(NVE7D0_SET_PICTURE_INDEX, INDEX, ctx->core.frame_idx));
+
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVE7D0_SET_IN_DRV_PIC_SETUP,
+                          input_map, ctx->core.pic_setup_off, NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVE7D0_SET_BITSTREAM,
+                          input_map, ctx->core.bitstream_off, NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVE7D0_SET_OUT_STATUS,
+                          input_map, ctx->core.status_off,    NVHOST_RELOC_TYPE_DEFAULT);
+
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVE7D0_SET_CUR_PIC, av_nvtegra_frame_get_fbuf_map(current_frame),
+                          0, NVHOST_RELOC_TYPE_DEFAULT);
+    AV_NVTEGRA_PUSH_RELOC(cmdbuf, NVE7D0_SET_CUR_PIC_CHROMA_U, av_nvtegra_frame_get_fbuf_map(current_frame),
+                          current_frame->data[1] - current_frame->data[0], NVHOST_RELOC_TYPE_DEFAULT);
+
+    AV_NVTEGRA_PUSH_VALUE(cmdbuf, NVE7D0_EXECUTE,
+                          AV_NVTEGRA_ENUM(NVE7D0_EXECUTE, AWAKEN, ENABLE));
+
+    err = av_nvtegra_cmdbuf_end(cmdbuf);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static int nvtegra_mjpeg_start_frame(AVCodecContext *avctx, const uint8_t *buf, uint32_t buf_size) {
+    MJpegDecodeContext          *s = avctx->priv_data;
+    AVFrame                 *frame = s->picture;
+    NVTegraMJPEGDecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Starting MJPEG-NVTEGRA frame with pixel format %s\n",
+           av_get_pix_fmt_name(avctx->sw_pix_fmt));
+
+    err = ff_nvtegra_start_frame(avctx, frame, &ctx->core);
+    if (err < 0)
+        return err;
+
+    return 0;
+}
+
+static int nvtegra_mjpeg_end_frame(AVCodecContext *avctx) {
+    MJpegDecodeContext          *s = avctx->priv_data;
+    NVTegraMJPEGDecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+    AVFrame                 *frame = s->picture;
+    FrameDecodeData           *fdd = (FrameDecodeData *)frame->private_ref->data;
+    FFNVTegraDecodeFrame       *tf = fdd->hwaccel_priv;
+
+    nvjpg_dec_drv_pic_setup_s *setup;
+    uint8_t *mem;
+    AVNVTegraMap *output_map;
+    int err;
+
+    av_log(avctx, AV_LOG_DEBUG, "Ending MJPEG-NVTEGRA frame with %u slices -> %u bytes\n",
+           ctx->core.num_slices, ctx->core.bitstream_len);
+
+    if (!tf || !ctx->core.num_slices)
+        return 0;
+
+    mem = av_nvtegra_map_get_addr((AVNVTegraMap *)tf->input_map_ref->data);
+
+    setup = (nvjpg_dec_drv_pic_setup_s *)(mem + ctx->core.pic_setup_off);
+    setup->bitstream_offset = 0;
+    setup->bitstream_size   = ctx->core.bitstream_len;
+
+    err = nvtegra_mjpeg_prepare_cmdbuf(&ctx->core.cmdbuf, s, ctx, frame);
+    if (err < 0)
+        return err;
+
+    output_map = av_nvtegra_frame_get_fbuf_map(frame);
+    output_map->is_linear = true;
+
+    return ff_nvtegra_end_frame(avctx, frame, &ctx->core, NULL, 0);
+}
+
+static int nvtegra_mjpeg_decode_slice(AVCodecContext *avctx, const uint8_t *buf, uint32_t buf_size) {
+    MJpegDecodeContext          *s = avctx->priv_data;
+    NVTegraMJPEGDecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+    AVFrame                 *frame = s->picture;
+    FrameDecodeData           *fdd = (FrameDecodeData *)frame->private_ref->data;
+
+    FFNVTegraDecodeFrame *tf;
+    AVNVTegraMap *input_map;
+    uint8_t *mem;
+
+    tf = fdd->hwaccel_priv;
+    input_map = (AVNVTegraMap *)tf->input_map_ref->data;
+    mem = av_nvtegra_map_get_addr(input_map);
+
+    /* In nvtegra_mjpeg_start_frame the JFIF headers haven't been entirely parsed yet */
+    nvtegra_mjpeg_prepare_frame_setup((nvjpg_dec_drv_pic_setup_s *)(mem + ctx->core.pic_setup_off), s, ctx);
+
+    return ff_nvtegra_decode_slice(avctx, frame, buf, buf_size, false);
+}
+
+static int nvtegra_mjpeg_frame_params(AVCodecContext *avctx, AVBufferRef *hw_frames_ctx) {
+    AVHWFramesContext *frames_ctx = (AVHWFramesContext *)hw_frames_ctx->data;
+
+    int err;
+
+    err = ff_nvtegra_frame_params(avctx, hw_frames_ctx);
+    if (err < 0)
+        return err;
+
+    /*
+     * NVJPG1 can only decode to pitch linear surfaces, which have a
+     * 256b alignment requirement in VIC.
+     */
+    frames_ctx->width  = FFALIGN(frames_ctx->width,  256);
+    frames_ctx->height = FFALIGN(frames_ctx->height, 4);
+
+    return 0;
+}
+
+#if CONFIG_MJPEG_NVTEGRA_HWACCEL
+const FFHWAccel ff_mjpeg_nvtegra_hwaccel = {
+    .p.name         = "mjpeg_nvtegra",
+    .p.type         = AVMEDIA_TYPE_VIDEO,
+    .p.id           = AV_CODEC_ID_MJPEG,
+    .p.pix_fmt      = AV_PIX_FMT_NVTEGRA,
+    .start_frame    = &nvtegra_mjpeg_start_frame,
+    .end_frame      = &nvtegra_mjpeg_end_frame,
+    .decode_slice   = &nvtegra_mjpeg_decode_slice,
+    .init           = &nvtegra_mjpeg_decode_init,
+    .uninit         = &nvtegra_mjpeg_decode_uninit,
+    .frame_params   = &nvtegra_mjpeg_frame_params,
+    .priv_data_size = sizeof(NVTegraMJPEGDecodeContext),
+    .caps_internal  = HWACCEL_CAP_ASYNC_SAFE,
+};
+#endif
-- 
2.45.1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [FFmpeg-devel] [PATCH 02/16] configure, avutil: add support for HorizonOS
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 02/16] configure, avutil: add support for HorizonOS averne
@ 2024-05-30 20:37   ` Rémi Denis-Courmont
  2024-05-31 21:06     ` averne
  0 siblings, 1 reply; 37+ messages in thread
From: Rémi Denis-Courmont @ 2024-05-30 20:37 UTC (permalink / raw)
  To: ffmpeg-devel

On Thursday, 30 May 2024, 22:43:04 EEST, averne wrote:
> HorizonOS (HOS) is the operating system of the Nintendo Switch.
> This patch enables integration with the homebrew toolchain developed by the
> devkitPro team. Its two main components are devkitA64 (common toolchain for
> aarch64 targets) and libnx (library implementing interaction with the HOS
> kernel and system daemons, termed sysmodules).
> 
> Signed-off-by: averne <averne381@gmail.com>
> ---
>  configure       | 8 ++++++++
>  libavutil/cpu.c | 7 +++++++
>  2 files changed, 15 insertions(+)
> 
> diff --git a/configure b/configure
> index 96b181fd21..09fb2aed1b 100755
> --- a/configure
> +++ b/configure
> @@ -5967,6 +5967,10 @@ case $target_os in
>          ;;
>      minix)
>          ;;
> +    horizon)
> +        enable section_data_rel_ro
> +        add_extralibs -lnx
> +        ;;
>      none)
>          ;;
>      *)
> @@ -7710,6 +7714,10 @@ haiku)
>          disable memalign
>      fi
>      ;;
> +horizon)
> +    disable sysctl
> +    disable sysctlbyname
> +    ;;

Are those really broken, or is this just a trick to force a fallback? In the 
latter case, you don't need to disable them; just put the HOS code ahead of 
the generic BSD code.
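
The ordering suggestion can be sketched as below. This is only an illustration: `TARGET_HORIZON` and `cpu_count_backend()` are hypothetical stand-ins (the patch itself tests `defined(__SWITCH__)`), and `HAVE_SYSCTL` is defined locally here rather than taken from configure. The point is that an `#if`/`#elif` chain picks the first branch that matches, so the generic check can stay enabled as long as the platform-specific one comes first.

```c
#include <string.h>

/* Hypothetical stand-ins, defined here so the sketch compiles on its own. */
#define TARGET_HORIZON 1   /* plays the role of defined(__SWITCH__) */
#define HAVE_SYSCTL    1   /* the generic check stays enabled */

/* Reports which branch of the chain was selected at compile time. */
static const char *cpu_count_backend(void)
{
#if TARGET_HORIZON
    return "horizon";      /* platform-specific path, tested first */
#elif HAVE_SYSCTL
    return "sysctl";       /* generic BSD path, only reached otherwise */
#else
    return "fallback";
#endif
}
```

With both macros set to 1, the Horizon branch wins because it appears first in the chain.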

>  esac
> 
>  flatten_extralibs(){
> diff --git a/libavutil/cpu.c b/libavutil/cpu.c
> index 9ac2f01c20..6a77df5e34 100644
> --- a/libavutil/cpu.c
> +++ b/libavutil/cpu.c
> @@ -48,6 +48,9 @@
>  #if HAVE_UNISTD_H
>  #include <unistd.h>
>  #endif
> +#ifdef __SWITCH__
> +#include <switch.h>
> +#endif
> 
>  static atomic_int cpu_flags = -1;
>  static atomic_int cpu_count = -1;
> @@ -247,6 +250,10 @@ int av_cpu_count(void)
>  #elif HAVE_WINRT
>      GetNativeSystemInfo(&sysinfo);
>      nb_cpus = sysinfo.dwNumberOfProcessors;
> +#elif defined(__SWITCH__)
> +    u64 core_mask = 0;
> +    Result rc = svcGetInfo(&core_mask, InfoType_CoreMask, CUR_PROCESS_HANDLE, 0);
> +    nb_cpus = R_SUCCEEDED(rc) ? av_popcount64(core_mask) : 3;
>  #endif
> 
>      if (!atomic_exchange_explicit(&printed, 1, memory_order_relaxed))


-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [FFmpeg-devel] [PATCH 01/16] avutil/buffer: add helper to allocate aligned memory
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 01/16] avutil/buffer: add helper to allocate aligned memory averne
@ 2024-05-30 20:38   ` Rémi Denis-Courmont
  2024-05-31 21:06     ` averne
  0 siblings, 1 reply; 37+ messages in thread
From: Rémi Denis-Courmont @ 2024-05-30 20:38 UTC (permalink / raw)
  To: ffmpeg-devel

On Thursday, 30 May 2024, 22:43:03 EEST, averne wrote:
> This is useful eg. for memory-mapped buffers that need page-aligned memory,
> when dealing with hardware devices
> 
> Signed-off-by: averne <averne381@gmail.com>
> ---
>  libavutil/buffer.c | 31 +++++++++++++++++++++++++++++++
>  libavutil/buffer.h |  7 +++++++
>  2 files changed, 38 insertions(+)
> 
> diff --git a/libavutil/buffer.c b/libavutil/buffer.c
> index e4562a79b1..b8e357f540 100644
> --- a/libavutil/buffer.c
> +++ b/libavutil/buffer.c
> @@ -16,9 +16,14 @@
>   * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>   */
> 
> +#include "config.h"
> +
>  #include <stdatomic.h>
>  #include <stdint.h>
>  #include <string.h>
> +#if HAVE_MALLOC_H
> +#include <malloc.h>
> +#endif
> 
>  #include "avassert.h"
>  #include "buffer_internal.h"
> @@ -100,6 +105,32 @@ AVBufferRef *av_buffer_allocz(size_t size)
>      return ret;
>  }
> 
> +AVBufferRef *av_buffer_aligned_alloc(size_t size, size_t align)
> +{
> +    AVBufferRef *ret = NULL;
> +    uint8_t    *data = NULL;
> +
> +#if HAVE_POSIX_MEMALIGN
> +    if (posix_memalign((void **)&data, align, size))

Invalid cast.
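
A minimal sketch of one way to avoid the cast, assuming the objection is that a `uint8_t *` object is being written through a `void **`: pass the address of a real `void *` and convert afterwards. The helper name `aligned_alloc_bytes` is hypothetical and not part of the patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns `size` bytes aligned to `align`, or NULL.
 * `align` must be a power of two and a multiple of sizeof(void *). */
static uint8_t *aligned_alloc_bytes(size_t size, size_t align)
{
    void *ptr = NULL;

    /* posix_memalign() stores its result through a void **, so give it
     * the address of an actual void * instead of casting &data. */
    if (posix_memalign(&ptr, align, size))
        return NULL;

    /* void * converts to uint8_t * implicitly in C. */
    return ptr;
}
```

The returned pointer is released with `free()`, as with any `posix_memalign()` allocation.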

> +        return NULL;
> +#elif HAVE_ALIGNED_MALLOC
> +    data = aligned_alloc(align, size);
> +#elif HAVE_MEMALIGN
> +    data = memalign(align, size);
> +#else
> +    return NULL;
> +#endif
> +
> +    if (!data)
> +        return NULL;
> +
> +    ret = av_buffer_create(data, size, av_buffer_default_free, NULL, 0);
> +    if (!ret)
> +        av_freep(&data);
> +
> +    return ret;
> +}
> +
>  AVBufferRef *av_buffer_ref(const AVBufferRef *buf)
>  {
>      AVBufferRef *ret = av_mallocz(sizeof(*ret));
> diff --git a/libavutil/buffer.h b/libavutil/buffer.h
> index e1ef5b7f07..8422ec3453 100644
> --- a/libavutil/buffer.h
> +++ b/libavutil/buffer.h
> @@ -107,6 +107,13 @@ AVBufferRef *av_buffer_alloc(size_t size);
>   */
>  AVBufferRef *av_buffer_allocz(size_t size);
> 
> +/**
> + * Allocate an AVBuffer of the given size and alignment.
> + *
> + * @return an AVBufferRef of given size or NULL when out of memory
> + */
> +AVBufferRef *av_buffer_aligned_alloc(size_t size, size_t align);
> +
>  /**
>   * Always treat the buffer as read-only, even when it has only one
>   * reference.


-- 
レミ・デニ-クールモン
http://www.remlab.net/




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [FFmpeg-devel] [PATCH 03/16] avutil: add ioctl definitions for tegra devices
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 03/16] avutil: add ioctl definitions for tegra devices averne
@ 2024-05-30 20:42   ` Rémi Denis-Courmont
  2024-05-31 21:06     ` averne
  0 siblings, 1 reply; 37+ messages in thread
From: Rémi Denis-Courmont @ 2024-05-30 20:42 UTC (permalink / raw)
  To: ffmpeg-devel

On Thursday, 30 May 2024, 22:43:05 EEST, averne wrote:
> These files are taken with minimal modifications from nvidia's Linux4Tegra
> (L4T) tree. nvmap enables management of memory-mapped buffers for hardware
> devices. nvhost enables interaction with different hardware modules
> (multimedia engines, display engine, ...), through a common block, host1x.
> 
> Signed-off-by: averne <averne381@gmail.com>
> ---
>  libavutil/Makefile       |   2 +
>  libavutil/nvhost_ioctl.h | 511 +++++++++++++++++++++++++++++++++++++++
>  libavutil/nvmap_ioctl.h  | 451 ++++++++++++++++++++++++++++++++++
>  3 files changed, 964 insertions(+)
>  create mode 100644 libavutil/nvhost_ioctl.h
>  create mode 100644 libavutil/nvmap_ioctl.h
> 
> diff --git a/libavutil/Makefile b/libavutil/Makefile
> index 6e6fa8d800..9c112bc58a 100644
> --- a/libavutil/Makefile
> +++ b/libavutil/Makefile
> @@ -52,6 +52,8 @@ HEADERS = adler32.h                                                     \
>            hwcontext_videotoolbox.h                                      \
>            hwcontext_vdpau.h                                             \
>            hwcontext_vulkan.h                                            \
> +          nvhost_ioctl.h                                                \
> +          nvmap_ioctl.h                                                 \
>            iamf.h                                                        \
>            imgutils.h                                                    \
>            intfloat.h                                                    \
> diff --git a/libavutil/nvhost_ioctl.h b/libavutil/nvhost_ioctl.h
> new file mode 100644
> index 0000000000..b0bf3e3ae6
> --- /dev/null
> +++ b/libavutil/nvhost_ioctl.h
> @@ -0,0 +1,511 @@
> +/*
> + * include/uapi/linux/nvhost_ioctl.h

Well, then that should be provided by linux-libc-dev or equivalent. I don't 
think that this should be vendored into FFmpeg.

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [FFmpeg-devel] [PATCH 05/16] avutil: add common code for nvtegra
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 05/16] avutil: add common code for nvtegra averne
@ 2024-05-31  8:32   ` Rémi Denis-Courmont
  2024-05-31 21:06     ` averne
  2024-06-05 20:29   ` Mark Thompson
  1 sibling, 1 reply; 37+ messages in thread
From: Rémi Denis-Courmont @ 2024-05-31  8:32 UTC (permalink / raw)
  To: FFmpeg development discussions and patches



On 30 May 2024 22:43:07 GMT+03:00, averne <averne381@gmail.com> wrote:
>This includes a new pixel format for nvtegra hardware frames, and several objects for interaction with hardware blocks.
>In particular, this contains code for channels (handles to hardware engines), maps (memory-mapped buffers shared with engines), and command buffers (abstraction for building command lists sent to the engines).
>
>Signed-off-by: averne <averne381@gmail.com>
>---
> configure                  |    2 +
> libavutil/Makefile         |    4 +
> libavutil/nvtegra.c        | 1035 ++++++++++++++++++++++++++++++++++++
> libavutil/nvtegra.h        |  258 +++++++++
> libavutil/nvtegra_host1x.h |   94 ++++
> libavutil/pixdesc.c        |    4 +
> libavutil/pixfmt.h         |    8 +
> 7 files changed, 1405 insertions(+)
> create mode 100644 libavutil/nvtegra.c
> create mode 100644 libavutil/nvtegra.h
> create mode 100644 libavutil/nvtegra_host1x.h
>
>diff --git a/configure b/configure
>index 09fb2aed1b..51f169bfbd 100755
>--- a/configure
>+++ b/configure
>@@ -361,6 +361,7 @@ External library support:
>   --disable-vdpau          disable Nvidia Video Decode and Presentation API for Unix code [autodetect]
>   --disable-videotoolbox   disable VideoToolbox code [autodetect]
>   --disable-vulkan         disable Vulkan code [autodetect]
>+  --enable-nvtegra         enable nvtegra code [no]
> 
> Toolchain options:
>   --arch=ARCH              select architecture [$arch]
>@@ -3151,6 +3152,7 @@ videotoolbox_hwaccel_deps="videotoolbox pthreads"
> videotoolbox_hwaccel_extralibs="-framework QuartzCore"
> vulkan_deps="threads"
> vulkan_deps_any="libdl LoadLibrary"
>+nvtegra_deps="gpl"
> 
> av1_d3d11va_hwaccel_deps="d3d11va DXVA_PicParams_AV1"
> av1_d3d11va_hwaccel_select="av1_decoder"
>diff --git a/libavutil/Makefile b/libavutil/Makefile
>index 9c112bc58a..733a23a8a3 100644
>--- a/libavutil/Makefile
>+++ b/libavutil/Makefile
>@@ -52,6 +52,7 @@ HEADERS = adler32.h                                                     \
>           hwcontext_videotoolbox.h                                      \
>           hwcontext_vdpau.h                                             \
>           hwcontext_vulkan.h                                            \
>+          nvtegra.h                                                     \
>           nvhost_ioctl.h                                                \
>           nvmap_ioctl.h                                                 \
>           iamf.h                                                        \
>@@ -209,6 +210,7 @@ OBJS-$(CONFIG_VDPAU)                    += hwcontext_vdpau.o
> OBJS-$(CONFIG_VULKAN)                   += hwcontext_vulkan.o vulkan.o
> 
> OBJS-$(!CONFIG_VULKAN)                  += hwcontext_stub.o
>+OBJS-$(CONFIG_NVTEGRA)                  += nvtegra.o
> 
> OBJS += $(COMPAT_OBJS:%=../compat/%)
> 
>@@ -230,6 +232,8 @@ SKIPHEADERS-$(CONFIG_VDPAU)            += hwcontext_vdpau.h
> SKIPHEADERS-$(CONFIG_VULKAN)           += hwcontext_vulkan.h vulkan.h   \
>                                           vulkan_functions.h            \
>                                           vulkan_loader.h
>+SKIPHEADERS-$(CONFIG_NVTEGRA)          += nvtegra.h                     \
>+                                          nvtegra_host1x.h
> 
> TESTPROGS = adler32                                                     \
>             aes                                                         \
>diff --git a/libavutil/nvtegra.c b/libavutil/nvtegra.c
>new file mode 100644
>index 0000000000..ad0bbbdfaa
>--- /dev/null
>+++ b/libavutil/nvtegra.c
>@@ -0,0 +1,1035 @@
>+/*
>+ * Copyright (c) 2024 averne <averne381@gmail.com>
>+ *
>+ * This file is part of FFmpeg.
>+ *
>+ * FFmpeg is free software; you can redistribute it and/or modify
>+ * it under the terms of the GNU General Public License as published by
>+ * the Free Software Foundation; either version 2 of the License, or
>+ * (at your option) any later version.
>+ *
>+ * FFmpeg is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>+ * GNU General Public License for more details.
>+ *
>+ * You should have received a copy of the GNU General Public License along
>+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
>+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
>+ */
>+
>+#ifndef __SWITCH__
>+#   include <sys/ioctl.h>
>+#   include <sys/mman.h>
>+#   include <fcntl.h>
>+#   include <unistd.h>
>+#else
>+#   include <stdlib.h>
>+#   include <switch.h>
>+#endif
>+
>+#include <string.h>
>+
>+#include "buffer.h"
>+#include "log.h"
>+#include "error.h"
>+#include "mem.h"
>+#include "thread.h"
>+
>+#include "nvhost_ioctl.h"
>+#include "nvmap_ioctl.h"
>+#include "nvtegra_host1x.h"
>+
>+#include "nvtegra.h"
>+
>+/*
>+ * Tag used by the kernel to identify allocations.
>+ * Official software has been seen using 0x900, 0xf00, 0x1100, 0x1400, 0x4000.
>+ */
>+#define MEM_TAG (0xfeed)
>+
>+struct DriverState {
>+    int nvmap_fd, nvhost_fd;
>+};
>+
>+static AVMutex g_driver_init_mtx = AV_MUTEX_INITIALIZER;
>+static struct DriverState *g_driver_state = NULL;
>+static AVBufferRef *g_driver_state_ref = NULL;
>+
>+static void free_driver_fds(void *opaque, uint8_t *data) {
>+    if (!g_driver_state)
>+        return;
>+
>+#ifndef __SWITCH__
>+    if (g_driver_state->nvmap_fd > 0)
>+        close(g_driver_state->nvmap_fd);
>+
>+    if (g_driver_state->nvhost_fd > 0)
>+        close(g_driver_state->nvhost_fd);
>+#else
>+    nvFenceExit();
>+    nvMapExit();
>+    nvExit();
>+    mmuExit();
>+#endif
>+
>+    g_driver_init_mtx  = (AVMutex)AV_MUTEX_INITIALIZER;
>+    g_driver_state_ref = NULL;
>+    av_freep(&g_driver_state);
>+}
>+
>+static int init_driver_fds(void) {
>+    AVBufferRef *ref;
>+    struct DriverState *state;
>+    int err;
>+
>+    state = av_mallocz(sizeof(*state));
>+    if (!state)
>+        return AVERROR(ENOMEM);
>+
>+    ref = av_buffer_create((uint8_t *)state, sizeof(*state), free_driver_fds, NULL, 0);
>+    if (!ref) {
>+        av_free(state);
>+        return AVERROR(ENOMEM);
>+    }
>+
>+    g_driver_state     = state;
>+    g_driver_state_ref = ref;
>+
>+#ifndef __SWITCH__
>+    err = open("/dev/nvmap", O_RDWR | O_SYNC);

There are helpers to open files, and you're missing the close-on-exec flag here. It is also not clear why you need O_SYNC.

But did you consider just reimplementing libnvdec instead of putting the device driver directly in FFmpeg?

>+    if (err < 0)
>+        return AVERROR(errno);
>+    state->nvmap_fd = err;
>+
>+    err = open("/dev/nvhost-ctrl", O_RDWR | O_SYNC);
>+    if (err < 0)
>+        return AVERROR(errno);
>+    state->nvhost_fd = err;
>+#else
>+    err = nvInitialize();
>+    if (R_FAILED(err))
>+        return AVERROR(err);
>+
>+    err = nvMapInit();
>+    if (R_FAILED(err))
>+        return AVERROR(err);
>+    state->nvmap_fd = nvMapGetFd();
>+
>+    err = nvFenceInit();
>+    if (R_FAILED(err))
>+        return AVERROR(err);
>+    /* libnx doesn't export the nvhost-ctrl file descriptor */
>+
>+    err = mmuInitialize();
>+    if (R_FAILED(err))
>+        return AVERROR(err);
>+#endif
>+
>+    return 0;
>+}
>+
>+static inline int get_nvmap_fd(void) {
>+    if (!g_driver_state)
>+        return AVERROR_UNKNOWN;
>+
>+    if (!g_driver_state->nvmap_fd)
>+        return AVERROR_UNKNOWN;
>+
>+    return g_driver_state->nvmap_fd;
>+}
>+
>+static inline int get_nvhost_fd(void) {
>+    if (!g_driver_state)
>+        return AVERROR_UNKNOWN;
>+
>+    if (!g_driver_state->nvhost_fd)
>+        return AVERROR_UNKNOWN;
>+
>+    return g_driver_state->nvhost_fd;
>+}
>+
>+AVBufferRef *av_nvtegra_driver_init(void) {
>+    AVBufferRef *out = NULL;
>+    int err;
>+
>+    /*
>+     * We have to do this overly complex dance of putting driver fds in a refcounted struct,
>+     * otherwise initializing multiple hwcontexts would leak fds
>+     */
>+
>+    err = ff_mutex_lock(&g_driver_init_mtx);
>+    if (err != 0)
>+        goto exit;
>+
>+    if (g_driver_state_ref) {
>+        out = av_buffer_ref(g_driver_state_ref);
>+        goto exit;
>+    }
>+
>+    err = init_driver_fds();
>+    if (err < 0) {
>+        /* In case memory allocations failed, call the destructor ourselves */
>+        av_buffer_unref(&g_driver_state_ref);
>+        free_driver_fds(NULL, NULL);
>+        goto exit;
>+    }
>+
>+    out = g_driver_state_ref;
>+
>+exit:
>+    ff_mutex_unlock(&g_driver_init_mtx);
>+    return out;
>+}
>+
>+int av_nvtegra_channel_open(AVNVTegraChannel *channel, const char *dev) {
>+    int err;
>+#ifndef __SWITCH__
>+    struct nvhost_get_param_arg args;
>+
>+    err = open(dev, O_RDWR);
>+    if (err < 0)
>+        return AVERROR(errno);
>+
>+    channel->fd = err;
>+
>+    args = (struct nvhost_get_param_arg){0};
>+
>+    err = ioctl(channel->fd, NVHOST_IOCTL_CHANNEL_GET_SYNCPOINT, &args);
>+    if (err < 0)
>+        goto fail;
>+
>+    channel->syncpt = args.value;
>+
>+    return 0;
>+
>+fail:
>+    close(channel->fd);
>+    return AVERROR(errno);
>+#else
>+    err = nvChannelCreate(&channel->channel, dev);
>+    if (R_FAILED(err))
>+        return AVERROR(err);
>+
>+    err = nvioctlChannel_GetSyncpt(channel->channel.fd, 0, &channel->syncpt);
>+    if (R_FAILED(err))
>+        goto fail;
>+
>+    return 0;
>+
>+fail:
>+    nvChannelClose(&channel->channel);
>+    return AVERROR(err);
>+#endif
>+}
>+
>+int av_nvtegra_channel_close(AVNVTegraChannel *channel) {
>+#ifndef __SWITCH__
>+    if (!channel->fd)
>+        return 0;
>+
>+    return close(channel->fd);
>+#else
>+    nvChannelClose(&channel->channel);
>+    return 0;
>+#endif
>+}
>+
>+int av_nvtegra_channel_get_clock_rate(AVNVTegraChannel *channel, uint32_t moduleid, uint32_t *clock_rate) {
>+    int err;
>+#ifndef __SWITCH__
>+    struct nvhost_clk_rate_args args;
>+
>+    args = (struct nvhost_clk_rate_args){
>+        .moduleid = moduleid,
>+    };
>+
>+    err = ioctl(channel->fd, NVHOST_IOCTL_CHANNEL_GET_CLK_RATE, &args);
>+    if (err < 0)
>+        return AVERROR(errno);
>+
>+    if (clock_rate)
>+        *clock_rate = args.rate;
>+
>+    return 0;
>+#else
>+    uint32_t tmp;
>+
>+    err = AVERROR(nvioctlChannel_GetModuleClockRate(channel->channel.fd, moduleid, &tmp));
>+    if (err < 0)
>+        return err;
>+
>+    if (clock_rate)
>+        *clock_rate = tmp * 1000;
>+
>+    return 0;
>+#endif
>+}
>+
>+int av_nvtegra_channel_set_clock_rate(AVNVTegraChannel *channel, uint32_t moduleid, uint32_t clock_rate) {
>+#ifndef __SWITCH__
>+    struct nvhost_clk_rate_args args;
>+
>+    args = (struct nvhost_clk_rate_args){
>+        .rate     = clock_rate,
>+        .moduleid = moduleid,
>+    };
>+
>+    return (ioctl(channel->fd, NVHOST_IOCTL_CHANNEL_SET_CLK_RATE, &args) < 0) ? AVERROR(errno) : 0;
>+#else
>+    return AVERROR(nvioctlChannel_SetModuleClockRate(channel->channel.fd, moduleid, clock_rate / 1000));
>+#endif
>+}
>+
>+int av_nvtegra_channel_submit(AVNVTegraChannel *channel, AVNVTegraCmdbuf *cmdbuf, uint32_t *fence) {
>+    int err;
>+#ifndef __SWITCH__
>+    struct nvhost_submit_args args;
>+
>+    args = (struct nvhost_submit_args){
>+        .submit_version          = NVHOST_SUBMIT_VERSION_V2,
>+        .num_syncpt_incrs        = cmdbuf->num_syncpt_incrs,
>+        .num_cmdbufs             = cmdbuf->num_cmdbufs,
>+        .num_relocs              = cmdbuf->num_relocs,
>+        .num_waitchks            = cmdbuf->num_waitchks,
>+        .timeout                 = 0,
>+        .flags                   = 0,
>+        .fence                   = 0,
>+        .syncpt_incrs            = (uintptr_t)cmdbuf->syncpt_incrs,
>+        .cmdbuf_exts             = (uintptr_t)cmdbuf->cmdbuf_exts,
>+        .checksum_methods        = 0,
>+        .checksum_falcon_methods = 0,
>+        .pad                     = { 0 },
>+        .reloc_types             = (uintptr_t)cmdbuf->reloc_types,
>+        .cmdbufs                 = (uintptr_t)cmdbuf->cmdbufs,
>+        .relocs                  = (uintptr_t)cmdbuf->relocs,
>+        .reloc_shifts            = (uintptr_t)cmdbuf->reloc_shifts,
>+        .waitchks                = (uintptr_t)cmdbuf->waitchks,
>+        .waitbases               = 0,
>+        .class_ids               = (uintptr_t)cmdbuf->class_ids,
>+        .fences                  = (uintptr_t)cmdbuf->fences,
>+    };
>+
>+    err = ioctl(channel->fd, NVHOST_IOCTL_CHANNEL_SUBMIT, &args);
>+    if (err < 0)
>+        return AVERROR(errno);
>+
>+    if (fence)
>+        *fence = args.fence;
>+
>+    return 0;
>+#else
>+    nvioctl_fence tmp;
>+
>+    err = nvioctlChannel_Submit(channel->channel.fd, (nvioctl_cmdbuf *)cmdbuf->cmdbufs, cmdbuf->num_cmdbufs,
>+                                NULL, NULL, 0, (nvioctl_syncpt_incr *)cmdbuf->syncpt_incrs, cmdbuf->num_syncpt_incrs,
>+                                &tmp, 1);
>+    if (R_FAILED(err))
>+        return AVERROR(err);
>+
>+    if (fence)
>+        *fence = tmp.value;
>+
>+    return 0;
>+#endif
>+}
>+
>+int av_nvtegra_channel_set_submit_timeout(AVNVTegraChannel *channel, uint32_t timeout_ms) {
>+#ifndef __SWITCH__
>+    struct nvhost_set_timeout_args args;
>+
>+    args = (struct nvhost_set_timeout_args){
>+        .timeout = timeout_ms,
>+    };
>+
>+    return (ioctl(channel->fd, NVHOST_IOCTL_CHANNEL_SET_TIMEOUT, &args) < 0) ? AVERROR(errno) : 0;
>+#else
>+    return AVERROR(nvioctlChannel_SetSubmitTimeout(channel->channel.fd, timeout_ms));
>+#endif
>+}
>+
>+int av_nvtegra_syncpt_wait(AVNVTegraChannel *channel, uint32_t threshold, int32_t timeout) {
>+#ifndef __SWITCH__
>+    struct nvhost_ctrl_syncpt_waitex_args args = {
>+        .id      = channel->syncpt,
>+        .thresh  = threshold,
>+        .timeout = timeout,
>+    };
>+
>+    return (ioctl(get_nvhost_fd(), NVHOST_IOCTL_CTRL_SYNCPT_WAITEX, &args) < 0) ? AVERROR(errno) : 0;
>+#else
>+    NvFence fence;
>+
>+    fence = (NvFence){
>+        .id    = channel->syncpt,
>+        .value = threshold,
>+    };
>+
>+    return AVERROR(nvFenceWait(&fence, timeout));
>+#endif
>+}
>+
>+#ifdef __SWITCH__
>+static inline bool convert_cache_flags(uint32_t flags) {
>+    /* Return whether the map should be CPU-cacheable */
>+    switch (flags & NVMAP_HANDLE_CACHE_FLAG) {
>+        case NVMAP_HANDLE_INNER_CACHEABLE:
>+        case NVMAP_HANDLE_CACHEABLE:
>+            return true;
>+        default:
>+            return false;
>+    }
>+}
>+#endif
>+
>+int av_nvtegra_map_allocate(AVNVTegraMap *map, AVNVTegraChannel *channel, uint32_t size,
>+                            uint32_t align, int heap_mask, int flags)
>+{
>+#ifndef __SWITCH__
>+    struct nvmap_create_handle create_args;
>+    struct nvmap_alloc_handle alloc_args;
>+    int err;
>+
>+    create_args = (struct nvmap_create_handle){
>+        .size   = size,
>+    };
>+
>+    err = ioctl(get_nvmap_fd(), NVMAP_IOC_CREATE, &create_args);
>+    if (err < 0)
>+        return AVERROR(errno);
>+
>+    map->size   = size;
>+    map->handle = create_args.handle;
>+
>+    alloc_args = (struct nvmap_alloc_handle){
>+        .handle    = create_args.handle,
>+        .heap_mask = heap_mask,
>+        .flags     = flags | (MEM_TAG << 16),
>+        .align     = align,
>+    };
>+
>+    err = ioctl(get_nvmap_fd(), NVMAP_IOC_ALLOC, &alloc_args);
>+    if (err < 0)
>+        goto fail;
>+
>+    return 0;
>+
>+fail:
>+    err = AVERROR(errno); /* save errno before map_free() can overwrite it */
>+    av_nvtegra_map_free(map);
>+    return err;
>+#else
>+    void *mem;
>+
>+    map->owner = channel->channel.fd;
>+
>+    size = FFALIGN(size, 0x1000);
>+
>+    mem = aligned_alloc(FFALIGN(align, 0x1000), size);
>+    if (!mem)
>+        return AVERROR(ENOMEM);
>+
>+    return AVERROR(nvMapCreate(&map->map, mem, size, 0x10000, NvKind_Pitch,
>+                               convert_cache_flags(flags)));
>+#endif
>+}
>+
>+int av_nvtegra_map_free(AVNVTegraMap *map) {
>+#ifndef __SWITCH__
>+    int err;
>+
>+    if (!map->handle)
>+        return 0;
>+
>+    err = ioctl(get_nvmap_fd(), NVMAP_IOC_FREE, map->handle);
>+    if (err < 0)
>+        return AVERROR(errno);
>+
>+    map->handle = 0;
>+
>+    return 0;
>+#else
>+    void *addr = map->map.cpu_addr;
>+
>+    if (!map->map.cpu_addr)
>+        return 0;
>+
>+    nvMapClose(&map->map);
>+    free(addr);
>+    return 0;
>+#endif
>+}
>+
>+int av_nvtegra_map_from_va(AVNVTegraMap *map, AVNVTegraChannel *owner, void *mem,
>+                           uint32_t size, uint32_t align, uint32_t flags)
>+{
>+#ifndef __SWITCH__
>+    struct nvmap_create_handle_from_va args;
>+    int err;
>+
>+    args = (struct nvmap_create_handle_from_va){
>+        .va    = (uintptr_t)mem,
>+        .size  = size,
>+        .flags = flags | (MEM_TAG << 16),
>+    };
>+
>+    err = ioctl(get_nvmap_fd(), NVMAP_IOC_FROM_VA, &args);
>+    if (err < 0)
>+        return AVERROR(errno);
>+
>+    map->cpu_addr = mem;
>+    map->size     = size;
>+    map->handle   = args.handle;
>+
>+    return 0;
>+#else
>+
>+    map->owner = owner->channel.fd;
>+
>+    return AVERROR(nvMapCreate(&map->map, mem, FFALIGN(size, 0x1000), 0x10000, NvKind_Pitch,
>+                               convert_cache_flags(flags)));
>+#endif
>+}
>+
>+int av_nvtegra_map_close(AVNVTegraMap *map) {
>+#ifndef __SWITCH__
>+    return av_nvtegra_map_free(map);
>+#else
>+    nvMapClose(&map->map);
>+    return 0;
>+#endif
>+}
>+
>+int av_nvtegra_map_map(AVNVTegraMap *map) {
>+#ifndef __SWITCH__
>+    void *addr;
>+
>+    addr = mmap(NULL, map->size, PROT_READ | PROT_WRITE, MAP_SHARED, map->handle, 0);
>+    if (addr == MAP_FAILED)
>+        return AVERROR(errno);
>+
>+    map->cpu_addr = addr;
>+
>+    return 0;
>+#else
>+    nvioctl_command_buffer_map params;
>+    int err;
>+
>+    params = (nvioctl_command_buffer_map){
>+        .handle = map->map.handle,
>+    };
>+
>+    err = nvioctlChannel_MapCommandBuffer(map->owner, &params, 1, false);
>+    if (R_FAILED(err))
>+        return AVERROR(err);
>+
>+    map->iova = params.iova;
>+
>+    return 0;
>+#endif
>+}
>+
>+int av_nvtegra_map_unmap(AVNVTegraMap *map) {
>+    int err;
>+#ifndef __SWITCH__
>+    if (!map->cpu_addr)
>+        return 0;
>+
>+    err = munmap(map->cpu_addr, map->size);
>+    if (err < 0)
>+        return AVERROR(errno);
>+
>+    map->cpu_addr = NULL;
>+
>+    return 0;
>+#else
>+    nvioctl_command_buffer_map params;
>+
>+    if (!map->iova)
>+        return 0;
>+
>+    params = (nvioctl_command_buffer_map){
>+        .handle = map->map.handle,
>+        .iova   = map->iova,
>+    };
>+
>+    err = nvioctlChannel_UnmapCommandBuffer(map->owner, &params, 1, false);
>+    if (R_FAILED(err))
>+        return AVERROR(err);
>+
>+    map->iova = 0;
>+
>+    return 0;
>+#endif
>+}
>+
>+int av_nvtegra_map_cache_op(AVNVTegraMap *map, int op, void *addr, size_t len) {
>+#ifndef __SWITCH__
>+    struct nvmap_cache_op args;
>+
>+    args = (struct nvmap_cache_op){
>+        .addr   = (uintptr_t)addr,
>+        .len    = len,
>+        .handle = av_nvtegra_map_get_handle(map),
>+        .op     = op,
>+    };
>+
>+    return (ioctl(get_nvmap_fd(), NVMAP_IOC_CACHE, &args) < 0) ? AVERROR(errno) : 0;
>+#else
>+    if (!map->map.is_cpu_cacheable)
>+        return 0;
>+
>+    switch (op) {
>+        case NVMAP_CACHE_OP_WB:
>+            armDCacheClean(addr, len);
>+            break;
>+        default:
>+        case NVMAP_CACHE_OP_INV:
>+        case NVMAP_CACHE_OP_WB_INV:
>+            /* libnx internally performs a clean-invalidate, since invalidate is a privileged instruction */
>+            armDCacheFlush(addr, len);
>+            break;
>+    }
>+
>+    return 0;
>+#endif
>+}
>+
>+int av_nvtegra_map_realloc(AVNVTegraMap *map, uint32_t size, uint32_t align,
>+                           int heap_mask, int flags)
>+{
>+    AVNVTegraChannel channel;
>+    AVNVTegraMap tmp = {0};
>+    int err;
>+
>+    if (av_nvtegra_map_get_size(map) >= size)
>+        return 0;
>+
>+    /* Dummy channel object to hold the owner fd */
>+    channel = (AVNVTegraChannel){
>+#ifdef __SWITCH__
>+        .channel.fd = map->owner,
>+#endif
>+    };
>+
>+    err = av_nvtegra_map_create(&tmp, &channel, size, align, heap_mask, flags);
>+    if (err < 0)
>+        goto fail;
>+
>+    memcpy(av_nvtegra_map_get_addr(&tmp), av_nvtegra_map_get_addr(map), av_nvtegra_map_get_size(map));
>+
>+    err = av_nvtegra_map_destroy(map);
>+    if (err < 0)
>+        goto fail;
>+
>+    *map = tmp;
>+
>+    return 0;
>+
>+fail:
>+    av_nvtegra_map_destroy(&tmp);
>+    return err;
>+}
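
For readers following along, the grow-only strategy in av_nvtegra_map_realloc()
(return early if the map is already big enough, otherwise allocate a larger
one, copy the old contents, destroy the old map and swap) can be sketched with
plain heap memory. This is only an illustration under stated assumptions:
demo_buf and demo_buf_grow are hypothetical names, and malloc()/free() stand in
for the nvmap calls.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for AVNVTegraMap, backed by plain heap memory */
typedef struct demo_buf {
    void    *addr;
    uint32_t size;
} demo_buf;

/* Grow buf to at least size bytes; never shrinks. Returns 0 or -1. */
static int demo_buf_grow(demo_buf *buf, uint32_t size)
{
    demo_buf tmp = {0};

    assert(buf);

    if (buf->size >= size)
        return 0;                 /* already large enough, keep as-is */

    tmp.addr = malloc(size);
    if (!tmp.addr)
        return -1;                /* the old buffer is left untouched */
    tmp.size = size;

    if (buf->size)
        memcpy(tmp.addr, buf->addr, buf->size); /* preserve old contents */

    free(buf->addr);              /* release the old storage ... */
    *buf = tmp;                   /* ... and hand over the new one */
    return 0;
}
```

The point of going through a temporary is that the caller's object is only
overwritten once the new allocation is known to be good.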
>+
>+int av_nvtegra_cmdbuf_init(AVNVTegraCmdbuf *cmdbuf) {
>+    cmdbuf->num_cmdbufs      = 0;
>+#ifndef __SWITCH__
>+    cmdbuf->num_relocs       = 0;
>+    cmdbuf->num_waitchks     = 0;
>+#endif
>+    cmdbuf->num_syncpt_incrs = 0;
>+
>+#define NUM_INITIAL_CMDBUFS      3
>+#define NUM_INITIAL_RELOCS       15
>+#define NUM_INITIAL_SYNCPT_INCRS 3
>+
>+    cmdbuf->cmdbufs      = av_malloc_array(NUM_INITIAL_CMDBUFS, sizeof(*cmdbuf->cmdbufs));
>+#ifndef __SWITCH__
>+    cmdbuf->cmdbuf_exts  = av_malloc_array(NUM_INITIAL_CMDBUFS, sizeof(*cmdbuf->cmdbuf_exts));
>+    cmdbuf->class_ids    = av_malloc_array(NUM_INITIAL_CMDBUFS, sizeof(*cmdbuf->class_ids));
>+#endif
>+
>+#ifndef __SWITCH__
>+    if (!cmdbuf->cmdbufs || !cmdbuf->cmdbuf_exts || !cmdbuf->class_ids)
>+#else
>+    if (!cmdbuf->cmdbufs)
>+#endif
>+        return AVERROR(ENOMEM);
>+
>+#ifndef __SWITCH__
>+    cmdbuf->relocs       = av_malloc_array(NUM_INITIAL_RELOCS, sizeof(*cmdbuf->relocs));
>+    cmdbuf->reloc_types  = av_malloc_array(NUM_INITIAL_RELOCS, sizeof(*cmdbuf->reloc_types));
>+    cmdbuf->reloc_shifts = av_malloc_array(NUM_INITIAL_RELOCS, sizeof(*cmdbuf->reloc_shifts));
>+    if (!cmdbuf->relocs || !cmdbuf->reloc_types || !cmdbuf->reloc_shifts)
>+        return AVERROR(ENOMEM);
>+#endif
>+
>+    cmdbuf->syncpt_incrs = av_malloc_array(NUM_INITIAL_SYNCPT_INCRS, sizeof(*cmdbuf->syncpt_incrs));
>+#ifndef __SWITCH__
>+    cmdbuf->fences       = av_malloc_array(NUM_INITIAL_SYNCPT_INCRS, sizeof(*cmdbuf->fences));
>+#endif
>+
>+#ifndef __SWITCH__
>+    if (!cmdbuf->syncpt_incrs || !cmdbuf->fences)
>+#else
>+    if (!cmdbuf->syncpt_incrs)
>+#endif
>+        return AVERROR(ENOMEM);
>+
>+    return 0;
>+}
>+
>+int av_nvtegra_cmdbuf_deinit(AVNVTegraCmdbuf *cmdbuf) {
>+    av_freep(&cmdbuf->cmdbufs);
>+    av_freep(&cmdbuf->syncpt_incrs);
>+
>+#ifndef __SWITCH__
>+    av_freep(&cmdbuf->cmdbuf_exts), av_freep(&cmdbuf->class_ids);
>+    av_freep(&cmdbuf->relocs), av_freep(&cmdbuf->reloc_types), av_freep(&cmdbuf->reloc_shifts);
>+    av_freep(&cmdbuf->fences), av_freep(&cmdbuf->waitchks);
>+#endif
>+
>+    return 0;
>+}
>+
>+int av_nvtegra_cmdbuf_add_memory(AVNVTegraCmdbuf *cmdbuf, AVNVTegraMap *map, uint32_t offset, uint32_t size) {
>+    uint8_t *mem;
>+
>+    mem = av_nvtegra_map_get_addr(map);
>+
>+    cmdbuf->map        = map;
>+    cmdbuf->mem_offset = offset;
>+    cmdbuf->mem_size   = size;
>+
>+    cmdbuf->cur_word = (uint32_t *)(mem + cmdbuf->mem_offset);
>+
>+    return 0;
>+}
>+
>+int av_nvtegra_cmdbuf_clear(AVNVTegraCmdbuf *cmdbuf) {
>+    uint8_t *mem;
>+
>+    mem = av_nvtegra_map_get_addr(cmdbuf->map);
>+
>+    cmdbuf->num_cmdbufs = 0, cmdbuf->num_syncpt_incrs = 0;
>+#ifndef __SWITCH__
>+    cmdbuf->num_relocs = 0, cmdbuf->num_waitchks = 0;
>+#endif
>+
>+    cmdbuf->cur_word = (uint32_t *)(mem + cmdbuf->mem_offset);
>+    return 0;
>+}
>+
>+int av_nvtegra_cmdbuf_begin(AVNVTegraCmdbuf *cmdbuf, uint32_t class_id) {
>+    uint8_t *mem;
>+    void *tmp1;
>+#ifndef __SWITCH__
>+    void *tmp2, *tmp3;
>+#endif
>+
>+    mem = av_nvtegra_map_get_addr(cmdbuf->map);
>+
>+    tmp1 = av_realloc_array(cmdbuf->cmdbufs,     cmdbuf->num_cmdbufs + 1, sizeof(*cmdbuf->cmdbufs));
>+#ifndef __SWITCH__
>+    tmp2 = av_realloc_array(cmdbuf->cmdbuf_exts, cmdbuf->num_cmdbufs + 1, sizeof(*cmdbuf->cmdbuf_exts));
>+    tmp3 = av_realloc_array(cmdbuf->class_ids,   cmdbuf->num_cmdbufs + 1, sizeof(*cmdbuf->class_ids));
>+#endif
>+
>+#ifndef __SWITCH__
>+    if (!tmp1 || !tmp2 || !tmp3)
>+#else
>+    if (!tmp1)
>+#endif
>+        return AVERROR(ENOMEM);
>+
>+    cmdbuf->cmdbufs = tmp1;
>+
>+#ifndef __SWITCH__
>+    cmdbuf->cmdbuf_exts = tmp2, cmdbuf->class_ids = tmp3;
>+#endif
>+
>+    cmdbuf->cmdbufs[cmdbuf->num_cmdbufs] = (struct nvhost_cmdbuf){
>+        .mem       = av_nvtegra_map_get_handle(cmdbuf->map),
>+        .offset    = (uint8_t *)cmdbuf->cur_word - mem,
>+    };
>+
>+#ifndef __SWITCH__
>+    cmdbuf->cmdbuf_exts[cmdbuf->num_cmdbufs] = (struct nvhost_cmdbuf_ext){
>+        .pre_fence = -1,
>+    };
>+
>+    cmdbuf->class_ids[cmdbuf->num_cmdbufs] = class_id;
>+#endif
>+
>+#ifdef __SWITCH__
>+    if (cmdbuf->num_cmdbufs == 0)
>+        av_nvtegra_cmdbuf_push_word(cmdbuf, host1x_opcode_setclass(class_id, 0, 0));
>+#endif
>+
>+    return 0;
>+}
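
A note on the realloc pattern here: as far as I can tell, if one of the
av_realloc_array() calls succeeds and moves its block while another one fails,
the function returns ENOMEM with the struct still holding the old, now stale
pointer. One possible way around that is to store each pointer back as soon as
its reallocation succeeds. A minimal sketch with plain realloc(); demo_lists
and demo_lists_append are hypothetical names, not part of the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pair of parallel arrays sharing one element count */
typedef struct demo_lists {
    uint32_t *ids;
    uint32_t *offsets;
    size_t    count;
} demo_lists;

/*
 * Append one entry to both arrays. Each pointer is committed right after
 * its realloc() succeeds, so a later failure cannot leave the struct
 * pointing at memory that realloc() already moved.
 */
static int demo_lists_append(demo_lists *l, uint32_t id, uint32_t offset)
{
    void *tmp;

    assert(l);

    tmp = realloc(l->ids, (l->count + 1) * sizeof(*l->ids));
    if (!tmp)
        return -1;
    l->ids = tmp;

    tmp = realloc(l->offsets, (l->count + 1) * sizeof(*l->offsets));
    if (!tmp)
        return -1;    /* l->ids stays valid, merely one slot larger */
    l->offsets = tmp;

    l->ids[l->count]     = id;
    l->offsets[l->count] = offset;
    l->count++;
    return 0;
}
```

The same shape would apply to the other places in this file that grow several
arrays in one go.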
>+
>+int av_nvtegra_cmdbuf_end(AVNVTegraCmdbuf *cmdbuf) {
>+    cmdbuf->num_cmdbufs++;
>+    return 0;
>+}
>+
>+int av_nvtegra_cmdbuf_push_word(AVNVTegraCmdbuf *cmdbuf, uint32_t word) {
>+    uintptr_t mem_start = (uintptr_t)av_nvtegra_map_get_addr(cmdbuf->map) + cmdbuf->mem_offset;
>+
>+    if ((uintptr_t)cmdbuf->cur_word - mem_start >= cmdbuf->mem_size)
>+        return AVERROR(ENOMEM);
>+
>+    *cmdbuf->cur_word++ = word;
>+    cmdbuf->cmdbufs[cmdbuf->num_cmdbufs].words += 1;
>+    return 0;
>+}
>+
>+int av_nvtegra_cmdbuf_push_value(AVNVTegraCmdbuf *cmdbuf, uint32_t offset, uint32_t word) {
>+    int err;
>+
>+    err = av_nvtegra_cmdbuf_push_word(cmdbuf, host1x_opcode_incr(NV_THI_METHOD0>>2, 2));
>+    if (err < 0)
>+        return err;
>+
>+    err = av_nvtegra_cmdbuf_push_word(cmdbuf, offset);
>+    if (err < 0)
>+        return err;
>+
>+    err = av_nvtegra_cmdbuf_push_word(cmdbuf, word);
>+    if (err < 0)
>+        return err;
>+
>+    return 0;
>+}
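
To make the word layout explicit: my reading is that each pushed value costs
three words, an INCR header targeting NV_THI_METHOD0 with a count of 2, then
the method offset (landing in METHOD0) and the data (landing in METHOD1). A
small standalone sketch, assuming a fixed-size array instead of a mapped
buffer; all demo_* names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_THI_METHOD0 0x40  /* byte offset, same value as NV_THI_METHOD0 */

/* Same encoding as host1x_opcode_incr() in the patch */
static uint32_t demo_opcode_incr(unsigned offset, unsigned count)
{
    return (1u << 28) | (offset << 16) | count;
}

/* Hypothetical fixed-size command buffer */
typedef struct demo_cmdbuf {
    uint32_t words[16];
    size_t   num_words;
} demo_cmdbuf;

static int demo_push_word(demo_cmdbuf *c, uint32_t word)
{
    assert(c);
    if (c->num_words >= sizeof(c->words) / sizeof(c->words[0]))
        return -1;  /* buffer full */
    c->words[c->num_words++] = word;
    return 0;
}

/* Emits three words: the INCR header, then the METHOD0 and METHOD1 payloads */
static int demo_push_value(demo_cmdbuf *c, uint32_t method, uint32_t value)
{
    if (demo_push_word(c, demo_opcode_incr(DEMO_THI_METHOD0 >> 2, 2)) < 0 ||
        demo_push_word(c, method) < 0 ||
        demo_push_word(c, value)  < 0)
        return -1;
    return 0;
}
```

With these definitions the header word comes out as 0x10100002 (opcode 1,
register 0x10, count 2).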
>+
>+int av_nvtegra_cmdbuf_push_reloc(AVNVTegraCmdbuf *cmdbuf, uint32_t offset, AVNVTegraMap *target, uint32_t target_offset,
>+                                 int reloc_type, int shift)
>+{
>+    int err;
>+#ifndef __SWITCH__
>+    uint8_t *mem;
>+    void *tmp1, *tmp2, *tmp3;
>+
>+    mem = av_nvtegra_map_get_addr(cmdbuf->map);
>+
>+    tmp1 = av_realloc_array(cmdbuf->relocs,       cmdbuf->num_relocs + 1, sizeof(*cmdbuf->relocs));
>+    tmp2 = av_realloc_array(cmdbuf->reloc_types,  cmdbuf->num_relocs + 1, sizeof(*cmdbuf->reloc_types));
>+    tmp3 = av_realloc_array(cmdbuf->reloc_shifts, cmdbuf->num_relocs + 1, sizeof(*cmdbuf->reloc_shifts));
>+    if (!tmp1 || !tmp2 || !tmp3)
>+        return AVERROR(ENOMEM);
>+
>+    cmdbuf->relocs = tmp1, cmdbuf->reloc_types = tmp2, cmdbuf->reloc_shifts = tmp3;
>+
>+    err = av_nvtegra_cmdbuf_push_value(cmdbuf, offset, 0xdeadbeef);
>+    if (err < 0)
>+        return err;
>+
>+    cmdbuf->relocs[cmdbuf->num_relocs]       = (struct nvhost_reloc){
>+        .cmdbuf_mem    = av_nvtegra_map_get_handle(cmdbuf->map),
>+        .cmdbuf_offset = (uint8_t *)cmdbuf->cur_word - mem - sizeof(uint32_t),
>+        .target        = av_nvtegra_map_get_handle(target),
>+        .target_offset = target_offset,
>+    };
>+
>+    cmdbuf->reloc_types[cmdbuf->num_relocs]  = (struct nvhost_reloc_type){
>+        .reloc_type    = reloc_type,
>+    };
>+
>+    cmdbuf->reloc_shifts[cmdbuf->num_relocs] = (struct nvhost_reloc_shift){
>+        .shift         = shift,
>+    };
>+
>+    cmdbuf->num_relocs++;
>+
>+    return 0;
>+#else
>+    err = av_nvtegra_cmdbuf_push_value(cmdbuf, offset, (target->iova + target_offset) >> shift);
>+    if (err < 0)
>+        return err;
>+
>+    return 0;
>+#endif
>+}
>+
>+int av_nvtegra_cmdbuf_push_syncpt_incr(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt) {
>+    int err;
>+
>+    err = av_nvtegra_cmdbuf_push_word(cmdbuf, host1x_opcode_nonincr(NV_THI_INCR_SYNCPT>>2, 1));
>+    if (err < 0)
>+        return err;
>+
>+    err = av_nvtegra_cmdbuf_push_word(cmdbuf,
>+                                      AV_NVTEGRA_VALUE(NV_THI_INCR_SYNCPT, INDX, syncpt) |
>+                                      AV_NVTEGRA_ENUM (NV_THI_INCR_SYNCPT, COND, OP_DONE));
>+    if (err < 0)
>+        return err;
>+
>+    return 0;
>+}
>+
>+int av_nvtegra_cmdbuf_push_wait(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence) {
>+    int err;
>+
>+    err = av_nvtegra_cmdbuf_push_word(cmdbuf, host1x_opcode_setclass(HOST1X_CLASS_HOST1X, 0, 0));
>+    if (err < 0)
>+        return err;
>+
>+    err = av_nvtegra_cmdbuf_push_word(cmdbuf, host1x_opcode_mask(NV_CLASS_HOST_LOAD_SYNCPT_PAYLOAD>>2,
>+                                      (1<<((NV_CLASS_HOST_LOAD_SYNCPT_PAYLOAD - NV_CLASS_HOST_LOAD_SYNCPT_PAYLOAD)>>2)) |
>+                                      (1<<((NV_CLASS_HOST_WAIT_SYNCPT         - NV_CLASS_HOST_LOAD_SYNCPT_PAYLOAD)>>2))));
>+    if (err < 0)
>+        return err;
>+
>+    err = av_nvtegra_cmdbuf_push_word(cmdbuf, fence);
>+    if (err < 0)
>+        return err;
>+
>+    err = av_nvtegra_cmdbuf_push_word(cmdbuf, syncpt);
>+    if (err < 0)
>+        return err;
>+
>+    return 0;
>+}
>+
>+int av_nvtegra_cmdbuf_add_syncpt_incr(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence)
>+{
>+    void *tmp1;
>+#ifndef __SWITCH__
>+    void *tmp2;
>+#endif
>+
>+    tmp1 = av_realloc_array(cmdbuf->syncpt_incrs, cmdbuf->num_syncpt_incrs + 1, sizeof(*cmdbuf->syncpt_incrs));
>+#ifndef __SWITCH__
>+    tmp2 = av_realloc_array(cmdbuf->fences,       cmdbuf->num_syncpt_incrs + 1, sizeof(*cmdbuf->fences));
>+#endif
>+
>+#ifndef __SWITCH__
>+    if (!tmp1 || !tmp2)
>+#else
>+    if (!tmp1)
>+#endif
>+        return AVERROR(ENOMEM);
>+
>+    cmdbuf->syncpt_incrs = tmp1;
>+#ifndef __SWITCH__
>+    cmdbuf->fences       = tmp2;
>+#endif
>+
>+    cmdbuf->syncpt_incrs[cmdbuf->num_syncpt_incrs] = (struct nvhost_syncpt_incr){
>+        .syncpt_id    = syncpt,
>+        .syncpt_incrs = 1,
>+    };
>+
>+#ifndef __SWITCH__
>+    cmdbuf->fences[cmdbuf->num_syncpt_incrs]       = fence;
>+#endif
>+
>+    cmdbuf->num_syncpt_incrs++;
>+
>+    return av_nvtegra_cmdbuf_push_syncpt_incr(cmdbuf, syncpt);
>+}
>+
>+int av_nvtegra_cmdbuf_add_waitchk(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence) {
>+#ifndef __SWITCH__
>+    uint8_t *mem;
>+    void *tmp;
>+
>+    mem = av_nvtegra_map_get_addr(cmdbuf->map);
>+
>+    tmp = av_realloc_array(cmdbuf->waitchks, cmdbuf->num_waitchks + 1, sizeof(*cmdbuf->waitchks));
>+    if (!tmp)
>+        return AVERROR(ENOMEM);
>+
>+    cmdbuf->waitchks = tmp;
>+
>+    cmdbuf->waitchks[cmdbuf->num_waitchks] = (struct nvhost_waitchk){
>+        .mem       = av_nvtegra_map_get_handle(cmdbuf->map),
>+        .offset    = (uint8_t *)cmdbuf->cur_word - mem - sizeof(uint32_t),
>+        .syncpt_id = syncpt,
>+        .thresh    = fence,
>+    };
>+
>+    cmdbuf->num_waitchks++;
>+#endif
>+
>+    return av_nvtegra_cmdbuf_push_wait(cmdbuf, syncpt, fence);
>+}
>+
>+static void nvtegra_job_free(void *opaque, uint8_t *data) {
>+    AVNVTegraJob *job = (AVNVTegraJob *)data;
>+
>+    if (!job)
>+        return;
>+
>+    av_nvtegra_cmdbuf_deinit(&job->cmdbuf);
>+    av_nvtegra_map_destroy(&job->input_map);
>+
>+    av_freep(&job);
>+}
>+
>+static AVBufferRef *nvtegra_job_alloc(void *opaque, size_t size) {
>+    AVNVTegraJobPool *pool = opaque;
>+
>+    AVBufferRef  *buffer;
>+    AVNVTegraJob *job;
>+    int err;
>+
>+    job = av_mallocz(sizeof(*job));
>+    if (!job)
>+        return NULL;
>+
>+    err = av_nvtegra_map_create(&job->input_map, pool->channel, pool->input_map_size, 0x100,
>+                                NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE);
>+    if (err < 0)
>+        goto fail;
>+
>+    err = av_nvtegra_cmdbuf_init(&job->cmdbuf);
>+    if (err < 0)
>+        goto fail;
>+
>+    err = av_nvtegra_cmdbuf_add_memory(&job->cmdbuf, &job->input_map, pool->cmdbuf_off, pool->max_cmdbuf_size);
>+    if (err < 0)
>+        goto fail;
>+
>+    buffer = av_buffer_create((uint8_t *)job, sizeof(*job), nvtegra_job_free, pool, 0);
>+    if (!buffer)
>+        goto fail;
>+
>+    return buffer;
>+
>+fail:
>+    av_nvtegra_cmdbuf_deinit(&job->cmdbuf);
>+    av_nvtegra_map_destroy(&job->input_map);
>+    av_freep(&job);
>+    return NULL;
>+}
>+
>+int av_nvtegra_job_pool_init(AVNVTegraJobPool *pool, AVNVTegraChannel *channel,
>+                             size_t input_map_size, off_t cmdbuf_off, size_t max_cmdbuf_size)
>+{
>+    pool->channel         = channel;
>+    pool->input_map_size  = input_map_size;
>+    pool->cmdbuf_off      = cmdbuf_off;
>+    pool->max_cmdbuf_size = max_cmdbuf_size;
>+    pool->pool            = av_buffer_pool_init2(sizeof(AVNVTegraJob), pool,
>+                                                 nvtegra_job_alloc, NULL);
>+    if (!pool->pool)
>+        return AVERROR(ENOMEM);
>+
>+    return 0;
>+}
>+
>+int av_nvtegra_job_pool_uninit(AVNVTegraJobPool *pool) {
>+    av_buffer_pool_uninit(&pool->pool);
>+    return 0;
>+}
>+
>+AVBufferRef *av_nvtegra_job_pool_get(AVNVTegraJobPool *pool) {
>+    return av_buffer_pool_get(pool->pool);
>+}
>+
>+int av_nvtegra_job_submit(AVNVTegraJobPool *pool, AVNVTegraJob *job) {
>+    return av_nvtegra_channel_submit(pool->channel, &job->cmdbuf, &job->fence);
>+}
>+
>+int av_nvtegra_job_wait(AVNVTegraJobPool *pool, AVNVTegraJob *job, int timeout) {
>+    return av_nvtegra_syncpt_wait(pool->channel, job->fence, timeout);
>+}
>diff --git a/libavutil/nvtegra.h b/libavutil/nvtegra.h
>new file mode 100644
>index 0000000000..3b63335d6c
>--- /dev/null
>+++ b/libavutil/nvtegra.h
>@@ -0,0 +1,258 @@
>+/*
>+ * Copyright (c) 2024 averne <averne381@gmail.com>
>+ *
>+ * This file is part of FFmpeg.
>+ *
>+ * FFmpeg is free software; you can redistribute it and/or modify
>+ * it under the terms of the GNU General Public License as published by
>+ * the Free Software Foundation; either version 2 of the License, or
>+ * (at your option) any later version.
>+ *
>+ * FFmpeg is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>+ * GNU General Public License for more details.
>+ *
>+ * You should have received a copy of the GNU General Public License along
>+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
>+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
>+ */
>+
>+#ifndef AVUTIL_NVTEGRA_H
>+#define AVUTIL_NVTEGRA_H
>+
>+#include <stdint.h>
>+#include <stdbool.h>
>+
>+#include "buffer.h"
>+
>+#include "nvhost_ioctl.h"
>+#include "nvmap_ioctl.h"
>+
>+typedef struct AVNVTegraChannel {
>+#ifndef __SWITCH__
>+    int fd;
>+    int module_id;
>+#else
>+    NvChannel channel;
>+#endif
>+
>+    uint32_t syncpt;
>+
>+#ifdef __SWITCH__
>+    MmuRequest mmu_request;
>+#endif
>+    uint32_t clock;
>+} AVNVTegraChannel;
>+
>+typedef struct AVNVTegraMap {
>+#ifndef __SWITCH__
>+    uint32_t handle;
>+    uint32_t size;
>+    void *cpu_addr;
>+#else
>+    NvMap map;
>+    uint32_t iova;
>+    uint32_t owner;
>+#endif
>+    bool is_linear;
>+} AVNVTegraMap;
>+
>+typedef struct AVNVTegraCmdbuf {
>+    AVNVTegraMap *map;
>+
>+    uint32_t mem_offset, mem_size;
>+
>+    uint32_t *cur_word;
>+
>+    struct nvhost_cmdbuf       *cmdbufs;
>+#ifndef __SWITCH__
>+    struct nvhost_cmdbuf_ext   *cmdbuf_exts;
>+    uint32_t                   *class_ids;
>+#endif
>+    uint32_t num_cmdbufs;
>+
>+#ifndef __SWITCH__
>+    struct nvhost_reloc        *relocs;
>+    struct nvhost_reloc_type   *reloc_types;
>+    struct nvhost_reloc_shift  *reloc_shifts;
>+    uint32_t num_relocs;
>+#endif
>+
>+    struct nvhost_syncpt_incr  *syncpt_incrs;
>+#ifndef __SWITCH__
>+    uint32_t                   *fences;
>+#endif
>+    uint32_t num_syncpt_incrs;
>+
>+#ifndef __SWITCH__
>+    struct nvhost_waitchk      *waitchks;
>+    uint32_t num_waitchks;
>+#endif
>+} AVNVTegraCmdbuf;
>+
>+typedef struct AVNVTegraJobPool {
>+    /*
>+     * Pool object for job allocation
>+     */
>+    AVBufferPool *pool;
>+
>+    /*
>+     * Hardware channel the jobs will be submitted to
>+     */
>+    AVNVTegraChannel *channel;
>+
>+    /*
>+     * Total size of the input memory-mapped buffer
>+     */
>+    size_t input_map_size;
>+
>+    /*
>+     * Offset of the command data within the input map
>+     */
>+    off_t cmdbuf_off;
>+
>+    /*
>+     * Maximum memory usable by the command buffer
>+     */
>+    size_t max_cmdbuf_size;
>+} AVNVTegraJobPool;
>+
>+typedef struct AVNVTegraJob {
>+    /*
>+     * Memory-mapped buffer for command buffers, metadata structures, ...
>+     */
>+    AVNVTegraMap input_map;
>+
>+    /*
>+     * Object for command recording
>+     */
>+    AVNVTegraCmdbuf cmdbuf;
>+
>+    /*
>+     * Fence indicating completion of the job
>+     */
>+    uint32_t fence;
>+} AVNVTegraJob;
>+
>+AVBufferRef *av_nvtegra_driver_init(void);
>+
>+int av_nvtegra_channel_open(AVNVTegraChannel *channel, const char *dev);
>+int av_nvtegra_channel_close(AVNVTegraChannel *channel);
>+int av_nvtegra_channel_get_clock_rate(AVNVTegraChannel *channel, uint32_t moduleid, uint32_t *clock_rate);
>+int av_nvtegra_channel_set_clock_rate(AVNVTegraChannel *channel, uint32_t moduleid, uint32_t clock_rate);
>+int av_nvtegra_channel_submit(AVNVTegraChannel *channel, AVNVTegraCmdbuf *cmdbuf, uint32_t *fence);
>+int av_nvtegra_channel_set_submit_timeout(AVNVTegraChannel *channel, uint32_t timeout_ms);
>+
>+int av_nvtegra_syncpt_wait(AVNVTegraChannel *channel, uint32_t threshold, int32_t timeout);
>+
>+int av_nvtegra_map_allocate(AVNVTegraMap *map, AVNVTegraChannel *owner, uint32_t size,
>+                            uint32_t align, int heap_mask, int flags);
>+int av_nvtegra_map_free(AVNVTegraMap *map);
>+int av_nvtegra_map_from_va(AVNVTegraMap *map, AVNVTegraChannel *owner, void *mem,
>+                           uint32_t size, uint32_t align, uint32_t flags);
>+int av_nvtegra_map_close(AVNVTegraMap *map);
>+int av_nvtegra_map_map(AVNVTegraMap *map);
>+int av_nvtegra_map_unmap(AVNVTegraMap *map);
>+int av_nvtegra_map_cache_op(AVNVTegraMap *map, int op, void *addr, size_t len);
>+int av_nvtegra_map_realloc(AVNVTegraMap *map, uint32_t size, uint32_t align, int heap_mask, int flags);
>+
>+static inline int av_nvtegra_map_create(AVNVTegraMap *map, AVNVTegraChannel *owner, uint32_t size, uint32_t align,
>+                                        int heap_mask, int flags)
>+{
>+    int err;
>+
>+    err = av_nvtegra_map_allocate(map, owner, size, align, heap_mask, flags);
>+    if (err < 0)
>+        return err;
>+
>+    return av_nvtegra_map_map(map);
>+}
>+
>+static inline int av_nvtegra_map_destroy(AVNVTegraMap *map) {
>+    int err;
>+
>+    err = av_nvtegra_map_unmap(map);
>+    if (err < 0)
>+        return err;
>+
>+    return av_nvtegra_map_free(map);
>+}
>+
>+int av_nvtegra_cmdbuf_init(AVNVTegraCmdbuf *cmdbuf);
>+int av_nvtegra_cmdbuf_deinit(AVNVTegraCmdbuf *cmdbuf);
>+int av_nvtegra_cmdbuf_add_memory(AVNVTegraCmdbuf *cmdbuf, AVNVTegraMap *map, uint32_t offset, uint32_t size);
>+int av_nvtegra_cmdbuf_clear(AVNVTegraCmdbuf *cmdbuf);
>+int av_nvtegra_cmdbuf_begin(AVNVTegraCmdbuf *cmdbuf, uint32_t class_id);
>+int av_nvtegra_cmdbuf_end(AVNVTegraCmdbuf *cmdbuf);
>+int av_nvtegra_cmdbuf_push_word(AVNVTegraCmdbuf *cmdbuf, uint32_t word);
>+int av_nvtegra_cmdbuf_push_value(AVNVTegraCmdbuf *cmdbuf, uint32_t offset, uint32_t word);
>+int av_nvtegra_cmdbuf_push_reloc(AVNVTegraCmdbuf *cmdbuf, uint32_t offset, AVNVTegraMap *target, uint32_t target_offset,
>+                                 int reloc_type, int shift);
>+int av_nvtegra_cmdbuf_push_syncpt_incr(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt);
>+int av_nvtegra_cmdbuf_push_wait(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence);
>+int av_nvtegra_cmdbuf_add_syncpt_incr(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence);
>+int av_nvtegra_cmdbuf_add_waitchk(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence);
>+
>+/*
>+ * Job allocation and submission routines
>+ */
>+int av_nvtegra_job_pool_init(AVNVTegraJobPool *pool, AVNVTegraChannel *channel,
>+                             size_t input_map_size, off_t cmdbuf_off, size_t max_cmdbuf_size);
>+int av_nvtegra_job_pool_uninit(AVNVTegraJobPool *pool);
>+AVBufferRef *av_nvtegra_job_pool_get(AVNVTegraJobPool *pool);
>+
>+int av_nvtegra_job_submit(AVNVTegraJobPool *pool, AVNVTegraJob *job);
>+int av_nvtegra_job_wait(AVNVTegraJobPool *pool, AVNVTegraJob *job, int timeout);
>+
>+static inline uint32_t av_nvtegra_map_get_handle(AVNVTegraMap *map) {
>+#ifndef __SWITCH__
>+    return map->handle;
>+#else
>+    return map->map.handle;
>+#endif
>+}
>+
>+static inline void *av_nvtegra_map_get_addr(AVNVTegraMap *map) {
>+#ifndef __SWITCH__
>+    return map->cpu_addr;
>+#else
>+    return map->map.cpu_addr;
>+#endif
>+}
>+
>+static inline uint32_t av_nvtegra_map_get_size(AVNVTegraMap *map) {
>+#ifndef __SWITCH__
>+    return map->size;
>+#else
>+    return map->map.size;
>+#endif
>+}
>+
>+/* Addresses are shifted by 8 bits in the command buffer, requiring an alignment to 256 */
>+#define AV_NVTEGRA_MAP_ALIGN (1 << 8)
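
A quick illustration of why the 8-bit shift implies 256-byte alignment: any
low bits of the address are dropped by the shift, so only addresses that are
multiples of 256 survive unchanged. The helper names below are hypothetical
and only serve the example:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAP_ALIGN (1u << 8)  /* same value as AV_NVTEGRA_MAP_ALIGN */

/* Round addr up to the next multiple of align (a power of two) */
static uint32_t demo_align_up(uint32_t addr, uint32_t align)
{
    assert(align && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* An address survives the >> 8, << 8 round trip only if its low 8 bits are zero */
static int demo_survives_shift(uint32_t addr)
{
    return ((addr >> 8) << 8) == addr;
}
```

For instance, 0x1234 would lose its low byte when shifted, whereas the
rounded-up 0x1300 does not.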
>+
>+#define AV_NVTEGRA_VALUE(offset, field, value)                                                    \
>+    (((value) &                                                                                   \
>+    ((uint32_t)((UINT64_C(1) << ((1?offset ## _ ## field) - (0?offset ## _ ## field) + 1)) - 1))) \
>+    << (0?offset ## _ ## field))
>+
>+#define AV_NVTEGRA_ENUM(offset, field, value)                                                     \
>+    ((offset ## _ ## field ## _ ## value &                                                        \
>+    ((uint32_t)((UINT64_C(1) << ((1?offset ## _ ## field) - (0?offset ## _ ## field) + 1)) - 1))) \
>+    << (0?offset ## _ ## field))
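
Since the hi:lo trick in these macros is not obvious at first sight: the field
ranges are defined as "hi:lo", so pasting one after "1?" yields the high bit
and after "0?" the low bit. A reduced sketch of the same idea, with
hypothetical DEMO_* names and a field layout copied from NV_THI_INCR_SYNCPT:

```c
#include <assert.h>
#include <stdint.h>

/* Field ranges are written as hi:lo, like the NV_THI_* definitions */
#define DEMO_REG_INDX          7:0
#define DEMO_REG_COND          15:8
#define DEMO_REG_COND_OP_DONE  1

/* (1 ? hi : lo) yields hi and (0 ? hi : lo) yields lo */
#define DEMO_HI(range) (1 ? range)
#define DEMO_LO(range) (0 ? range)

/* Mask the value to the field width, then shift it to the field position */
#define DEMO_FIELD(range, value)                                              \
    ((((uint32_t)(value)) &                                                   \
      (uint32_t)((UINT64_C(1) << (DEMO_HI(range) - DEMO_LO(range) + 1)) - 1)) \
     << DEMO_LO(range))

/* Builds the payload word for a syncpoint increment on OP_DONE */
static uint32_t demo_incr_syncpt_word(uint32_t syncpt)
{
    assert(syncpt <= 0xff); /* INDX is 8 bits wide */
    return DEMO_FIELD(DEMO_REG_INDX, syncpt) |
           DEMO_FIELD(DEMO_REG_COND, DEMO_REG_COND_OP_DONE);
}
```

For syncpoint 5 this gives 0x105: the index in bits 7:0 and the OP_DONE
condition in bits 15:8.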
>+
>+#define AV_NVTEGRA_PUSH_VALUE(cmdbuf, offset, value) do {                                \
>+    int _err = av_nvtegra_cmdbuf_push_value(cmdbuf, (offset) / sizeof(uint32_t), value); \
>+    if (_err < 0)                                                                        \
>+        return _err;                                                                     \
>+} while (0)
>+
>+#define AV_NVTEGRA_PUSH_RELOC(cmdbuf, offset, target, target_offset, type) do {  \
>+    int _err = av_nvtegra_cmdbuf_push_reloc(cmdbuf, (offset) / sizeof(uint32_t), \
>+                                            target, target_offset, type, 8);     \
>+    if (_err < 0)                                                                \
>+        return _err;                                                             \
>+} while (0)
>+
>+#endif /* AVUTIL_NVTEGRA_H */
>diff --git a/libavutil/nvtegra_host1x.h b/libavutil/nvtegra_host1x.h
>new file mode 100644
>index 0000000000..25e37eae61
>--- /dev/null
>+++ b/libavutil/nvtegra_host1x.h
>@@ -0,0 +1,94 @@
>+/*
>+ * Copyright (c) 2024 averne <averne381@gmail.com>
>+ *
>+ * This file is part of FFmpeg.
>+ *
>+ * FFmpeg is free software; you can redistribute it and/or modify
>+ * it under the terms of the GNU General Public License as published by
>+ * the Free Software Foundation; either version 2 of the License, or
>+ * (at your option) any later version.
>+ *
>+ * FFmpeg is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>+ * GNU General Public License for more details.
>+ *
>+ * You should have received a copy of the GNU General Public License along
>+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
>+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
>+ */
>+
>+#ifndef AVUTIL_NVTEGRA_HOST1X_H
>+#define AVUTIL_NVTEGRA_HOST1X_H
>+
>+#include <stdint.h>
>+
>+#include "macros.h"
>+
>+/* From L4T include/linux/host1x.h */
>+enum host1x_class {
>+    HOST1X_CLASS_HOST1X  = 0x01,
>+    HOST1X_CLASS_NVENC   = 0x21,
>+    HOST1X_CLASS_VI      = 0x30,
>+    HOST1X_CLASS_ISPA    = 0x32,
>+    HOST1X_CLASS_ISPB    = 0x34,
>+    HOST1X_CLASS_GR2D    = 0x51,
>+    HOST1X_CLASS_GR2D_SB = 0x52,
>+    HOST1X_CLASS_VIC     = 0x5d,
>+    HOST1X_CLASS_GR3D    = 0x60,
>+    HOST1X_CLASS_NVJPG   = 0xc0,
>+    HOST1X_CLASS_NVDEC   = 0xf0,
>+};
>+
>+static inline uint32_t host1x_opcode_setclass(unsigned class_id, unsigned offset, unsigned mask) {
>+    return (0 << 28) | (offset << 16) | (class_id << 6) | mask;
>+}
>+
>+static inline uint32_t host1x_opcode_incr(unsigned offset, unsigned count) {
>+    return (1 << 28) | (offset << 16) | count;
>+}
>+
>+static inline uint32_t host1x_opcode_nonincr(unsigned offset, unsigned count) {
>+    return (2 << 28) | (offset << 16) | count;
>+}
>+
>+static inline uint32_t host1x_opcode_mask(unsigned offset, unsigned mask) {
>+    return (3 << 28) | (offset << 16) | mask;
>+}
>+
>+static inline uint32_t host1x_opcode_imm(unsigned offset, unsigned value) {
>+    return (4 << 28) | (offset << 16) | value;
>+}
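
For anyone unfamiliar with the host1x word format, the helpers above all share
one layout: opcode in bits 31:28, register offset in bits 27:16, and an
opcode-specific payload below that (for SETCLASS, the class id sits in bits
15:6 with a 6-bit mask under it). A standalone sketch with matching decoders;
the demo_* names are hypothetical and the field widths are my reading of the
encoders:

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as host1x_opcode_setclass() and host1x_opcode_nonincr() */
static uint32_t demo_setclass(unsigned class_id, unsigned offset, unsigned mask)
{
    assert(class_id <= 0x3ff); /* class id field assumed to be 10 bits */
    return (0u << 28) | (offset << 16) | (class_id << 6) | mask;
}

static uint32_t demo_nonincr(unsigned offset, unsigned count)
{
    return (2u << 28) | (offset << 16) | count;
}

/* Field extraction helpers for inspecting an encoded word */
static unsigned demo_opcode(uint32_t word)   { return word >> 28; }
static unsigned demo_offset(uint32_t word)   { return (word >> 16) & 0xfff; }
static unsigned demo_class_id(uint32_t word) { return (word >> 6) & 0x3ff; }
```

Selecting the NVDEC class (0xf0) with no offset or mask encodes to 0x00003c00,
and a one-word NONINCR write to register 0 encodes to 0x20000001.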
>+
>+#define NV_CLASS_HOST_LOAD_SYNCPT_PAYLOAD                                  (0x00000138)
>+#define NV_CLASS_HOST_WAIT_SYNCPT                                          (0x00000140)
>+
>+#define NV_THI_INCR_SYNCPT                                                 (0x00000000)
>+#define NV_THI_INCR_SYNCPT_INDX                                            7:0
>+#define NV_THI_INCR_SYNCPT_COND                                            15:8
>+#define NV_THI_INCR_SYNCPT_COND_IMMEDIATE                                  (0x00000000)
>+#define NV_THI_INCR_SYNCPT_COND_OP_DONE                                    (0x00000001)
>+#define NV_THI_INCR_SYNCPT_ERR                                             (0x00000008)
>+#define NV_THI_INCR_SYNCPT_ERR_COND_STS_IMM                                0:0
>+#define NV_THI_INCR_SYNCPT_ERR_COND_STS_OPDONE                             1:1
>+#define NV_THI_CTXSW_INCR_SYNCPT                                           (0x0000000c)
>+#define NV_THI_CTXSW_INCR_SYNCPT_INDX                                      7:0
>+#define NV_THI_CTXSW                                                       (0x00000020)
>+#define NV_THI_CTXSW_CURR_CLASS                                            9:0
>+#define NV_THI_CTXSW_AUTO_ACK                                              11:11
>+#define NV_THI_CTXSW_CURR_CHANNEL                                          15:12
>+#define NV_THI_CTXSW_NEXT_CLASS                                            25:16
>+#define NV_THI_CTXSW_NEXT_CHANNEL                                          31:28
>+#define NV_THI_CONT_SYNCPT_EOF                                             (0x00000028)
>+#define NV_THI_CONT_SYNCPT_EOF_INDEX                                       7:0
>+#define NV_THI_CONT_SYNCPT_EOF_COND                                        8:8
>+#define NV_THI_METHOD0                                                     (0x00000040)
>+#define NV_THI_METHOD0_OFFSET                                              11:0
>+#define NV_THI_METHOD1                                                     (0x00000044)
>+#define NV_THI_METHOD1_DATA                                                31:0
>+#define NV_THI_INT_STATUS                                                  (0x00000078)
>+#define NV_THI_INT_STATUS_FALCON_INT                                       0:0
>+#define NV_THI_INT_MASK                                                    (0x0000007c)
>+#define NV_THI_INT_MASK_FALCON_INT                                         0:0
>+
>+#endif /* AVUTIL_NVTEGRA_HOST1X_H */
>diff --git a/libavutil/pixdesc.c b/libavutil/pixdesc.c
>index 1c0bcf2232..bb14b1b306 100644
>--- a/libavutil/pixdesc.c
>+++ b/libavutil/pixdesc.c
>@@ -2791,6 +2791,10 @@ static const AVPixFmtDescriptor av_pix_fmt_descriptors[AV_PIX_FMT_NB] = {
>         },
>         .flags = AV_PIX_FMT_FLAG_PLANAR,
>     },
>+    [AV_PIX_FMT_NVTEGRA] = {
>+        .name = "nvtegra",
>+        .flags = AV_PIX_FMT_FLAG_HWACCEL,
>+    },
> };
> 
> static const char * const color_range_names[] = {
>diff --git a/libavutil/pixfmt.h b/libavutil/pixfmt.h
>index a7f50e1690..a3213c792a 100644
>--- a/libavutil/pixfmt.h
>+++ b/libavutil/pixfmt.h
>@@ -439,6 +439,14 @@ enum AVPixelFormat {
>      */
>     AV_PIX_FMT_D3D12,
> 
>+    /**
>+     * Hardware surfaces for Tegra devices.
>+     *
>+     * data[0..2] points to memory-mapped buffer containing frame data
>+     * buf[0] contains an AVBufferRef to an AVNVTegraMap
>+     */
>+    AV_PIX_FMT_NVTEGRA,
>+
>     AV_PIX_FMT_NB         ///< number of pixel formats, DO NOT USE THIS if you want to link with shared libav* because the number of formats might differ between versions
> };
> 
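For readers following the host1x helpers and the `hi:lo` field defines quoted above, here is a minimal sketch of how such words can be assembled. It mirrors the bit layout visible in the patch (opcode in bits 31:28, register offset in bits 27:16). The FIELD_* macros and the incr_syncpt_value() helper are hypothetical names introduced only for illustration; they rely on the ternary trick of expanding a `hi:lo` define inside `(1 ? hi:lo)` or `(0 ? hi:lo)`, and the patch may pack its fields differently.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as host1x_opcode_setclass() in the quoted header: opcode 0 */
static uint32_t host1x_setclass(uint32_t class_id, uint32_t offset, uint32_t mask)
{
    return (UINT32_C(0) << 28) | (offset << 16) | (class_id << 6) | mask;
}

/* Same layout as host1x_opcode_incr() in the quoted header: opcode 1 */
static uint32_t host1x_incr(uint32_t offset, uint32_t count)
{
    return (UINT32_C(1) << 28) | (offset << 16) | count;
}

/*
 * Hypothetical helpers (not in the patch): a "hi:lo" define becomes a
 * valid expression once it is placed after "cond ?".
 */
#define FIELD_HI(range)      (1 ? range)
#define FIELD_LO(range)      (0 ? range)
#define FIELD_MASK(range)    (UINT32_MAX >> (31 - FIELD_HI(range) + FIELD_LO(range)))
#define FIELD_PACK(range, v) (((uint32_t)(v) & FIELD_MASK(range)) << FIELD_LO(range))

/* Field ranges copied from the NV_THI_INCR_SYNCPT_* defines */
#define INCR_SYNCPT_INDX         7:0
#define INCR_SYNCPT_COND         15:8
#define INCR_SYNCPT_COND_OP_DONE 1

/* Build the data word that asks for a syncpoint increment once the operation is done */
static uint32_t incr_syncpt_value(uint32_t syncpt_id)
{
    return FIELD_PACK(INCR_SYNCPT_COND, INCR_SYNCPT_COND_OP_DONE) |
           FIELD_PACK(INCR_SYNCPT_INDX, syncpt_id);
}
```

With this layout, selecting the NVDEC class (0xf0) gives the word 0x00003c00, and the condition field lands in bits 15:8 of the syncpoint word.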
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [FFmpeg-devel] [PATCH 02/16] configure, avutil: add support for HorizonOS
  2024-05-30 20:37   ` Rémi Denis-Courmont
@ 2024-05-31 21:06     ` averne
  0 siblings, 0 replies; 37+ messages in thread
From: averne @ 2024-05-31 21:06 UTC (permalink / raw)
  To: ffmpeg-devel

On 30/05/2024 at 22:37, Rémi Denis-Courmont wrote:
> On Thursday, 30 May 2024, 22:43:04 EEST, averne wrote:
>> HorizonOS (HOS) is the operating system of the Nintendo Switch.
>> This patch enables integration with the homebrew toolchain developed by the
>> devkitPro team. Its two main components are devkitA64 (common toolchain for
>> aarch64 targets) and libnx (library implementing interaction with the HOS
>> kernel and system daemons, termed sysmodules).
>>
>> Signed-off-by: averne <averne381@gmail.com>
>> ---
>>  configure       | 8 ++++++++
>>  libavutil/cpu.c | 7 +++++++
>>  2 files changed, 15 insertions(+)
>>
>> diff --git a/configure b/configure
>> index 96b181fd21..09fb2aed1b 100755
>> --- a/configure
>> +++ b/configure
>> @@ -5967,6 +5967,10 @@ case $target_os in
>>          ;;
>>      minix)
>>          ;;
>> +    horizon)
>> +        enable section_data_rel_ro
>> +        add_extralibs -lnx
>> +        ;;
>>      none)
>>          ;;
>>      *)
>> @@ -7710,6 +7714,10 @@ haiku)
>>          disable memalign
>>      fi
>>      ;;
>> +horizon)
>> +    disable sysctl
>> +    disable sysctlbyname
>> +    ;;
> 
> Are those really broken, or is this just a trick to force a fallback? In the 
> latter case, you don't need to disable them; just put the HOS code ahead of 
> the generic BSD code.
> 

Hi, those functions are only available for socket-related operations 
(see https://github.com/switchbrew/libnx/blob/master/nx/include/switch/services/bsd.h#L57). 
I think it makes sense to disable them to avoid potential confusion.

>>  esac
>>
>>  flatten_extralibs(){
>> diff --git a/libavutil/cpu.c b/libavutil/cpu.c
>> index 9ac2f01c20..6a77df5e34 100644
>> --- a/libavutil/cpu.c
>> +++ b/libavutil/cpu.c
>> @@ -48,6 +48,9 @@
>>  #if HAVE_UNISTD_H
>>  #include <unistd.h>
>>  #endif
>> +#ifdef __SWITCH__
>> +#include <switch.h>
>> +#endif
>>
>>  static atomic_int cpu_flags = -1;
>>  static atomic_int cpu_count = -1;
>> @@ -247,6 +250,10 @@ int av_cpu_count(void)
>>  #elif HAVE_WINRT
>>      GetNativeSystemInfo(&sysinfo);
>>      nb_cpus = sysinfo.dwNumberOfProcessors;
>> +#elif defined(__SWITCH__)
>> +    u64 core_mask = 0;
>> +    Result rc = svcGetInfo(&core_mask, InfoType_CoreMask, CUR_PROCESS_HANDLE, 0);
>> +    nb_cpus = R_SUCCEEDED(rc) ? av_popcount64(core_mask) : 3;
>>  #endif
>>
>>      if (!atomic_exchange_explicit(&printed, 1, memory_order_relaxed))
> 
> 
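The core-count logic in the quoted cpu.c hunk boils down to counting the set bits of an affinity mask, with a fixed fallback when the query fails. A minimal, platform-independent sketch follows; popcount64() stands in for av_popcount64(), and cpu_count_from_mask() is a hypothetical helper, not something in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Count set bits; plays the role av_popcount64() has in the patch */
static int popcount64(uint64_t x)
{
    int n = 0;

    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Hypothetical helper: one bit per usable core in core_mask.
 * If the kernel query failed, report the fallback count instead.
 */
static int cpu_count_from_mask(int query_ok, uint64_t core_mask, int fallback)
{
    return query_ok ? popcount64(core_mask) : fallback;
}
```

A mask of 0x7 yields three cores, which matches the fallback value of 3 used in the patch.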

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [FFmpeg-devel] [PATCH 03/16] avutil: add ioctl definitions for tegra devices
  2024-05-30 20:42   ` Rémi Denis-Courmont
@ 2024-05-31 21:06     ` averne
  2024-05-31 21:16       ` Timo Rothenpieler
  0 siblings, 1 reply; 37+ messages in thread
From: averne @ 2024-05-31 21:06 UTC (permalink / raw)
  To: ffmpeg-devel

On 30/05/2024 at 22:42, Rémi Denis-Courmont wrote:
> On Thursday, 30 May 2024, 22:43:05 EEST, averne wrote:
>> These files are taken with minimal modifications from nvidia's Linux4Tegra
>> (L4T) tree. nvmap enables management of memory-mapped buffers for hardware
>> devices. nvhost enables interaction with different hardware modules
>> (multimedia engines, display engine, ...), through a common block, host1x.
>>
>> Signed-off-by: averne <averne381@gmail.com>
>> ---
>>  libavutil/Makefile       |   2 +
>>  libavutil/nvhost_ioctl.h | 511 +++++++++++++++++++++++++++++++++++++++
>>  libavutil/nvmap_ioctl.h  | 451 ++++++++++++++++++++++++++++++++++
>>  3 files changed, 964 insertions(+)
>>  create mode 100644 libavutil/nvhost_ioctl.h
>>  create mode 100644 libavutil/nvmap_ioctl.h
>>
>> diff --git a/libavutil/Makefile b/libavutil/Makefile
>> index 6e6fa8d800..9c112bc58a 100644
>> --- a/libavutil/Makefile
>> +++ b/libavutil/Makefile
>> @@ -52,6 +52,8 @@ HEADERS = adler32.h                                                     \
>>            hwcontext_videotoolbox.h                                      \
>>            hwcontext_vdpau.h                                             \
>>            hwcontext_vulkan.h                                            \
>> +          nvhost_ioctl.h                                                \
>> +          nvmap_ioctl.h                                                 \
>>            iamf.h                                                        \
>>            imgutils.h                                                    \
>>            intfloat.h                                                    \
>> diff --git a/libavutil/nvhost_ioctl.h b/libavutil/nvhost_ioctl.h
>> new file mode 100644
>> index 0000000000..b0bf3e3ae6
>> --- /dev/null
>> +++ b/libavutil/nvhost_ioctl.h
>> @@ -0,0 +1,511 @@
>> +/*
>> + * include/uapi/linux/nvhost_ioctl.h
> 
> Well, then that should be provided by linux-libc-dev or equivalent. I don't 
> think that this should be vendored into FFmpeg.
> 

Agreed. On L4T this is provided by nvidia-l4t-kernel-headers, but 
on the HOS side there is no such equivalent yet. If this patch
series moves forward, I will integrate the relevant bits in libnx
and get rid of those headers.
As for the hardware definitions (in the following patch), I think
they should be put in nv-codec-headers.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [FFmpeg-devel] [PATCH 01/16] avutil/buffer: add helper to allocate aligned memory
  2024-05-30 20:38   ` Rémi Denis-Courmont
@ 2024-05-31 21:06     ` averne
  2024-05-31 21:44       ` Michael Niedermayer
  2024-06-01  6:59       ` Rémi Denis-Courmont
  0 siblings, 2 replies; 37+ messages in thread
From: averne @ 2024-05-31 21:06 UTC (permalink / raw)
  To: ffmpeg-devel

On 30/05/2024 at 22:38, Rémi Denis-Courmont wrote:
> On Thursday, 30 May 2024, 22:43:03 EEST, averne wrote:
>> This is useful eg. for memory-mapped buffers that need page-aligned memory,
>> when dealing with hardware devices
>>
>> Signed-off-by: averne <averne381@gmail.com>
>> ---
>>  libavutil/buffer.c | 31 +++++++++++++++++++++++++++++++
>>  libavutil/buffer.h |  7 +++++++
>>  2 files changed, 38 insertions(+)
>>
>> diff --git a/libavutil/buffer.c b/libavutil/buffer.c
>> index e4562a79b1..b8e357f540 100644
>> --- a/libavutil/buffer.c
>> +++ b/libavutil/buffer.c
>> @@ -16,9 +16,14 @@
>>   * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>>   */
>>
>> +#include "config.h"
>> +
>>  #include <stdatomic.h>
>>  #include <stdint.h>
>>  #include <string.h>
>> +#if HAVE_MALLOC_H
>> +#include <malloc.h>
>> +#endif
>>
>>  #include "avassert.h"
>>  #include "buffer_internal.h"
>> @@ -100,6 +105,32 @@ AVBufferRef *av_buffer_allocz(size_t size)
>>      return ret;
>>  }
>>
>> +AVBufferRef *av_buffer_aligned_alloc(size_t size, size_t align)
>> +{
>> +    AVBufferRef *ret = NULL;
>> +    uint8_t    *data = NULL;
>> +
>> +#if HAVE_POSIX_MEMALIGN
>> +    if (posix_memalign((void **)&data, align, size))
> 
> Invalid cast.
> 

Neither gcc nor clang emits a warning here, even with -Weverything.
What would be your idea of a valid cast then? First cast to intptr_t,
then to void **?
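One common way to sidestep the question is to avoid the cast entirely: posix_memalign() stores its result through a `void **`, so it can be handed the address of a real `void *` object, and the result converted afterwards. The sketch below assumes that is the concern being raised. alloc_aligned_bytes() is a hypothetical helper, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: aligned allocation without casting a uint8_t ** to void ** */
static uint8_t *alloc_aligned_bytes(size_t align, size_t size)
{
    void *ptr = NULL;

    /* posix_memalign() returns 0 on success and an error number otherwise */
    if (posix_memalign(&ptr, align, size))
        return NULL;

    return ptr; /* void * converts implicitly to uint8_t * in C */
}
```

The alignment must be a power of two and a multiple of sizeof(void *), otherwise posix_memalign() fails with EINVAL and the helper returns NULL.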

>> +        return NULL;
>> +#elif HAVE_ALIGNED_MALLOC
>> +    data = aligned_alloc(align, size);
>> +#elif HAVE_MEMALIGN
>> +    data = memalign(align, size);
>> +#else
>> +    return NULL;
>> +#endif
>> +
>> +    if (!data)
>> +        return NULL;
>> +
>> +    ret = av_buffer_create(data, size, av_buffer_default_free, NULL, 0);
>> +    if (!ret)
>> +        av_freep(&data);
>> +
>> +    return ret;
>> +}
>> +
>>  AVBufferRef *av_buffer_ref(const AVBufferRef *buf)
>>  {
>>      AVBufferRef *ret = av_mallocz(sizeof(*ret));
>> diff --git a/libavutil/buffer.h b/libavutil/buffer.h
>> index e1ef5b7f07..8422ec3453 100644
>> --- a/libavutil/buffer.h
>> +++ b/libavutil/buffer.h
>> @@ -107,6 +107,13 @@ AVBufferRef *av_buffer_alloc(size_t size);
>>   */
>>  AVBufferRef *av_buffer_allocz(size_t size);
>>
>> +/**
>> + * Allocate an AVBuffer of the given size and alignment.
>> + *
>> + * @return an AVBufferRef of given size or NULL when out of memory
>> + */
>> +AVBufferRef *av_buffer_aligned_alloc(size_t size, size_t align);
>> +
>>  /**
>>   * Always treat the buffer as read-only, even when it has only one
>>   * reference.
> 
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [FFmpeg-devel] [PATCH 05/16] avutil: add common code for nvtegra
  2024-05-31  8:32   ` Rémi Denis-Courmont
@ 2024-05-31 21:06     ` averne
  2024-06-01  7:29       ` Rémi Denis-Courmont
  0 siblings, 1 reply; 37+ messages in thread
From: averne @ 2024-05-31 21:06 UTC (permalink / raw)
  To: ffmpeg-devel

On 31/05/2024 at 10:32, Rémi Denis-Courmont wrote:
> 
> 
> On 30 May 2024 22:43:07 GMT+03:00, averne <averne381@gmail.com> wrote:
>> This includes a new pixel format for nvtegra hardware frames, and several objects for interaction with hardware blocks.
>> In particular, this contains code for channels (handles to hardware engines), maps (memory-mapped buffers shared with engines), and command buffers (abstraction for building command lists sent to the engines).
>>
>> Signed-off-by: averne <averne381@gmail.com>
>> ---
>> configure                  |    2 +
>> libavutil/Makefile         |    4 +
>> libavutil/nvtegra.c        | 1035 ++++++++++++++++++++++++++++++++++++
>> libavutil/nvtegra.h        |  258 +++++++++
>> libavutil/nvtegra_host1x.h |   94 ++++
>> libavutil/pixdesc.c        |    4 +
>> libavutil/pixfmt.h         |    8 +
>> 7 files changed, 1405 insertions(+)
>> create mode 100644 libavutil/nvtegra.c
>> create mode 100644 libavutil/nvtegra.h
>> create mode 100644 libavutil/nvtegra_host1x.h
>>
>> diff --git a/configure b/configure
>> index 09fb2aed1b..51f169bfbd 100755
>> --- a/configure
>> +++ b/configure
>> @@ -361,6 +361,7 @@ External library support:
>>   --disable-vdpau          disable Nvidia Video Decode and Presentation API for Unix code [autodetect]
>>   --disable-videotoolbox   disable VideoToolbox code [autodetect]
>>   --disable-vulkan         disable Vulkan code [autodetect]
>> +  --enable-nvtegra         enable nvtegra code [no]
>>
>> Toolchain options:
>>   --arch=ARCH              select architecture [$arch]
>> @@ -3151,6 +3152,7 @@ videotoolbox_hwaccel_deps="videotoolbox pthreads"
>> videotoolbox_hwaccel_extralibs="-framework QuartzCore"
>> vulkan_deps="threads"
>> vulkan_deps_any="libdl LoadLibrary"
>> +nvtegra_deps="gpl"
>>
>> av1_d3d11va_hwaccel_deps="d3d11va DXVA_PicParams_AV1"
>> av1_d3d11va_hwaccel_select="av1_decoder"
>> diff --git a/libavutil/Makefile b/libavutil/Makefile
>> index 9c112bc58a..733a23a8a3 100644
>> --- a/libavutil/Makefile
>> +++ b/libavutil/Makefile
>> @@ -52,6 +52,7 @@ HEADERS = adler32.h                                                     \
>>           hwcontext_videotoolbox.h                                      \
>>           hwcontext_vdpau.h                                             \
>>           hwcontext_vulkan.h                                            \
>> +          nvtegra.h                                                     \
>>           nvhost_ioctl.h                                                \
>>           nvmap_ioctl.h                                                 \
>>           iamf.h                                                        \
>> @@ -209,6 +210,7 @@ OBJS-$(CONFIG_VDPAU)                    += hwcontext_vdpau.o
>> OBJS-$(CONFIG_VULKAN)                   += hwcontext_vulkan.o vulkan.o
>>
>> OBJS-$(!CONFIG_VULKAN)                  += hwcontext_stub.o
>> +OBJS-$(CONFIG_NVTEGRA)                  += nvtegra.o
>>
>> OBJS += $(COMPAT_OBJS:%=../compat/%)
>>
>> @@ -230,6 +232,8 @@ SKIPHEADERS-$(CONFIG_VDPAU)            += hwcontext_vdpau.h
>> SKIPHEADERS-$(CONFIG_VULKAN)           += hwcontext_vulkan.h vulkan.h   \
>>                                           vulkan_functions.h            \
>>                                           vulkan_loader.h
>> +SKIPHEADERS-$(CONFIG_NVTEGRA)          += nvtegra.h                     \
>> +                                          nvtegra_host1x.h
>>
>> TESTPROGS = adler32                                                     \
>>             aes                                                         \
>> diff --git a/libavutil/nvtegra.c b/libavutil/nvtegra.c
>> new file mode 100644
>> index 0000000000..ad0bbbdfaa
>> --- /dev/null
>> +++ b/libavutil/nvtegra.c
>> @@ -0,0 +1,1035 @@
>> +/*
>> + * Copyright (c) 2024 averne <averne381@gmail.com>
>> + *
>> + * This file is part of FFmpeg.
>> + *
>> + * FFmpeg is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * FFmpeg is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along
>> + * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
>> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
>> + */
>> +
>> +#ifndef __SWITCH__
>> +#   include <sys/ioctl.h>
>> +#   include <sys/mman.h>
>> +#   include <fcntl.h>
>> +#   include <unistd.h>
>> +#else
>> +#   include <stdlib.h>
>> +#   include <switch.h>
>> +#endif
>> +
>> +#include <string.h>
>> +
>> +#include "buffer.h"
>> +#include "log.h"
>> +#include "error.h"
>> +#include "mem.h"
>> +#include "thread.h"
>> +
>> +#include "nvhost_ioctl.h"
>> +#include "nvmap_ioctl.h"
>> +#include "nvtegra_host1x.h"
>> +
>> +#include "nvtegra.h"
>> +
>> +/*
>> + * Tag used by the kernel to identify allocations.
>> + * Official software has been seen using 0x900, 0xf00, 0x1100, 0x1400, 0x4000.
>> + */
>> +#define MEM_TAG (0xfeed)
>> +
>> +struct DriverState {
>> +    int nvmap_fd, nvhost_fd;
>> +};
>> +
>> +static AVMutex g_driver_init_mtx = AV_MUTEX_INITIALIZER;
>> +static struct DriverState *g_driver_state = NULL;
>> +static AVBufferRef *g_driver_state_ref = NULL;
>> +
>> +static void free_driver_fds(void *opaque, uint8_t *data) {
>> +    if (!g_driver_state)
>> +        return;
>> +
>> +#ifndef __SWITCH__
>> +    if (g_driver_state->nvmap_fd > 0)
>> +        close(g_driver_state->nvmap_fd);
>> +
>> +    if (g_driver_state->nvhost_fd > 0)
>> +        close(g_driver_state->nvhost_fd);
>> +#else
>> +    nvFenceExit();
>> +    nvMapExit();
>> +    nvExit();
>> +    mmuExit();
>> +#endif
>> +
>> +    g_driver_init_mtx  = (AVMutex)AV_MUTEX_INITIALIZER;
>> +    g_driver_state_ref = NULL;
>> +    av_freep(&g_driver_state);
>> +}
>> +
>> +static int init_driver_fds(void) {
>> +    AVBufferRef *ref;
>> +    struct DriverState *state;
>> +    int err;
>> +
>> +    state = av_mallocz(sizeof(*state));
>> +    if (!state)
>> +        return AVERROR(ENOMEM);
>> +
>> +    ref = av_buffer_create((uint8_t *)state, sizeof(*state), free_driver_fds, NULL, 0);
>> +    if (!ref) {
>> +        av_free(state);
>> +        return AVERROR(ENOMEM);
>> +    }
>> +
>> +    g_driver_state     = state;
>> +    g_driver_state_ref = ref;
>> +
>> +#ifndef __SWITCH__
>> +    err = open("/dev/nvmap", O_RDWR | O_SYNC);
> 
> There's helpers to open files, and you're missing the close on exec here. Also not clear why you need O_SYNC.
> 
> But did you consider just reimplementing libnvdec instead of putting the device driver directly in FFmpeg?
> 

I checked and official code uses O_RDWR|O_SYNC|O_CLOEXEC for
/dev/nvhost-ctrl and /dev/nvmap, then O_RDWR|O_CLOEXEC for
/dev/nvhost-vic and /dev/nvhost-nvdec.
I don't believe O_SYNC is required, but I think it's good
practice to reproduce official behavior when possible. I'll
switch everything to O_RDWR|O_SYNC|O_CLOEXEC.

As for your second question, I probably should've given some
context about this decision. Initially I thought about writing a
vaapi driver, but for a number of reasons I decided against it.
- First, the Switch is a performance-constrained device, so removing
  abstraction layers frees up CPU time and memory accesses.
  Integrating directly into FFmpeg enables some optimizations, for 
  instance bitstream data is never copied to a staging buffer, but
  written directly to the memory-mapped buffer that will be fed to the
  hardware.
  There are also some codecs that need information not given in vaapi 
  structures (see for instance sw_hdr_skip_length in the HEVC code),
  so it would require re-parsing slice headers. Likewise, in NVDEC
  the VP9 entropy context isn't managed in hardware/microcode, so the
  vaapi implementation would need to duplicate work.
- Second, a vaapi driver honestly seemed like an enormous amount of
  work: on top of all the reverse engineering efforts, I would need to
  make FFmpeg (and later mpv) happy about my implementation.
- Third, I wasn't certain I would be able to implement zero-copy 
  frame imports in my graphics context. The goal was to use deko3d
  (https://github.com/devkitPro/deko3d), an efficient homebrew graphics
  API for the Switch, which needs CPU addresses to import external 
  buffers.

>> +    if (err < 0)
>> +        return AVERROR(errno);
>> +    state->nvmap_fd = err;
>> +
>> +    err = open("/dev/nvhost-ctrl", O_RDWR | O_SYNC);
>> +    if (err < 0)
>> +        return AVERROR(errno);
>> +    state->nvhost_fd = err;
>> +#else
>> +    err = nvInitialize();
>> +    if (R_FAILED(err))
>> +        return AVERROR(err);
>> +
>> +    err = nvMapInit();
>> +    if (R_FAILED(err))
>> +        return AVERROR(err);
>> +    state->nvmap_fd = nvMapGetFd();
>> +
>> +    err = nvFenceInit();
>> +    if (R_FAILED(err))
>> +        return AVERROR(err);
>> +    /* libnx doesn't export the nvhost-ctrl file descriptor */
>> +
>> +    err = mmuInitialize();
>> +    if (R_FAILED(err))
>> +        return AVERROR(err);
>> +#endif
>> +
>> +    return 0;
>> +}
>> +
>> +static inline int get_nvmap_fd(void) {
>> +    if (!g_driver_state)
>> +        return AVERROR_UNKNOWN;
>> +
>> +    if (!g_driver_state->nvmap_fd)
>> +        return AVERROR_UNKNOWN;
>> +
>> +    return g_driver_state->nvmap_fd;
>> +}
>> +
>> +static inline int get_nvhost_fd(void) {
>> +    if (!g_driver_state)
>> +        return AVERROR_UNKNOWN;
>> +
>> +    if (!g_driver_state->nvhost_fd)
>> +        return AVERROR_UNKNOWN;
>> +
>> +    return g_driver_state->nvhost_fd;
>> +}
>> +
>> +AVBufferRef *av_nvtegra_driver_init(void) {
>> +    AVBufferRef *out = NULL;
>> +    int err;
>> +
>> +    /*
>> +     * We have to do this overly complex dance of putting driver fds in a refcounted struct,
>> +     * otherwise initializing multiple hwcontexts would leak fds
>> +     */
>> +
>> +    err = ff_mutex_lock(&g_driver_init_mtx);
>> +    if (err != 0)
>> +        goto exit;
>> +
>> +    if (g_driver_state_ref) {
>> +        out = av_buffer_ref(g_driver_state_ref);
>> +        goto exit;
>> +    }
>> +
>> +    err = init_driver_fds();
>> +    if (err < 0) {
>> +        /* In case memory allocations failed, call the destructor ourselves */
>> +        av_buffer_unref(&g_driver_state_ref);
>> +        free_driver_fds(NULL, NULL);
>> +        goto exit;
>> +    }
>> +
>> +    out = g_driver_state_ref;
>> +
>> +exit:
>> +    ff_mutex_unlock(&g_driver_init_mtx);
>> +    return out;
>> +}
>> +
>> +int av_nvtegra_channel_open(AVNVTegraChannel *channel, const char *dev) {
>> +    int err;
>> +#ifndef __SWITCH__
>> +    struct nvhost_get_param_arg args;
>> +
>> +    err = open(dev, O_RDWR);
>> +    if (err < 0)
>> +        return AVERROR(errno);
>> +
>> +    channel->fd = err;
>> +
>> +    args = (struct nvhost_get_param_arg){0};
>> +
>> +    err = ioctl(channel->fd, NVHOST_IOCTL_CHANNEL_GET_SYNCPOINT, &args);
>> +    if (err < 0)
>> +        goto fail;
>> +
>> +    channel->syncpt = args.value;
>> +
>> +    return 0;
>> +
>> +fail:
>> +    close(channel->fd);
>> +    return AVERROR(errno);
>> +#else
>> +    err = nvChannelCreate(&channel->channel, dev);
>> +    if (R_FAILED(err))
>> +        return AVERROR(err);
>> +
>> +    err = nvioctlChannel_GetSyncpt(channel->channel.fd, 0, &channel->syncpt);
>> +    if (R_FAILED(err))
>> +        goto fail;
>> +
>> +    return 0;
>> +
>> +fail:
>> +    nvChannelClose(&channel->channel);
>> +    return AVERROR(err);
>> +#endif
>> +}
>> +
>> +int av_nvtegra_channel_close(AVNVTegraChannel *channel) {
>> +#ifndef __SWITCH__
>> +    if (!channel->fd)
>> +        return 0;
>> +
>> +    return close(channel->fd);
>> +#else
>> +    nvChannelClose(&channel->channel);
>> +    return 0;
>> +#endif
>> +}
>> +
>> +int av_nvtegra_channel_get_clock_rate(AVNVTegraChannel *channel, uint32_t moduleid, uint32_t *clock_rate) {
>> +    int err;
>> +#ifndef __SWITCH__
>> +    struct nvhost_clk_rate_args args;
>> +
>> +    args = (struct nvhost_clk_rate_args){
>> +        .moduleid = moduleid,
>> +    };
>> +
>> +    err = ioctl(channel->fd, NVHOST_IOCTL_CHANNEL_GET_CLK_RATE, &args);
>> +    if (err < 0)
>> +        return AVERROR(errno);
>> +
>> +    if (clock_rate)
>> +        *clock_rate = args.rate;
>> +
>> +    return 0;
>> +#else
>> +    uint32_t tmp;
>> +
>> +    err = AVERROR(nvioctlChannel_GetModuleClockRate(channel->channel.fd, moduleid, &tmp));
>> +    if (err < 0)
>> +        return err;
>> +
>> +    if (clock_rate)
>> +        *clock_rate = tmp * 1000;
>> +
>> +    return 0;
>> +#endif
>> +}
>> +
>> +int av_nvtegra_channel_set_clock_rate(AVNVTegraChannel *channel, uint32_t moduleid, uint32_t clock_rate) {
>> +#ifndef __SWITCH__
>> +    struct nvhost_clk_rate_args args;
>> +
>> +    args = (struct nvhost_clk_rate_args){
>> +        .rate     = clock_rate,
>> +        .moduleid = moduleid,
>> +    };
>> +
>> +    return (ioctl(channel->fd, NVHOST_IOCTL_CHANNEL_SET_CLK_RATE, &args) < 0) ? AVERROR(errno) : 0;
>> +#else
>> +    return AVERROR(nvioctlChannel_SetModuleClockRate(channel->channel.fd, moduleid, clock_rate / 1000));
>> +#endif
>> +}
>> +
>> +int av_nvtegra_channel_submit(AVNVTegraChannel *channel, AVNVTegraCmdbuf *cmdbuf, uint32_t *fence) {
>> +    int err;
>> +#ifndef __SWITCH__
>> +    struct nvhost_submit_args args;
>> +
>> +    args = (struct nvhost_submit_args){
>> +        .submit_version          = NVHOST_SUBMIT_VERSION_V2,
>> +        .num_syncpt_incrs        = cmdbuf->num_syncpt_incrs,
>> +        .num_cmdbufs             = cmdbuf->num_cmdbufs,
>> +        .num_relocs              = cmdbuf->num_relocs,
>> +        .num_waitchks            = cmdbuf->num_waitchks,
>> +        .timeout                 = 0,
>> +        .flags                   = 0,
>> +        .fence                   = 0,
>> +        .syncpt_incrs            = (uintptr_t)cmdbuf->syncpt_incrs,
>> +        .cmdbuf_exts             = (uintptr_t)cmdbuf->cmdbuf_exts,
>> +        .checksum_methods        = 0,
>> +        .checksum_falcon_methods = 0,
>> +        .pad                     = { 0 },
>> +        .reloc_types             = (uintptr_t)cmdbuf->reloc_types,
>> +        .cmdbufs                 = (uintptr_t)cmdbuf->cmdbufs,
>> +        .relocs                  = (uintptr_t)cmdbuf->relocs,
>> +        .reloc_shifts            = (uintptr_t)cmdbuf->reloc_shifts,
>> +        .waitchks                = (uintptr_t)cmdbuf->waitchks,
>> +        .waitbases               = 0,
>> +        .class_ids               = (uintptr_t)cmdbuf->class_ids,
>> +        .fences                  = (uintptr_t)cmdbuf->fences,
>> +    };
>> +
>> +    err = ioctl(channel->fd, NVHOST_IOCTL_CHANNEL_SUBMIT, &args);
>> +    if (err < 0)
>> +        return AVERROR(errno);
>> +
>> +    if (fence)
>> +        *fence = args.fence;
>> +
>> +    return 0;
>> +#else
>> +    nvioctl_fence tmp;
>> +
>> +    err = nvioctlChannel_Submit(channel->channel.fd, (nvioctl_cmdbuf *)cmdbuf->cmdbufs, cmdbuf->num_cmdbufs,
>> +                                NULL, NULL, 0, (nvioctl_syncpt_incr *)cmdbuf->syncpt_incrs, cmdbuf->num_syncpt_incrs,
>> +                                &tmp, 1);
>> +    if (R_FAILED(err))
>> +        return AVERROR(err);
>> +
>> +    if (fence)
>> +        *fence = tmp.value;
>> +
>> +    return 0;
>> +#endif
>> +}
>> +
>> +int av_nvtegra_channel_set_submit_timeout(AVNVTegraChannel *channel, uint32_t timeout_ms) {
>> +#ifndef __SWITCH__
>> +    struct nvhost_set_timeout_args args;
>> +
>> +    args = (struct nvhost_set_timeout_args){
>> +        .timeout = timeout_ms,
>> +    };
>> +
>> +    return (ioctl(channel->fd, NVHOST_IOCTL_CHANNEL_SET_TIMEOUT, &args) < 0) ? AVERROR(errno) : 0;
>> +#else
>> +    return AVERROR(nvioctlChannel_SetSubmitTimeout(channel->channel.fd, timeout_ms));
>> +#endif
>> +}
>> +
>> +int av_nvtegra_syncpt_wait(AVNVTegraChannel *channel, uint32_t threshold, int32_t timeout) {
>> +#ifndef __SWITCH__
>> +    struct nvhost_ctrl_syncpt_waitex_args args = {
>> +        .id      = channel->syncpt,
>> +        .thresh  = threshold,
>> +        .timeout = timeout,
>> +    };
>> +
>> +    return (ioctl(get_nvhost_fd(), NVHOST_IOCTL_CTRL_SYNCPT_WAITEX, &args) < 0) ? AVERROR(errno) : 0;
>> +#else
>> +    NvFence fence;
>> +
>> +    fence = (NvFence){
>> +        .id    = channel->syncpt,
>> +        .value = threshold,
>> +    };
>> +
>> +    return AVERROR(nvFenceWait(&fence, timeout));
>> +#endif
>> +}
>> +
>> +#ifdef __SWITCH__
>> +static inline bool convert_cache_flags(uint32_t flags) {
>> +    /* Return whether the map should be CPU-cacheable */
>> +    switch (flags & NVMAP_HANDLE_CACHE_FLAG) {
>> +        case NVMAP_HANDLE_INNER_CACHEABLE:
>> +        case NVMAP_HANDLE_CACHEABLE:
>> +            return true;
>> +        default:
>> +            return false;
>> +    }
>> +}
>> +#endif
>> +
>> +int av_nvtegra_map_allocate(AVNVTegraMap *map, AVNVTegraChannel *channel, uint32_t size,
>> +                            uint32_t align, int heap_mask, int flags)
>> +{
>> +#ifndef __SWITCH__
>> +    struct nvmap_create_handle create_args;
>> +    struct nvmap_alloc_handle alloc_args;
>> +    int err;
>> +
>> +    create_args = (struct nvmap_create_handle){
>> +        .size   = size,
>> +    };
>> +
>> +    err = ioctl(get_nvmap_fd(), NVMAP_IOC_CREATE, &create_args);
>> +    if (err < 0)
>> +        return AVERROR(errno);
>> +
>> +    map->size   = size;
>> +    map->handle = create_args.handle;
>> +
>> +    alloc_args = (struct nvmap_alloc_handle){
>> +        .handle    = create_args.handle,
>> +        .heap_mask = heap_mask,
>> +        .flags     = flags | (MEM_TAG << 16),
>> +        .align     = align,
>> +    };
>> +
>> +    err = ioctl(get_nvmap_fd(), NVMAP_IOC_ALLOC, &alloc_args);
>> +    if (err < 0)
>> +        goto fail;
>> +
>> +    return 0;
>> +
>> +fail:
>> +    av_nvtegra_map_free(map);
>> +    return AVERROR(errno);
>> +#else
>> +    void *mem;
>> +
>> +    map->owner = channel->channel.fd;
>> +
>> +    size = FFALIGN(size, 0x1000);
>> +
>> +    mem = aligned_alloc(FFALIGN(align, 0x1000), size);
>> +    if (!mem)
>> +        return AVERROR(ENOMEM);
>> +
>> +    return AVERROR(nvMapCreate(&map->map, mem, size, 0x10000, NvKind_Pitch,
>> +                               convert_cache_flags(flags)));
>> +#endif
>> +}
>> +
>> +int av_nvtegra_map_free(AVNVTegraMap *map) {
>> +#ifndef __SWITCH__
>> +    int err;
>> +
>> +    if (!map->handle)
>> +        return 0;
>> +
>> +    err = ioctl(get_nvmap_fd(), NVMAP_IOC_FREE, map->handle);
>> +    if (err < 0)
>> +        return AVERROR(errno);
>> +
>> +    map->handle = 0;
>> +
>> +    return 0;
>> +#else
>> +    void *addr = map->map.cpu_addr;
>> +
>> +    if (!map->map.cpu_addr)
>> +        return 0;
>> +
>> +    nvMapClose(&map->map);
>> +    free(addr);
>> +    return 0;
>> +#endif
>> +}
>> +
>> +int av_nvtegra_map_from_va(AVNVTegraMap *map, AVNVTegraChannel *owner, void *mem,
>> +                           uint32_t size, uint32_t align, uint32_t flags)
>> +{
>> +#ifndef __SWITCH__
>> +    struct nvmap_create_handle_from_va args;
>> +    int err;
>> +
>> +    args = (struct nvmap_create_handle_from_va){
>> +        .va    = (uintptr_t)mem,
>> +        .size  = size,
>> +        .flags = flags | (MEM_TAG << 16),
>> +    };
>> +
>> +    err = ioctl(get_nvmap_fd(), NVMAP_IOC_FROM_VA, &args);
>> +    if (err < 0)
>> +        return AVERROR(errno);
>> +
>> +    map->cpu_addr = mem;
>> +    map->size     = size;
>> +    map->handle   = args.handle;
>> +
>> +    return 0;
>> +#else
>> +
>> +    map->owner = owner->channel.fd;
>> +
>> +    return AVERROR(nvMapCreate(&map->map, mem, FFALIGN(size, 0x1000), 0x10000, NvKind_Pitch,
>> +                               convert_cache_flags(flags)));
>> +#endif
>> +}
>> +
>> +int av_nvtegra_map_close(AVNVTegraMap *map) {
>> +#ifndef __SWITCH__
>> +    return av_nvtegra_map_free(map);
>> +#else
>> +    nvMapClose(&map->map);
>> +    return 0;
>> +#endif
>> +}
>> +
>> +int av_nvtegra_map_map(AVNVTegraMap *map) {
>> +#ifndef __SWITCH__
>> +    void *addr;
>> +
>> +    addr = mmap(NULL, map->size, PROT_READ | PROT_WRITE, MAP_SHARED, map->handle, 0);
>> +    if (addr == MAP_FAILED)
>> +        return AVERROR(errno);
>> +
>> +    map->cpu_addr = addr;
>> +
>> +    return 0;
>> +#else
>> +    nvioctl_command_buffer_map params;
>> +    int err;
>> +
>> +    params = (nvioctl_command_buffer_map){
>> +        .handle = map->map.handle,
>> +    };
>> +
>> +    err = nvioctlChannel_MapCommandBuffer(map->owner, &params, 1, false);
>> +    if (R_FAILED(err))
>> +        return AVERROR(err);
>> +
>> +    map->iova = params.iova;
>> +
>> +    return 0;
>> +#endif
>> +}
>> +
>> +int av_nvtegra_map_unmap(AVNVTegraMap *map) {
>> +    int err;
>> +#ifndef __SWITCH__
>> +    if (!map->cpu_addr)
>> +        return 0;
>> +
>> +    err = munmap(map->cpu_addr, map->size);
>> +    if (err < 0)
>> +        return AVERROR(errno);
>> +
>> +    map->cpu_addr = NULL;
>> +
>> +    return 0;
>> +#else
>> +    nvioctl_command_buffer_map params;
>> +
>> +    if (!map->iova)
>> +        return 0;
>> +
>> +    params = (nvioctl_command_buffer_map){
>> +        .handle = map->map.handle,
>> +        .iova   = map->iova,
>> +    };
>> +
>> +    err = nvioctlChannel_UnmapCommandBuffer(map->owner, &params, 1, false);
>> +    if (R_FAILED(err))
>> +        return AVERROR(err);
>> +
>> +    map->iova = 0;
>> +
>> +    return 0;
>> +#endif
>> +}
>> +
>> +int av_nvtegra_map_cache_op(AVNVTegraMap *map, int op, void *addr, size_t len) {
>> +#ifndef __SWITCH__
>> +    struct nvmap_cache_op args;
>> +
>> +    args = (struct nvmap_cache_op){
>> +        .addr   = (uintptr_t)addr,
>> +        .len    = len,
>> +        .handle = av_nvtegra_map_get_handle(map),
>> +        .op     = op,
>> +    };
>> +
>> +    return ioctl(get_nvmap_fd(), NVMAP_IOC_CACHE, &args) < 0 ? AVERROR(errno) : 0;
>> +#else
>> +    if (!map->map.is_cpu_cacheable)
>> +        return 0;
>> +
>> +    switch (op) {
>> +        case NVMAP_CACHE_OP_WB:
>> +            armDCacheClean(addr, len);
>> +            break;
>> +        default:
>> +        case NVMAP_CACHE_OP_INV:
>> +        case NVMAP_CACHE_OP_WB_INV:
>> +            /* libnx internally performs a clean-invalidate, since invalidate is a privileged instruction */
>> +            armDCacheFlush(addr, len);
>> +            break;
>> +    }
>> +
>> +    return 0;
>> +#endif
>> +}
>> +
>> +int av_nvtegra_map_realloc(AVNVTegraMap *map, uint32_t size, uint32_t align,
>> +                           int heap_mask, int flags)
>> +{
>> +    AVNVTegraChannel channel;
>> +    AVNVTegraMap tmp = {0};
>> +    int err;
>> +
>> +    if (av_nvtegra_map_get_size(map) >= size)
>> +        return 0;
>> +
>> +    /* Dummy channel object to hold the owner fd */
>> +    channel = (AVNVTegraChannel){
>> +#ifdef __SWITCH__
>> +        .channel.fd = map->owner,
>> +#endif
>> +    };
>> +
>> +    err = av_nvtegra_map_create(&tmp, &channel, size, align, heap_mask, flags);
>> +    if (err < 0)
>> +        goto fail;
>> +
>> +    memcpy(av_nvtegra_map_get_addr(&tmp), av_nvtegra_map_get_addr(map), av_nvtegra_map_get_size(map));
>> +
>> +    err = av_nvtegra_map_destroy(map);
>> +    if (err < 0)
>> +        goto fail;
>> +
>> +    *map = tmp;
>> +
>> +    return 0;
>> +
>> +fail:
>> +    av_nvtegra_map_destroy(&tmp);
>> +    return err;
>> +}
>> +
>> +int av_nvtegra_cmdbuf_init(AVNVTegraCmdbuf *cmdbuf) {
>> +    cmdbuf->num_cmdbufs      = 0;
>> +#ifndef __SWITCH__
>> +    cmdbuf->num_relocs       = 0;
>> +    cmdbuf->num_waitchks     = 0;
>> +#endif
>> +    cmdbuf->num_syncpt_incrs = 0;
>> +
>> +#define NUM_INITIAL_CMDBUFS      3
>> +#define NUM_INITIAL_RELOCS       15
>> +#define NUM_INITIAL_SYNCPT_INCRS 3
>> +
>> +    cmdbuf->cmdbufs      = av_malloc_array(NUM_INITIAL_CMDBUFS, sizeof(*cmdbuf->cmdbufs));
>> +#ifndef __SWITCH__
>> +    cmdbuf->cmdbuf_exts  = av_malloc_array(NUM_INITIAL_CMDBUFS, sizeof(*cmdbuf->cmdbuf_exts));
>> +    cmdbuf->class_ids    = av_malloc_array(NUM_INITIAL_CMDBUFS, sizeof(*cmdbuf->class_ids));
>> +#endif
>> +
>> +#ifndef __SWITCH__
>> +    if (!cmdbuf->cmdbufs || !cmdbuf->cmdbuf_exts || !cmdbuf->class_ids)
>> +#else
>> +    if (!cmdbuf->cmdbufs)
>> +#endif
>> +        return AVERROR(ENOMEM);
>> +
>> +#ifndef __SWITCH__
>> +    cmdbuf->relocs       = av_malloc_array(NUM_INITIAL_RELOCS, sizeof(*cmdbuf->relocs));
>> +    cmdbuf->reloc_types  = av_malloc_array(NUM_INITIAL_RELOCS, sizeof(*cmdbuf->reloc_types));
>> +    cmdbuf->reloc_shifts = av_malloc_array(NUM_INITIAL_RELOCS, sizeof(*cmdbuf->reloc_shifts));
>> +    if (!cmdbuf->relocs || !cmdbuf->reloc_types || !cmdbuf->reloc_shifts)
>> +        return AVERROR(ENOMEM);
>> +#endif
>> +
>> +    cmdbuf->syncpt_incrs = av_malloc_array(NUM_INITIAL_SYNCPT_INCRS, sizeof(*cmdbuf->syncpt_incrs));
>> +#ifndef __SWITCH__
>> +    cmdbuf->fences       = av_malloc_array(NUM_INITIAL_SYNCPT_INCRS, sizeof(*cmdbuf->fences));
>> +#endif
>> +
>> +#ifndef __SWITCH__
>> +    if (!cmdbuf->syncpt_incrs || !cmdbuf->fences)
>> +#else
>> +    if (!cmdbuf->syncpt_incrs)
>> +#endif
>> +        return AVERROR(ENOMEM);
>> +
>> +    return 0;
>> +}
>> +
>> +int av_nvtegra_cmdbuf_deinit(AVNVTegraCmdbuf *cmdbuf) {
>> +    av_freep(&cmdbuf->cmdbufs);
>> +    av_freep(&cmdbuf->syncpt_incrs);
>> +
>> +#ifndef __SWITCH__
>> +    av_freep(&cmdbuf->cmdbuf_exts), av_freep(&cmdbuf->class_ids);
>> +    av_freep(&cmdbuf->relocs), av_freep(&cmdbuf->reloc_types), av_freep(&cmdbuf->reloc_shifts);
>> +    av_freep(&cmdbuf->fences);
>> +#endif
>> +
>> +    return 0;
>> +}
>> +
>> +int av_nvtegra_cmdbuf_add_memory(AVNVTegraCmdbuf *cmdbuf, AVNVTegraMap *map, uint32_t offset, uint32_t size) {
>> +    uint8_t *mem;
>> +
>> +    mem = av_nvtegra_map_get_addr(map);
>> +
>> +    cmdbuf->map        = map;
>> +    cmdbuf->mem_offset = offset;
>> +    cmdbuf->mem_size   = size;
>> +
>> +    cmdbuf->cur_word = (uint32_t *)(mem + cmdbuf->mem_offset);
>> +
>> +    return 0;
>> +}
>> +
>> +int av_nvtegra_cmdbuf_clear(AVNVTegraCmdbuf *cmdbuf) {
>> +    uint8_t *mem;
>> +
>> +    mem = av_nvtegra_map_get_addr(cmdbuf->map);
>> +
>> +    cmdbuf->num_cmdbufs = 0, cmdbuf->num_syncpt_incrs = 0;
>> +#ifndef __SWITCH__
>> +    cmdbuf->num_relocs = 0, cmdbuf->num_waitchks = 0;
>> +#endif
>> +
>> +    cmdbuf->cur_word = (uint32_t *)(mem + cmdbuf->mem_offset);
>> +    return 0;
>> +}
>> +
>> +int av_nvtegra_cmdbuf_begin(AVNVTegraCmdbuf *cmdbuf, uint32_t class_id) {
>> +    uint8_t *mem;
>> +    void *tmp1;
>> +#ifndef __SWITCH__
>> +    void *tmp2, *tmp3;
>> +#endif
>> +
>> +    mem = av_nvtegra_map_get_addr(cmdbuf->map);
>> +
>> +    tmp1 = av_realloc_array(cmdbuf->cmdbufs,     cmdbuf->num_cmdbufs + 1, sizeof(*cmdbuf->cmdbufs));
>> +#ifndef __SWITCH__
>> +    tmp2 = av_realloc_array(cmdbuf->cmdbuf_exts, cmdbuf->num_cmdbufs + 1, sizeof(*cmdbuf->cmdbuf_exts));
>> +    tmp3 = av_realloc_array(cmdbuf->class_ids,   cmdbuf->num_cmdbufs + 1, sizeof(*cmdbuf->class_ids));
>> +#endif
>> +
>> +#ifndef __SWITCH__
>> +    if (!tmp1 || !tmp2 || !tmp3)
>> +#else
>> +    if (!tmp1)
>> +#endif
>> +        return AVERROR(ENOMEM);
>> +
>> +    cmdbuf->cmdbufs = tmp1;
>> +
>> +#ifndef __SWITCH__
>> +    cmdbuf->cmdbuf_exts = tmp2, cmdbuf->class_ids = tmp3;
>> +#endif
>> +
>> +    cmdbuf->cmdbufs[cmdbuf->num_cmdbufs] = (struct nvhost_cmdbuf){
>> +        .mem       = av_nvtegra_map_get_handle(cmdbuf->map),
>> +        .offset    = (uint8_t *)cmdbuf->cur_word - mem,
>> +    };
>> +
>> +#ifndef __SWITCH__
>> +    cmdbuf->cmdbuf_exts[cmdbuf->num_cmdbufs] = (struct nvhost_cmdbuf_ext){
>> +        .pre_fence = -1,
>> +    };
>> +
>> +    cmdbuf->class_ids[cmdbuf->num_cmdbufs] = class_id;
>> +#endif
>> +
>> +#ifdef __SWITCH__
>> +    if (cmdbuf->num_cmdbufs == 0)
>> +        av_nvtegra_cmdbuf_push_word(cmdbuf, host1x_opcode_setclass(class_id, 0, 0));
>> +#endif
>> +
>> +    return 0;
>> +}
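
A note on the reallocation pattern in av_nvtegra_cmdbuf_begin() (the same shape appears in push_reloc and add_syncpt_incr): if the first av_realloc_array() succeeds and a later one fails, the function returns ENOMEM without storing tmp1, so cmdbuf->cmdbufs may be left pointing at a block that has already been moved. A minimal sketch of one way to avoid this is to commit each pointer as soon as its reallocation succeeds. It uses plain realloc() and a hypothetical two-array struct; the names Lists, lists_grow and lists_push are made up for illustration and are not part of the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a struct holding two parallel arrays. */
typedef struct Lists {
    uint32_t *a;
    uint32_t *b;
    size_t    nb;   /* number of valid entries in both arrays */
} Lists;

/* Grow both arrays so they can hold nb + 1 entries.
 * Each pointer is stored right after its realloc() succeeds, so a later
 * failure never leaves a stale pointer in the struct. Returns 0 or -1. */
static int lists_grow(Lists *l)
{
    void *tmp;

    assert(l != NULL);

    tmp = realloc(l->a, (l->nb + 1) * sizeof(*l->a));
    if (!tmp)
        return -1;
    l->a = tmp;

    tmp = realloc(l->b, (l->nb + 1) * sizeof(*l->b));
    if (!tmp)
        return -1;   /* l->a is larger than needed, but still valid */
    l->b = tmp;

    return 0;
}

/* Append one pair of values, growing the arrays first. */
static int lists_push(Lists *l, uint32_t va, uint32_t vb)
{
    if (lists_grow(l) < 0)
        return -1;

    l->a[l->nb] = va;
    l->b[l->nb] = vb;
    l->nb++;
    return 0;
}
```

With this ordering the struct stays consistent on every error path, at the cost of possibly leaving one array slightly over-allocated.
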
>> +
>> +int av_nvtegra_cmdbuf_end(AVNVTegraCmdbuf *cmdbuf) {
>> +    cmdbuf->num_cmdbufs++;
>> +    return 0;
>> +}
>> +
>> +int av_nvtegra_cmdbuf_push_word(AVNVTegraCmdbuf *cmdbuf, uint32_t word) {
>> +    uintptr_t mem_start = (uintptr_t)av_nvtegra_map_get_addr(cmdbuf->map) + cmdbuf->mem_offset;
>> +
>> +    if ((uintptr_t)cmdbuf->cur_word - mem_start >= cmdbuf->mem_size)
>> +        return AVERROR(ENOMEM);
>> +
>> +    *cmdbuf->cur_word++ = word;
>> +    cmdbuf->cmdbufs[cmdbuf->num_cmdbufs].words += 1;
>> +    return 0;
>> +}
>> +
>> +int av_nvtegra_cmdbuf_push_value(AVNVTegraCmdbuf *cmdbuf, uint32_t offset, uint32_t word) {
>> +    int err;
>> +
>> +    err = av_nvtegra_cmdbuf_push_word(cmdbuf, host1x_opcode_incr(NV_THI_METHOD0>>2, 2));
>> +    if (err < 0)
>> +        return err;
>> +
>> +    err = av_nvtegra_cmdbuf_push_word(cmdbuf, offset);
>> +    if (err < 0)
>> +        return err;
>> +
>> +    err = av_nvtegra_cmdbuf_push_word(cmdbuf, word);
>> +    if (err < 0)
>> +        return err;
>> +
>> +    return 0;
>> +}
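
For readers following along: as far as I can tell from the helpers in nvtegra_host1x.h, each av_nvtegra_cmdbuf_push_value() call emits three words. The first is an INCR opcode targeting NV_THI_METHOD0 with a count of 2. The method offset follows and lands in METHOD0, then the data word lands in METHOD1. A small self-contained sketch of that packing is below. The constant and opcode layout are copied from the patch, while pack_value() and its caller-provided array are hypothetical and only there for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NV_THI_METHOD0 0x00000040u   /* byte offset, as in the patch */

/* Same bit layout as host1x_opcode_incr() in the patch. */
static inline uint32_t opcode_incr(unsigned offset, unsigned count)
{
    return (1u << 28) | (offset << 16) | count;
}

/* Hypothetical helper: pack one method write into out[0..2].
 * Returns the number of words written, or 0 if there is no room. */
static size_t pack_value(uint32_t *out, size_t room, uint32_t method, uint32_t data)
{
    assert(out != NULL);

    if (room < 3)
        return 0;

    out[0] = opcode_incr(NV_THI_METHOD0 >> 2, 2); /* two data words follow */
    out[1] = method;                              /* written to METHOD0    */
    out[2] = data;                                /* written to METHOD1    */
    return 3;
}
```

The register offset is divided by 4 because host1x opcodes address registers in 32-bit words, whereas the NV_THI_* defines are byte offsets.
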
>> +
>> +int av_nvtegra_cmdbuf_push_reloc(AVNVTegraCmdbuf *cmdbuf, uint32_t offset, AVNVTegraMap *target, uint32_t target_offset,
>> +                                 int reloc_type, int shift)
>> +{
>> +    int err;
>> +#ifndef __SWITCH__
>> +    uint8_t *mem;
>> +    void *tmp1, *tmp2, *tmp3;
>> +
>> +    mem = av_nvtegra_map_get_addr(cmdbuf->map);
>> +
>> +    tmp1 = av_realloc_array(cmdbuf->relocs,       cmdbuf->num_relocs + 1, sizeof(*cmdbuf->relocs));
>> +    tmp2 = av_realloc_array(cmdbuf->reloc_types,  cmdbuf->num_relocs + 1, sizeof(*cmdbuf->reloc_types));
>> +    tmp3 = av_realloc_array(cmdbuf->reloc_shifts, cmdbuf->num_relocs + 1, sizeof(*cmdbuf->reloc_shifts));
>> +    if (!tmp1 || !tmp2 || !tmp3)
>> +        return AVERROR(ENOMEM);
>> +
>> +    cmdbuf->relocs = tmp1, cmdbuf->reloc_types = tmp2, cmdbuf->reloc_shifts = tmp3;
>> +
>> +    err = av_nvtegra_cmdbuf_push_value(cmdbuf, offset, 0xdeadbeef);
>> +    if (err < 0)
>> +        return err;
>> +
>> +    cmdbuf->relocs[cmdbuf->num_relocs]       = (struct nvhost_reloc){
>> +        .cmdbuf_mem    = av_nvtegra_map_get_handle(cmdbuf->map),
>> +        .cmdbuf_offset = (uint8_t *)cmdbuf->cur_word - mem - sizeof(uint32_t),
>> +        .target        = av_nvtegra_map_get_handle(target),
>> +        .target_offset = target_offset,
>> +    };
>> +
>> +    cmdbuf->reloc_types[cmdbuf->num_relocs]  = (struct nvhost_reloc_type){
>> +        .reloc_type    = reloc_type,
>> +    };
>> +
>> +    cmdbuf->reloc_shifts[cmdbuf->num_relocs] = (struct nvhost_reloc_shift){
>> +        .shift         = shift,
>> +    };
>> +
>> +    cmdbuf->num_relocs++;
>> +
>> +    return 0;
>> +#else
>> +    err = av_nvtegra_cmdbuf_push_value(cmdbuf, offset, (target->iova + target_offset) >> shift);
>> +    if (err < 0)
>> +        return err;
>> +
>> +    return 0;
>> +#endif
>> +}
>> +
>> +int av_nvtegra_cmdbuf_push_syncpt_incr(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt) {
>> +    int err;
>> +
>> +    err = av_nvtegra_cmdbuf_push_word(cmdbuf, host1x_opcode_nonincr(NV_THI_INCR_SYNCPT>>2, 1));
>> +    if (err < 0)
>> +        return err;
>> +
>> +    err = av_nvtegra_cmdbuf_push_word(cmdbuf,
>> +                                      AV_NVTEGRA_VALUE(NV_THI_INCR_SYNCPT, INDX, syncpt) |
>> +                                      AV_NVTEGRA_ENUM (NV_THI_INCR_SYNCPT, COND, OP_DONE));
>> +    if (err < 0)
>> +        return err;
>> +
>> +    return 0;
>> +}
>> +
>> +int av_nvtegra_cmdbuf_push_wait(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence) {
>> +    int err;
>> +
>> +    err = av_nvtegra_cmdbuf_push_word(cmdbuf, host1x_opcode_setclass(HOST1X_CLASS_HOST1X, 0, 0));
>> +    if (err < 0)
>> +        return err;
>> +
>> +    err = av_nvtegra_cmdbuf_push_word(cmdbuf, host1x_opcode_mask(NV_CLASS_HOST_LOAD_SYNCPT_PAYLOAD>>2,
>> +                                      (1<<(NV_CLASS_HOST_LOAD_SYNCPT_PAYLOAD - NV_CLASS_HOST_LOAD_SYNCPT_PAYLOAD)) |
>> +                                      (1<<(NV_CLASS_HOST_WAIT_SYNCPT         - NV_CLASS_HOST_LOAD_SYNCPT_PAYLOAD))));
>> +    if (err < 0)
>> +        return err;
>> +
>> +    err = av_nvtegra_cmdbuf_push_word(cmdbuf, fence);
>> +    if (err < 0)
>> +        return err;
>> +
>> +    err = av_nvtegra_cmdbuf_push_word(cmdbuf, syncpt);
>> +    if (err < 0)
>> +        return err;
>> +
>> +    return 0;
>> +}
>> +
>> +int av_nvtegra_cmdbuf_add_syncpt_incr(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence)
>> +{
>> +    void *tmp1;
>> +#ifndef __SWITCH__
>> +    void *tmp2;
>> +#endif
>> +
>> +    tmp1 = av_realloc_array(cmdbuf->syncpt_incrs, cmdbuf->num_syncpt_incrs + 1, sizeof(*cmdbuf->syncpt_incrs));
>> +#ifndef __SWITCH__
>> +    tmp2 = av_realloc_array(cmdbuf->fences,       cmdbuf->num_syncpt_incrs + 1, sizeof(*cmdbuf->fences));
>> +#endif
>> +
>> +#ifndef __SWITCH__
>> +    if (!tmp1 || !tmp2)
>> +#else
>> +    if (!tmp1)
>> +#endif
>> +        return AVERROR(ENOMEM);
>> +
>> +    cmdbuf->syncpt_incrs = tmp1;
>> +#ifndef __SWITCH__
>> +    cmdbuf->fences       = tmp2;
>> +#endif
>> +
>> +    cmdbuf->syncpt_incrs[cmdbuf->num_syncpt_incrs] = (struct nvhost_syncpt_incr){
>> +        .syncpt_id    = syncpt,
>> +        .syncpt_incrs = 1,
>> +    };
>> +
>> +#ifndef __SWITCH__
>> +    cmdbuf->fences[cmdbuf->num_syncpt_incrs]       = fence;
>> +#endif
>> +
>> +    cmdbuf->num_syncpt_incrs++;
>> +
>> +    return av_nvtegra_cmdbuf_push_syncpt_incr(cmdbuf, syncpt);
>> +}
>> +
>> +int av_nvtegra_cmdbuf_add_waitchk(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence) {
>> +#ifndef __SWITCH__
>> +    uint8_t *mem;
>> +    void *tmp;
>> +
>> +    mem = av_nvtegra_map_get_addr(cmdbuf->map);
>> +
>> +    tmp = av_realloc_array(cmdbuf->waitchks, cmdbuf->num_waitchks + 1, sizeof(*cmdbuf->waitchks));
>> +    if (!tmp)
>> +        return AVERROR(ENOMEM);
>> +
>> +    cmdbuf->waitchks = tmp;
>> +
>> +    cmdbuf->waitchks[cmdbuf->num_waitchks] = (struct nvhost_waitchk){
>> +        .mem       = av_nvtegra_map_get_handle(cmdbuf->map),
>> +        .offset    = (uint8_t *)cmdbuf->cur_word - mem - sizeof(uint32_t),
>> +        .syncpt_id = syncpt,
>> +        .thresh    = fence,
>> +    };
>> +
>> +    cmdbuf->num_waitchks++;
>> +#endif
>> +
>> +    return av_nvtegra_cmdbuf_push_wait(cmdbuf, syncpt, fence);
>> +}
>> +
>> +static void nvtegra_job_free(void *opaque, uint8_t *data) {
>> +    AVNVTegraJob *job = (AVNVTegraJob *)data;
>> +
>> +    if (!job)
>> +        return;
>> +
>> +    av_nvtegra_cmdbuf_deinit(&job->cmdbuf);
>> +    av_nvtegra_map_destroy(&job->input_map);
>> +
>> +    av_freep(&job);
>> +}
>> +
>> +static AVBufferRef *nvtegra_job_alloc(void *opaque, size_t size) {
>> +    AVNVTegraJobPool *pool = opaque;
>> +
>> +    AVBufferRef  *buffer;
>> +    AVNVTegraJob *job;
>> +    int err;
>> +
>> +    job = av_mallocz(sizeof(*job));
>> +    if (!job)
>> +        return NULL;
>> +
>> +    err = av_nvtegra_map_create(&job->input_map, pool->channel, pool->input_map_size, 0x100,
>> +                                NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE);
>> +    if (err < 0)
>> +        goto fail;
>> +
>> +    err = av_nvtegra_cmdbuf_init(&job->cmdbuf);
>> +    if (err < 0)
>> +        goto fail;
>> +
>> +    err = av_nvtegra_cmdbuf_add_memory(&job->cmdbuf, &job->input_map, pool->cmdbuf_off, pool->max_cmdbuf_size);
>> +    if (err < 0)
>> +        goto fail;
>> +
>> +    buffer = av_buffer_create((uint8_t *)job, sizeof(*job), nvtegra_job_free, pool, 0);
>> +    if (!buffer)
>> +        goto fail;
>> +
>> +    return buffer;
>> +
>> +fail:
>> +    av_nvtegra_cmdbuf_deinit(&job->cmdbuf);
>> +    av_nvtegra_map_destroy(&job->input_map);
>> +    av_freep(&job);
>> +    return NULL;
>> +}
>> +
>> +int av_nvtegra_job_pool_init(AVNVTegraJobPool *pool, AVNVTegraChannel *channel,
>> +                             size_t input_map_size, off_t cmdbuf_off, size_t max_cmdbuf_size)
>> +{
>> +    pool->channel         = channel;
>> +    pool->input_map_size  = input_map_size;
>> +    pool->cmdbuf_off      = cmdbuf_off;
>> +    pool->max_cmdbuf_size = max_cmdbuf_size;
>> +    pool->pool            = av_buffer_pool_init2(sizeof(AVNVTegraJob), pool,
>> +                                                 nvtegra_job_alloc, NULL);
>> +    if (!pool->pool)
>> +        return AVERROR(ENOMEM);
>> +
>> +    return 0;
>> +}
>> +
>> +int av_nvtegra_job_pool_uninit(AVNVTegraJobPool *pool) {
>> +    av_buffer_pool_uninit(&pool->pool);
>> +    return 0;
>> +}
>> +
>> +AVBufferRef *av_nvtegra_job_pool_get(AVNVTegraJobPool *pool) {
>> +    return av_buffer_pool_get(pool->pool);
>> +}
>> +
>> +int av_nvtegra_job_submit(AVNVTegraJobPool *pool, AVNVTegraJob *job) {
>> +    return av_nvtegra_channel_submit(pool->channel, &job->cmdbuf, &job->fence);
>> +}
>> +
>> +int av_nvtegra_job_wait(AVNVTegraJobPool *pool, AVNVTegraJob *job, int timeout) {
>> +    return av_nvtegra_syncpt_wait(pool->channel, job->fence, timeout);
>> +}
>> diff --git a/libavutil/nvtegra.h b/libavutil/nvtegra.h
>> new file mode 100644
>> index 0000000000..3b63335d6c
>> --- /dev/null
>> +++ b/libavutil/nvtegra.h
>> @@ -0,0 +1,258 @@
>> +/*
>> + * Copyright (c) 2024 averne <averne381@gmail.com>
>> + *
>> + * This file is part of FFmpeg.
>> + *
>> + * FFmpeg is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * FFmpeg is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along
>> + * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
>> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
>> + */
>> +
>> +#ifndef AVUTIL_NVTEGRA_H
>> +#define AVUTIL_NVTEGRA_H
>> +
>> +#include <stdint.h>
>> +#include <stdbool.h>
>> +
>> +#include "buffer.h"
>> +
>> +#include "nvhost_ioctl.h"
>> +#include "nvmap_ioctl.h"
>> +
>> +typedef struct AVNVTegraChannel {
>> +#ifndef __SWITCH__
>> +    int fd;
>> +    int module_id;
>> +#else
>> +    NvChannel channel;
>> +#endif
>> +
>> +    uint32_t syncpt;
>> +
>> +#ifdef __SWITCH__
>> +    MmuRequest mmu_request;
>> +#endif
>> +    uint32_t clock;
>> +} AVNVTegraChannel;
>> +
>> +typedef struct AVNVTegraMap {
>> +#ifndef __SWITCH__
>> +    uint32_t handle;
>> +    uint32_t size;
>> +    void *cpu_addr;
>> +#else
>> +    NvMap map;
>> +    uint32_t iova;
>> +    uint32_t owner;
>> +#endif
>> +    bool is_linear;
>> +} AVNVTegraMap;
>> +
>> +typedef struct AVNVTegraCmdbuf {
>> +    AVNVTegraMap *map;
>> +
>> +    uint32_t mem_offset, mem_size;
>> +
>> +    uint32_t *cur_word;
>> +
>> +    struct nvhost_cmdbuf       *cmdbufs;
>> +#ifndef __SWITCH__
>> +    struct nvhost_cmdbuf_ext   *cmdbuf_exts;
>> +    uint32_t                   *class_ids;
>> +#endif
>> +    uint32_t num_cmdbufs;
>> +
>> +#ifndef __SWITCH__
>> +    struct nvhost_reloc        *relocs;
>> +    struct nvhost_reloc_type   *reloc_types;
>> +    struct nvhost_reloc_shift  *reloc_shifts;
>> +    uint32_t num_relocs;
>> +#endif
>> +
>> +    struct nvhost_syncpt_incr  *syncpt_incrs;
>> +#ifndef __SWITCH__
>> +    uint32_t                   *fences;
>> +#endif
>> +    uint32_t num_syncpt_incrs;
>> +
>> +#ifndef __SWITCH__
>> +    struct nvhost_waitchk      *waitchks;
>> +    uint32_t num_waitchks;
>> +#endif
>> +} AVNVTegraCmdbuf;
>> +
>> +typedef struct AVNVTegraJobPool {
>> +    /*
>> +     * Pool object for job allocation
>> +     */
>> +    AVBufferPool *pool;
>> +
>> +    /*
>> +     * Hardware channel the jobs will be submitted to
>> +     */
>> +    AVNVTegraChannel *channel;
>> +
>> +    /*
>> +     * Total size of the input memory-mapped buffer
>> +     */
>> +    size_t input_map_size;
>> +
>> +    /*
>> +     * Offset of the command data within the input map
>> +     */
>> +    off_t cmdbuf_off;
>> +
>> +    /*
>> +     * Maximum memory usable by the command buffer
>> +     */
>> +    size_t max_cmdbuf_size;
>> +} AVNVTegraJobPool;
>> +
>> +typedef struct AVNVTegraJob {
>> +    /*
>> +     * Memory-mapped buffer for command buffers, metadata structures, ...
>> +     */
>> +    AVNVTegraMap input_map;
>> +
>> +    /*
>> +     * Object for command recording
>> +     */
>> +    AVNVTegraCmdbuf cmdbuf;
>> +
>> +    /*
>> +     * Fence indicating completion of the job
>> +     */
>> +    uint32_t fence;
>> +} AVNVTegraJob;
>> +
>> +AVBufferRef *av_nvtegra_driver_init(void);
>> +
>> +int av_nvtegra_channel_open(AVNVTegraChannel *channel, const char *dev);
>> +int av_nvtegra_channel_close(AVNVTegraChannel *channel);
>> +int av_nvtegra_channel_get_clock_rate(AVNVTegraChannel *channel, uint32_t moduleid, uint32_t *clock_rate);
>> +int av_nvtegra_channel_set_clock_rate(AVNVTegraChannel *channel, uint32_t moduleid, uint32_t clock_rate);
>> +int av_nvtegra_channel_submit(AVNVTegraChannel *channel, AVNVTegraCmdbuf *cmdbuf, uint32_t *fence);
>> +int av_nvtegra_channel_set_submit_timeout(AVNVTegraChannel *channel, uint32_t timeout_ms);
>> +
>> +int av_nvtegra_syncpt_wait(AVNVTegraChannel *channel, uint32_t threshold, int32_t timeout);
>> +
>> +int av_nvtegra_map_allocate(AVNVTegraMap *map, AVNVTegraChannel *owner, uint32_t size,
>> +                            uint32_t align, int heap_mask, int flags);
>> +int av_nvtegra_map_free(AVNVTegraMap *map);
>> +int av_nvtegra_map_from_va(AVNVTegraMap *map, AVNVTegraChannel *owner, void *mem,
>> +                           uint32_t size, uint32_t align, uint32_t flags);
>> +int av_nvtegra_map_close(AVNVTegraMap *map);
>> +int av_nvtegra_map_map(AVNVTegraMap *map);
>> +int av_nvtegra_map_unmap(AVNVTegraMap *map);
>> +int av_nvtegra_map_cache_op(AVNVTegraMap *map, int op, void *addr, size_t len);
>> +int av_nvtegra_map_realloc(AVNVTegraMap *map, uint32_t size, uint32_t align, int heap_mask, int flags);
>> +
>> +static inline int av_nvtegra_map_create(AVNVTegraMap *map, AVNVTegraChannel *owner, uint32_t size, uint32_t align,
>> +                                        int heap_mask, int flags)
>> +{
>> +    int err;
>> +
>> +    err = av_nvtegra_map_allocate(map, owner, size, align, heap_mask, flags);
>> +    if (err < 0)
>> +        return err;
>> +
>> +    return av_nvtegra_map_map(map);
>> +}
>> +
>> +static inline int av_nvtegra_map_destroy(AVNVTegraMap *map) {
>> +    int err;
>> +
>> +    err = av_nvtegra_map_unmap(map);
>> +    if (err < 0)
>> +        return err;
>> +
>> +    return av_nvtegra_map_free(map);
>> +}
>> +
>> +int av_nvtegra_cmdbuf_init(AVNVTegraCmdbuf *cmdbuf);
>> +int av_nvtegra_cmdbuf_deinit(AVNVTegraCmdbuf *cmdbuf);
>> +int av_nvtegra_cmdbuf_add_memory(AVNVTegraCmdbuf *cmdbuf, AVNVTegraMap *map, uint32_t offset, uint32_t size);
>> +int av_nvtegra_cmdbuf_clear(AVNVTegraCmdbuf *cmdbuf);
>> +int av_nvtegra_cmdbuf_begin(AVNVTegraCmdbuf *cmdbuf, uint32_t class_id);
>> +int av_nvtegra_cmdbuf_end(AVNVTegraCmdbuf *cmdbuf);
>> +int av_nvtegra_cmdbuf_push_word(AVNVTegraCmdbuf *cmdbuf, uint32_t word);
>> +int av_nvtegra_cmdbuf_push_value(AVNVTegraCmdbuf *cmdbuf, uint32_t offset, uint32_t word);
>> +int av_nvtegra_cmdbuf_push_reloc(AVNVTegraCmdbuf *cmdbuf, uint32_t offset, AVNVTegraMap *target, uint32_t target_offset,
>> +                                 int reloc_type, int shift);
>> +int av_nvtegra_cmdbuf_push_syncpt_incr(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt);
>> +int av_nvtegra_cmdbuf_push_wait(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence);
>> +int av_nvtegra_cmdbuf_add_syncpt_incr(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence);
>> +int av_nvtegra_cmdbuf_add_waitchk(AVNVTegraCmdbuf *cmdbuf, uint32_t syncpt, uint32_t fence);
>> +
>> +/*
>> + * Job allocation and submission routines
>> + */
>> +int av_nvtegra_job_pool_init(AVNVTegraJobPool *pool, AVNVTegraChannel *channel,
>> +                             size_t input_map_size, off_t cmdbuf_off, size_t max_cmdbuf_size);
>> +int av_nvtegra_job_pool_uninit(AVNVTegraJobPool *pool);
>> +AVBufferRef *av_nvtegra_job_pool_get(AVNVTegraJobPool *pool);
>> +
>> +int av_nvtegra_job_submit(AVNVTegraJobPool *pool, AVNVTegraJob *job);
>> +int av_nvtegra_job_wait(AVNVTegraJobPool *pool, AVNVTegraJob *job, int timeout);
>> +
>> +static inline uint32_t av_nvtegra_map_get_handle(AVNVTegraMap *map) {
>> +#ifndef __SWITCH__
>> +    return map->handle;
>> +#else
>> +    return map->map.handle;
>> +#endif
>> +}
>> +
>> +static inline void *av_nvtegra_map_get_addr(AVNVTegraMap *map) {
>> +#ifndef __SWITCH__
>> +    return map->cpu_addr;
>> +#else
>> +    return map->map.cpu_addr;
>> +#endif
>> +}
>> +
>> +static inline uint32_t av_nvtegra_map_get_size(AVNVTegraMap *map) {
>> +#ifndef __SWITCH__
>> +    return map->size;
>> +#else
>> +    return map->map.size;
>> +#endif
>> +}
>> +
>> +/* Addresses are shifted by 8 bits in the command buffer, requiring an alignment to 256 */
>> +#define AV_NVTEGRA_MAP_ALIGN (1 << 8)
>> +
>> +#define AV_NVTEGRA_VALUE(offset, field, value)                                                    \
>> +    ((value &                                                                                     \
>> +    ((uint32_t)((UINT64_C(1) << ((1?offset ## _ ## field) - (0?offset ## _ ## field) + 1)) - 1))) \
>> +    << (0?offset ## _ ## field))
>> +
>> +#define AV_NVTEGRA_ENUM(offset, field, value)                                                     \
>> +    ((offset ## _ ## field ## _ ## value &                                                        \
>> +    ((uint32_t)((UINT64_C(1) << ((1?offset ## _ ## field) - (0?offset ## _ ## field) + 1)) - 1))) \
>> +    << (0?offset ## _ ## field))
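
The AV_NVTEGRA_VALUE/AV_NVTEGRA_ENUM macros rely on the field definitions being written as "hi:lo", so that (1?hi:lo) evaluates to the high bit and (0?hi:lo) to the low bit. A reduced sketch of the same trick is below. The register REG, the FIELD_* macros and make_incr_word() are hypothetical; only the 7:0 / 15:8 layout mirrors NV_THI_INCR_SYNCPT:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register with two fields, written in "hi:lo" form. */
#define REG_INDX         7:0
#define REG_COND         15:8
#define REG_COND_OP_DONE 0x1

/* (1 ? hi:lo) selects hi, (0 ? hi:lo) selects lo. */
#define FIELD_HI(f)   (1 ? f)
#define FIELD_LO(f)   (0 ? f)
#define FIELD_MASK(f) ((uint32_t)((UINT64_C(1) << (FIELD_HI(f) - FIELD_LO(f) + 1)) - 1))

/* Mask the value to the field width, then shift it into position. */
#define FIELD_VALUE(reg, field, value) \
    ((((uint32_t)(value)) & FIELD_MASK(reg ## _ ## field)) << FIELD_LO(reg ## _ ## field))

/* Build a word with the syncpoint index in bits 7:0 and a condition
 * code in bits 15:8. The index is silently masked to its field width. */
static inline uint32_t make_incr_word(uint32_t syncpt, uint32_t cond)
{
    assert(cond <= FIELD_MASK(REG_COND));
    return FIELD_VALUE(REG, INDX, syncpt) | FIELD_VALUE(REG, COND, cond);
}
```

For example, make_incr_word(5, REG_COND_OP_DONE) places 5 in the low byte and 1 in the next byte, giving 0x105.
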
>> +
>> +#define AV_NVTEGRA_PUSH_VALUE(cmdbuf, offset, value) ({                                  \
>> +    int _err = av_nvtegra_cmdbuf_push_value(cmdbuf, (offset) / sizeof(uint32_t), value); \
>> +    if (_err < 0)                                                                        \
>> +        return _err;                                                                     \
>> +})
>> +
>> +#define AV_NVTEGRA_PUSH_RELOC(cmdbuf, offset, target, target_offset, type) ({    \
>> +    int _err = av_nvtegra_cmdbuf_push_reloc(cmdbuf, (offset) / sizeof(uint32_t), \
>> +                                            target, target_offset, type, 8);     \
>> +    if (_err < 0)                                                                \
>> +        return _err;                                                             \
>> +})
>> +
>> +#endif /* AVUTIL_NVTEGRA_H */
>> diff --git a/libavutil/nvtegra_host1x.h b/libavutil/nvtegra_host1x.h
>> new file mode 100644
>> index 0000000000..25e37eae61
>> --- /dev/null
>> +++ b/libavutil/nvtegra_host1x.h
>> @@ -0,0 +1,94 @@
>> +/*
>> + * Copyright (c) 2024 averne <averne381@gmail.com>
>> + *
>> + * This file is part of FFmpeg.
>> + *
>> + * FFmpeg is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * FFmpeg is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along
>> + * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
>> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
>> + */
>> +
>> +#ifndef AVUTIL_NVTEGRA_HOST1X_H
>> +#define AVUTIL_NVTEGRA_HOST1X_H
>> +
>> +#include <stdint.h>
>> +
>> +#include "macros.h"
>> +
>> +/* From L4T include/linux/host1x.h */
>> +enum host1x_class {
>> +    HOST1X_CLASS_HOST1X  = 0x01,
>> +    HOST1X_CLASS_NVENC   = 0x21,
>> +    HOST1X_CLASS_VI      = 0x30,
>> +    HOST1X_CLASS_ISPA    = 0x32,
>> +    HOST1X_CLASS_ISPB    = 0x34,
>> +    HOST1X_CLASS_GR2D    = 0x51,
>> +    HOST1X_CLASS_GR2D_SB = 0x52,
>> +    HOST1X_CLASS_VIC     = 0x5d,
>> +    HOST1X_CLASS_GR3D    = 0x60,
>> +    HOST1X_CLASS_NVJPG   = 0xc0,
>> +    HOST1X_CLASS_NVDEC   = 0xf0,
>> +};
>> +
>> +static inline uint32_t host1x_opcode_setclass(unsigned class_id, unsigned offset, unsigned mask) {
>> +    return (0 << 28) | (offset << 16) | (class_id << 6) | mask;
>> +}
>> +
>> +static inline uint32_t host1x_opcode_incr(unsigned offset, unsigned count) {
>> +    return (1 << 28) | (offset << 16) | count;
>> +}
>> +
>> +static inline uint32_t host1x_opcode_nonincr(unsigned offset, unsigned count) {
>> +    return (2 << 28) | (offset << 16) | count;
>> +}
>> +
>> +static inline uint32_t host1x_opcode_mask(unsigned offset, unsigned mask) {
>> +    return (3 << 28) | (offset << 16) | mask;
>> +}
>> +
>> +static inline uint32_t host1x_opcode_imm(unsigned offset, unsigned value) {
>> +    return (4 << 28) | (offset << 16) | value;
>> +}
>> +
>> +#define NV_CLASS_HOST_LOAD_SYNCPT_PAYLOAD                                  (0x00000138)
>> +#define NV_CLASS_HOST_WAIT_SYNCPT                                          (0x00000140)
>> +
>> +#define NV_THI_INCR_SYNCPT                                                 (0x00000000)
>> +#define NV_THI_INCR_SYNCPT_INDX                                            7:0
>> +#define NV_THI_INCR_SYNCPT_COND                                            15:8
>> +#define NV_THI_INCR_SYNCPT_COND_IMMEDIATE                                  (0x00000000)
>> +#define NV_THI_INCR_SYNCPT_COND_OP_DONE                                    (0x00000001)
>> +#define NV_THI_INCR_SYNCPT_ERR                                             (0x00000008)
>> +#define NV_THI_INCR_SYNCPT_ERR_COND_STS_IMM                                0:0
>> +#define NV_THI_INCR_SYNCPT_ERR_COND_STS_OPDONE                             1:1
>> +#define NV_THI_CTXSW_INCR_SYNCPT                                           (0x0000000c)
>> +#define NV_THI_CTXSW_INCR_SYNCPT_INDX                                      7:0
>> +#define NV_THI_CTXSW                                                       (0x00000020)
>> +#define NV_THI_CTXSW_CURR_CLASS                                            9:0
>> +#define NV_THI_CTXSW_AUTO_ACK                                              11:11
>> +#define NV_THI_CTXSW_CURR_CHANNEL                                          15:12
>> +#define NV_THI_CTXSW_NEXT_CLASS                                            25:16
>> +#define NV_THI_CTXSW_NEXT_CHANNEL                                          31:28
>> +#define NV_THI_CONT_SYNCPT_EOF                                             (0x00000028)
>> +#define NV_THI_CONT_SYNCPT_EOF_INDEX                                       7:0
>> +#define NV_THI_CONT_SYNCPT_EOF_COND                                        8:8
>> +#define NV_THI_METHOD0                                                     (0x00000040)
>> +#define NV_THI_METHOD0_OFFSET                                              11:0
>> +#define NV_THI_METHOD1                                                     (0x00000044)
>> +#define NV_THI_METHOD1_DATA                                                31:0
>> +#define NV_THI_INT_STATUS                                                  (0x00000078)
>> +#define NV_THI_INT_STATUS_FALCON_INT                                       0:0
>> +#define NV_THI_INT_MASK                                                    (0x0000007c)
>> +#define NV_THI_INT_MASK_FALCON_INT                                         0:0
>> +
>> +#endif /* AVUTIL_NVTEGRA_HOST1X_H */
>> diff --git a/libavutil/pixdesc.c b/libavutil/pixdesc.c
>> index 1c0bcf2232..bb14b1b306 100644
>> --- a/libavutil/pixdesc.c
>> +++ b/libavutil/pixdesc.c
>> @@ -2791,6 +2791,10 @@ static const AVPixFmtDescriptor av_pix_fmt_descriptors[AV_PIX_FMT_NB] = {
>>         },
>>         .flags = AV_PIX_FMT_FLAG_PLANAR,
>>     },
>> +    [AV_PIX_FMT_NVTEGRA] = {
>> +        .name = "nvtegra",
>> +        .flags = AV_PIX_FMT_FLAG_HWACCEL,
>> +    },
>> };
>>
>> static const char * const color_range_names[] = {
>> diff --git a/libavutil/pixfmt.h b/libavutil/pixfmt.h
>> index a7f50e1690..a3213c792a 100644
>> --- a/libavutil/pixfmt.h
>> +++ b/libavutil/pixfmt.h
>> @@ -439,6 +439,14 @@ enum AVPixelFormat {
>>      */
>>     AV_PIX_FMT_D3D12,
>>
>> +    /**
>> +     * Hardware surfaces for Tegra devices.
>> +     *
>> +     * data[0..2] point to the memory-mapped buffer containing frame data
>> +     * buf[0] contains an AVBufferRef to an AVNVTegraMap
>> +     */
>> +    AV_PIX_FMT_NVTEGRA,
>> +
>>     AV_PIX_FMT_NB         ///< number of pixel formats, DO NOT USE THIS if you want to link with shared libav* because the number of formats might differ between versions
>> };
>>
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".


* Re: [FFmpeg-devel] [PATCH 03/16] avutil: add ioctl definitions for tegra devices
  2024-05-31 21:06     ` averne
@ 2024-05-31 21:16       ` Timo Rothenpieler
  2024-06-02 18:37         ` averne
  0 siblings, 1 reply; 37+ messages in thread
From: Timo Rothenpieler @ 2024-05-31 21:16 UTC (permalink / raw)
  To: ffmpeg-devel

On 31.05.2024 23:06, averne wrote:
> On 30/05/2024 at 22:42, Rémi Denis-Courmont wrote:
>> On Thursday, 30 May 2024, 22.43.05 EEST, averne wrote:
>>> These files are taken with minimal modifications from nvidia's Linux4Tegra
>>> (L4T) tree. nvmap enables management of memory-mapped buffers for hardware
>>> devices. nvhost enables interaction with different hardware modules
>>> (multimedia engines, display engine, ...), through a common block, host1x.
>>>
>>> Signed-off-by: averne <averne381@gmail.com>
>>> ---
>>>   libavutil/Makefile       |   2 +
>>>   libavutil/nvhost_ioctl.h | 511 +++++++++++++++++++++++++++++++++++++++
>>>   libavutil/nvmap_ioctl.h  | 451 ++++++++++++++++++++++++++++++++++
>>>   3 files changed, 964 insertions(+)
>>>   create mode 100644 libavutil/nvhost_ioctl.h
>>>   create mode 100644 libavutil/nvmap_ioctl.h
>>>
>>> diff --git a/libavutil/Makefile b/libavutil/Makefile
>>> index 6e6fa8d800..9c112bc58a 100644
>>> --- a/libavutil/Makefile
>>> +++ b/libavutil/Makefile
>>> @@ -52,6 +52,8 @@ HEADERS = adler32.h                                     \
>>>            hwcontext_videotoolbox.h                                      \
>>>            hwcontext_vdpau.h                                             \
>>>            hwcontext_vulkan.h                                            \
>>> +          nvhost_ioctl.h                                                \
>>> +          nvmap_ioctl.h                                                 \
>>>            iamf.h                                                        \
>>>            imgutils.h                                                    \
>>>            intfloat.h                                                    \
>>> diff --git a/libavutil/nvhost_ioctl.h b/libavutil/nvhost_ioctl.h
>>> new file mode 100644
>>> index 0000000000..b0bf3e3ae6
>>> --- /dev/null
>>> +++ b/libavutil/nvhost_ioctl.h
>>> @@ -0,0 +1,511 @@
>>> +/*
>>> + * include/uapi/linux/nvhost_ioctl.h
>>
>> Well, then that should be provided by linux-libc-dev or equivalent. I don't
>> think that this should be vendored into FFmpeg.
>>
> 
> Agreed. On L4T this is provided by nvidia-l4t-kernel-headers, but
> on the HOS side there is no such equivalent yet. If this patch
> series moves forward, I will integrate the relevant bits in libnx
> and get rid of those headers.
> As for the hardware definitions (in the following patch), I think
> they should be put in nv-codec-headers.

I disagree there, the nv-codec-headers track the upstream codec SDK.
Making it track two independent things would be a mess.

This patchset is implementing parts of an entire video driver in
FFmpeg, which really looks out of scope to me.
Can't all that be moved into a library, which then also comes with the
necessary headers for applications to use it?

On that note, how stable are those headers? Given they're part of the 
nvidia driver, couldn't they randomly change at any time?

* Re: [FFmpeg-devel] [PATCH 01/16] avutil/buffer: add helper to allocate aligned memory
  2024-05-31 21:06     ` averne
@ 2024-05-31 21:44       ` Michael Niedermayer
  2024-06-02 18:37         ` averne
  2024-06-01  6:59       ` Rémi Denis-Courmont
  1 sibling, 1 reply; 37+ messages in thread
From: Michael Niedermayer @ 2024-05-31 21:44 UTC (permalink / raw)
  To: FFmpeg development discussions and patches


On Fri, May 31, 2024 at 11:06:49PM +0200, averne wrote:
> On 30/05/2024 at 22:38, Rémi Denis-Courmont wrote:
> > On Thursday, 30 May 2024, 22.43.03 EEST, averne wrote:
> >> This is useful eg. for memory-mapped buffers that need page-aligned memory,
> >> when dealing with hardware devices
> >>
> >> Signed-off-by: averne <averne381@gmail.com>
> >> ---
> >>  libavutil/buffer.c | 31 +++++++++++++++++++++++++++++++
> >>  libavutil/buffer.h |  7 +++++++
> >>  2 files changed, 38 insertions(+)
> >>
> >> diff --git a/libavutil/buffer.c b/libavutil/buffer.c
> >> index e4562a79b1..b8e357f540 100644
> >> --- a/libavutil/buffer.c
> >> +++ b/libavutil/buffer.c
> >> @@ -16,9 +16,14 @@
> >>   * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> >>   */
> >>
> >> +#include "config.h"
> >> +
> >>  #include <stdatomic.h>
> >>  #include <stdint.h>
> >>  #include <string.h>
> >> +#if HAVE_MALLOC_H
> >> +#include <malloc.h>
> >> +#endif
> >>
> >>  #include "avassert.h"
> >>  #include "buffer_internal.h"
> >> @@ -100,6 +105,32 @@ AVBufferRef *av_buffer_allocz(size_t size)
> >>      return ret;
> >>  }
> >>
> >> +AVBufferRef *av_buffer_aligned_alloc(size_t size, size_t align)
> >> +{
> >> +    AVBufferRef *ret = NULL;
> >> +    uint8_t    *data = NULL;
> >> +
> >> +#if HAVE_POSIX_MEMALIGN
> >> +    if (posix_memalign((void **)&data, align, size))
> > 
> > Invalid cast.
> > 
> 
> Neither gcc nor clang emits a warning here, even on -Weverything.
> What would be your idea of a valid cast then? First cast to intptr_t, 
> then void** ?
> 
> >> +        return NULL;
> >> +#elif HAVE_ALIGNED_MALLOC
> >> +    data = aligned_alloc(align, size);

on mingw64:

src/libavutil/buffer.c: In function ‘av_buffer_aligned_alloc’:
src/libavutil/buffer.c:117:12: error: implicit declaration of function ‘aligned_alloc’ [-Werror=implicit-function-declaration]
  117 |     data = aligned_alloc(align, size);
      |            ^~~~~~~~~~~~~

thx

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

There will always be a question for which you do not know the correct answer.


* Re: [FFmpeg-devel] [PATCH 01/16] avutil/buffer: add helper to allocate aligned memory
  2024-05-31 21:06     ` averne
  2024-05-31 21:44       ` Michael Niedermayer
@ 2024-06-01  6:59       ` Rémi Denis-Courmont
  1 sibling, 0 replies; 37+ messages in thread
From: Rémi Denis-Courmont @ 2024-06-01  6:59 UTC (permalink / raw)
  To: FFmpeg development discussions and patches



On 1 June 2024 00:06:49 GMT+03:00, averne <averne381@gmail.com> wrote:
>On 30/05/2024 at 22:38, Rémi Denis-Courmont wrote:
>> On Thursday, 30 May 2024, 22.43.03 EEST, averne wrote:
>>> This is useful eg. for memory-mapped buffers that need page-aligned memory,
>>> when dealing with hardware devices
>>>
>>> Signed-off-by: averne <averne381@gmail.com>
>>> ---

>> Invalid cast.
>> 
>
>Neither gcc nor clang emits a warning here, even on -Weverything.
>What would be your idea of a valid cast then? First cast to intptr_t, 
>then void** ?

They don't warn because the explicit cast conventionally mutes the warning. Remove it and you'll get the warning.
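
To illustrate the point, here is a minimal sketch of calling posix_memalign() with no cast at all. It assumes a POSIX system; the helper name alloc_aligned_bytes is hypothetical and is not part of the patch. The allocation is received through a real `void *` object, so the compiler has nothing to be silenced about, and the later conversion to `uint8_t *` is an ordinary implicit `void *` conversion.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for posix_memalign() under strict ISO modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, for illustration only (not the patch's code). */
static uint8_t *alloc_aligned_bytes(size_t size, size_t align)
{
    void *ptr = NULL;

    /* posix_memalign() expects a void **; &ptr has exactly that type.
     * align must be a power of two and a multiple of sizeof(void *). */
    if (posix_memalign(&ptr, align, size))
        return NULL;

    /* Implicit conversion from void * to uint8_t * is well-defined in C. */
    return ptr;
}
```

The returned pointer is released with free(), as for any posix_memalign() allocation.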

* Re: [FFmpeg-devel] [PATCH 05/16] avutil: add common code for nvtegra
  2024-05-31 21:06     ` averne
@ 2024-06-01  7:29       ` Rémi Denis-Courmont
  0 siblings, 0 replies; 37+ messages in thread
From: Rémi Denis-Courmont @ 2024-06-01  7:29 UTC (permalink / raw)
  To: ffmpeg-devel

On Saturday, 1 June 2024, 0.06.55 EEST, averne wrote:
> As for your second question, I probably should've given some
> context about this decision. Initially I thought about writing a
> vaapi driver, but for a number of reasons I decided against it.

VA-API would be difficult anyway as it is tied to Linux DRM. My question remains
though: why isn't this implemented with the NVDEC API? FFmpeg already supports
NVDEC natively with the same set of codecs as this patchset, and presumably
with all the necessary codec state tracking.

This would also address Timo's concerns by moving the driver into a library, 
and shield FFmpeg from hypothetical L4T ABI breaks.

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/




* Re: [FFmpeg-devel] [PATCH 03/16] avutil: add ioctl definitions for tegra devices
  2024-05-31 21:16       ` Timo Rothenpieler
@ 2024-06-02 18:37         ` averne
  0 siblings, 0 replies; 37+ messages in thread
From: averne @ 2024-06-02 18:37 UTC (permalink / raw)
  To: ffmpeg-devel

On 31/05/2024 at 23:16, Timo Rothenpieler wrote:
> On 31.05.2024 23:06, averne wrote:
>> On 30/05/2024 at 22:42, Rémi Denis-Courmont wrote:
>>> On Thursday, 30 May 2024, 22.43.05 EEST, averne wrote:
>>>> These files are taken with minimal modifications from nvidia's Linux4Tegra
>>>> (L4T) tree. nvmap enables management of memory-mapped buffers for hardware
>>>> devices. nvhost enables interaction with different hardware modules
>>>> (multimedia engines, display engine, ...), through a common block, host1x.
>>>>
>>>> Signed-off-by: averne <averne381@gmail.com>
>>>> ---
>>>>   libavutil/Makefile       |   2 +
>>>>   libavutil/nvhost_ioctl.h | 511 +++++++++++++++++++++++++++++++++++++++
>>>>   libavutil/nvmap_ioctl.h  | 451 ++++++++++++++++++++++++++++++++++
>>>>   3 files changed, 964 insertions(+)
>>>>   create mode 100644 libavutil/nvhost_ioctl.h
>>>>   create mode 100644 libavutil/nvmap_ioctl.h
>>>>
>>>> diff --git a/libavutil/Makefile b/libavutil/Makefile
>>>> index 6e6fa8d800..9c112bc58a 100644
>>>> --- a/libavutil/Makefile
>>>> +++ b/libavutil/Makefile
>>>> @@ -52,6 +52,8 @@ HEADERS = adler32.h                                     \
>>>>            hwcontext_videotoolbox.h                                      \
>>>>            hwcontext_vdpau.h                                             \
>>>>            hwcontext_vulkan.h                                            \
>>>> +          nvhost_ioctl.h                                                \
>>>> +          nvmap_ioctl.h                                                 \
>>>>            iamf.h                                                        \
>>>>            imgutils.h                                                    \
>>>>            intfloat.h                                                    \
>>>> diff --git a/libavutil/nvhost_ioctl.h b/libavutil/nvhost_ioctl.h
>>>> new file mode 100644
>>>> index 0000000000..b0bf3e3ae6
>>>> --- /dev/null
>>>> +++ b/libavutil/nvhost_ioctl.h
>>>> @@ -0,0 +1,511 @@
>>>> +/*
>>>> + * include/uapi/linux/nvhost_ioctl.h
>>>
>>> Well, then that should be provided by linux-libc-dev or equivalent. I don't
>>> think that this should be vendored into FFmpeg.
>>>
>>
>> Agreed. On L4T this is provided by nvidia-l4t-kernel-headers, but
>> on the HOS side there is no such equivalent yet. If this patch
>> series moves forward, I will integrate the relevant bits in libnx
>> and get rid of those headers.
>> As for the hardware definitions (in the following patch), I think
>> they should be put in nv-codec-headers.
> 
> I disagree there, the nv-codec-headers track the upstream codec SDK.
> Making it track two independent things would be a mess.
> 
> This patchset is implementing parts of an entire video-driver into FFmpeg. Which really looks out of scope to me.
> Can't all that be moved into a library, which then also comes with the necessary headers for applications to use it?
> 
> On that note, how stable are those headers? Given they're part of the nvidia driver, couldn't they randomly change at any time?

Hi, firstly those uapi headers are stable as can be. They haven't 
changed throughout all the L4T releases, from the TX1 to the latest 
SoC. You can check the packages here: 
https://repo.download.nvidia.com/jetson. 
Ideally I would use a new uapi nvidia has been working on upstream: 
https://github.com/torvalds/linux/blob/master/include/uapi/drm/tegra_drm.h. 
Unfortunately, the kernel that comes with the Jetson Nano is too old 
for it. The Switch also doesn't have this API.
Secondly, the hardware headers in the next patch cannot change, 
since they represent hardware structures and constants. Also one can 
note how nvidia preserves backwards compatibility between engine 
releases: new features are programmed through extensions to the main 
structures (see for instance the v1/v2/v3 substructures in 
nvdec_hevc_pic_s).

I'll also respond to Rémi's question here. The nvdec hwaccel relies 
on the cuda framework, which is present on L4T (not cuvid though) 
but not the Switch. So I would have to implement a good part of the 
cuda runtime on the Switch, while making FFmpeg and cuda on L4T happy 
about it. That feels like even more work than a vaapi implementation, 
and frankly, as an unpaid hobbyist, I really don't feel like doing 
what should be nvidia's job.
That leaves the option of moving the code to a library of my own and
abstracting away the hardware details. I don't think that comes with much
benefit, except moving parts of the code outside FFmpeg, and you'll
notice there isn't so much of it (the decoder implementations are
usually shorter than their vaapi counterparts, for instance). It also
comes at some performance cost, like I said in my previous answer to
Rémi.
Finally, I'm not sure coming up with yet another hwaccel API for a 
niche platform would be any more acceptable for upstreaming into 
FFmpeg.

* Re: [FFmpeg-devel] [PATCH 01/16] avutil/buffer: add helper to allocate aligned memory
  2024-05-31 21:44       ` Michael Niedermayer
@ 2024-06-02 18:37         ` averne
  0 siblings, 0 replies; 37+ messages in thread
From: averne @ 2024-06-02 18:37 UTC (permalink / raw)
  To: ffmpeg-devel

On 31/05/2024 at 23:44, Michael Niedermayer wrote:
> On Fri, May 31, 2024 at 11:06:49PM +0200, averne wrote:
>> On 30/05/2024 at 22:38, Rémi Denis-Courmont wrote:
>>> On Thursday, 30 May 2024, 22.43.03 EEST, averne wrote:
>>>> This is useful eg. for memory-mapped buffers that need page-aligned memory,
>>>> when dealing with hardware devices
>>>>
>>>> Signed-off-by: averne <averne381@gmail.com>
>>>> ---
>>>>  libavutil/buffer.c | 31 +++++++++++++++++++++++++++++++
>>>>  libavutil/buffer.h |  7 +++++++
>>>>  2 files changed, 38 insertions(+)
>>>>
>>>> diff --git a/libavutil/buffer.c b/libavutil/buffer.c
>>>> index e4562a79b1..b8e357f540 100644
>>>> --- a/libavutil/buffer.c
>>>> +++ b/libavutil/buffer.c
>>>> @@ -16,9 +16,14 @@
>>>>   * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>>>>   */
>>>>
>>>> +#include "config.h"
>>>> +
>>>>  #include <stdatomic.h>
>>>>  #include <stdint.h>
>>>>  #include <string.h>
>>>> +#if HAVE_MALLOC_H
>>>> +#include <malloc.h>
>>>> +#endif
>>>>
>>>>  #include "avassert.h"
>>>>  #include "buffer_internal.h"
>>>> @@ -100,6 +105,32 @@ AVBufferRef *av_buffer_allocz(size_t size)
>>>>      return ret;
>>>>  }
>>>>
>>>> +AVBufferRef *av_buffer_aligned_alloc(size_t size, size_t align)
>>>> +{
>>>> +    AVBufferRef *ret = NULL;
>>>> +    uint8_t    *data = NULL;
>>>> +
>>>> +#if HAVE_POSIX_MEMALIGN
>>>> +    if (posix_memalign((void **)&data, align, size))
>>>
>>> Invalid cast.
>>>
>>
>> Neither gcc nor clang emits a warning here, even on -Weverything.
>> What would be your idea of a valid cast then? First cast to intptr_t, 
>> then void** ?
>>
>>>> +        return NULL;
>>>> +#elif HAVE_ALIGNED_MALLOC
>>>> +    data = aligned_alloc(align, size);
> 
> on mingw64:
> 
> src/libavutil/buffer.c: In function ‘av_buffer_aligned_alloc’:
> src/libavutil/buffer.c:117:12: error: implicit declaration of function ‘aligned_alloc’ [-Werror=implicit-function-declaration]
>   117 |     data = aligned_alloc(align, size);
>       |            ^~~~~~~~~~~~~
> 
> thx

Missing include of stdlib.h, probably. Will fix, thanks.
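
Another possibility, stated here as an assumption rather than a diagnosis: FFmpeg's HAVE_ALIGNED_MALLOC check refers to the Windows CRT function _aligned_malloc(), not to C11 aligned_alloc(), which the Windows runtime may not provide at all. Below is a rough sketch of a dispatch that pairs each allocator with its matching release function. The names portable_aligned_alloc and portable_aligned_free are hypothetical, and the _WIN32 test stands in for the configure-time HAVE_* macros.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for posix_memalign() under strict ISO modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(_WIN32)
#include <malloc.h> /* _aligned_malloc(), _aligned_free() */
#endif

/* Hypothetical helper: allocate size bytes aligned to align (a power of two). */
static void *portable_aligned_alloc(size_t size, size_t align)
{
#if defined(_WIN32)
    /* Note the argument order: size first, then alignment. */
    return _aligned_malloc(size, align);
#else
    void *ptr = NULL;
    if (posix_memalign(&ptr, align, size))
        return NULL;
    return ptr;
#endif
}

/* Hypothetical helper: memory from _aligned_malloc() must not go to free(). */
static void portable_aligned_free(void *ptr)
{
#if defined(_WIN32)
    _aligned_free(ptr);
#else
    free(ptr);
#endif
}
```

If the buffer is wrapped with av_buffer_create(), the free callback passed there would need to call the matching release function rather than plain free().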

* Re: [FFmpeg-devel] [PATCH 05/16] avutil: add common code for nvtegra
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 05/16] avutil: add common code for nvtegra averne
  2024-05-31  8:32   ` Rémi Denis-Courmont
@ 2024-06-05 20:29   ` Mark Thompson
  2024-06-29 19:35     ` averne
  1 sibling, 1 reply; 37+ messages in thread
From: Mark Thompson @ 2024-06-05 20:29 UTC (permalink / raw)
  To: ffmpeg-devel

On 30/05/2024 20:43, averne wrote:
> This includes a new pixel format for nvtegra hardware frames, and several objects for interaction with hardware blocks.
> In particular, this contains code for channels (handles to hardware engines), maps (memory-mapped buffers shared with engines), and command buffers (abstraction for building command lists sent to the engines).
> 
> Signed-off-by: averne <averne381@gmail.com>
> ---
>  configure                  |    2 +
>  libavutil/Makefile         |    4 +
>  libavutil/nvtegra.c        | 1035 ++++++++++++++++++++++++++++++++++++
>  libavutil/nvtegra.h        |  258 +++++++++
>  libavutil/nvtegra_host1x.h |   94 ++++
>  libavutil/pixdesc.c        |    4 +
>  libavutil/pixfmt.h         |    8 +
>  7 files changed, 1405 insertions(+)
>  create mode 100644 libavutil/nvtegra.c
>  create mode 100644 libavutil/nvtegra.h
>  create mode 100644 libavutil/nvtegra_host1x.h

I don't think it is reasonable for all of this to be public API surface of ffmpeg.

A separate library containing the headers and exposing some set of functions like this might make more sense.

If this has to be in ffmpeg then it really needs to all go in one library (libavcodec I guess) so that it's not exposing all this internal detail in the public API.

Thanks,

- Mark

* Re: [FFmpeg-devel] [PATCH 06/16] avutil: add nvtegra hwcontext
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 06/16] avutil: add nvtegra hwcontext averne
@ 2024-06-05 20:47   ` Mark Thompson
  2024-06-29 19:35     ` averne
  0 siblings, 1 reply; 37+ messages in thread
From: Mark Thompson @ 2024-06-05 20:47 UTC (permalink / raw)
  To: ffmpeg-devel

On 30/05/2024 20:43, averne wrote:
> This includes hwdevice and hwframes objects.
> As the multimedia engines work with tiled surfaces (block linear in nvidia jargon), two frame transfer methods are implemented.
> The first makes use of the VIC to perform the copy. Since some revisions of the VIC (such as the one found in the tegra X1) did not support 10+ bit formats, these go through two separate copy steps for the luma and chroma planes.
> The second method copies on the CPU, and is used as a fallback if the VIC constraints are not satisfied.
> 
> Signed-off-by: averne <averne381@gmail.com>
> ---
>  libavutil/Makefile             |   7 +-
>  libavutil/hwcontext.c          |   4 +
>  libavutil/hwcontext.h          |   1 +
>  libavutil/hwcontext_internal.h |   1 +
>  libavutil/hwcontext_nvtegra.c  | 880 +++++++++++++++++++++++++++++++++
>  libavutil/hwcontext_nvtegra.h  |  85 ++++
>  6 files changed, 976 insertions(+), 2 deletions(-)
>  create mode 100644 libavutil/hwcontext_nvtegra.c
>  create mode 100644 libavutil/hwcontext_nvtegra.h
> 
> ...
> +
> +static int nvtegra_transfer_data(AVHWFramesContext *ctx, AVFrame *dst, const AVFrame *src) {
> +    const AVFrame *swframe;
> +    bool from;
> +    int num_planes, i;
> +
> +    from    = !dst->hw_frames_ctx;
> +    swframe = from ? dst : src;
> +
> +    if (swframe->hw_frames_ctx)
> +        return AVERROR(ENOSYS);
> +
> +    num_planes = av_pix_fmt_count_planes(swframe->format);
> +
> +    for (i = 0; i < num_planes; ++i) {
> +        if (((uintptr_t)swframe->data[i] & 0xff) || (swframe->linesize[i] & 0xff)) {
> +            av_log(ctx, AV_LOG_WARNING, "Frame address/pitch not aligned to 256, "
> +                                        "falling back to cpu transfer\n");
> +            return nvtegra_cpu_transfer_data(ctx, dst, src, num_planes, from);

Are you doing something somewhere to avoid this case?  It seems like it should be the normal one (given alignment is typically set significantly lower than 256), so this warning would be very spammy.
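
For reference, the condition being discussed can be sketched in isolation as below. This is a standalone illustration with hypothetical names (VIC_ALIGN, plane_is_vic_aligned, frame_is_vic_aligned); the value 256 is inferred from the 0xff mask in the hunk above, not from hardware documentation.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed requirement, taken from the 0xff mask in the quoted code. */
#define VIC_ALIGN 256

/* A plane qualifies when both its base address and its pitch are
 * multiples of VIC_ALIGN. */
static int plane_is_vic_aligned(const uint8_t *data, int linesize)
{
    return !((uintptr_t)data & (VIC_ALIGN - 1)) &&
           !(linesize & (VIC_ALIGN - 1));
}

/* Returns 1 only if every plane qualifies; a single misaligned plane
 * is enough to force the CPU fallback path. */
static int frame_is_vic_aligned(uint8_t *const data[], const int linesize[],
                                int num_planes)
{
    for (int i = 0; i < num_planes; i++)
        if (!plane_is_vic_aligned(data[i], linesize[i]))
            return 0;
    return 1;
}
```

With a check like this, a 320-byte pitch already fails, which is why frames allocated with a smaller default alignment would hit the warning on every transfer.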

> +        }
> +    }
> +
> +    return nvtegra_vic_transfer_data(ctx, dst, src, num_planes, from);
> +}
> +
> +const HWContextType ff_hwcontext_type_nvtegra = {
> +    .type                   = AV_HWDEVICE_TYPE_NVTEGRA,
> +    .name                   = "nvtegra",
> +
> +    .device_hwctx_size      = sizeof(NVTegraDevicePriv),
> +    .device_hwconfig_size   = 0,
> +    .frames_hwctx_size      = 0,
> +
> +    .device_create          = &nvtegra_device_create,
> +    .device_init            = &nvtegra_device_init,
> +    .device_uninit          = &nvtegra_device_uninit,
> +
> +    .frames_get_constraints = &nvtegra_frames_get_constraints,
> +    .frames_init            = &nvtegra_frames_init,
> +    .frames_uninit          = &nvtegra_frames_uninit,
> +    .frames_get_buffer      = &nvtegra_get_buffer,
> +
> +    .transfer_get_formats   = &nvtegra_transfer_get_formats,
> +    .transfer_data_to       = &nvtegra_transfer_data,
> +    .transfer_data_from     = &nvtegra_transfer_data,
> +
> +    .pix_fmts = (const enum AVPixelFormat[]) {
> +        AV_PIX_FMT_NVTEGRA,
> +        AV_PIX_FMT_NONE,
> +    },
> +};

What controls whether frames are linear or not?

It seems like the linear case could be exposed directly to the user rather than having to wrap it like this - the decoder could return read-only NV12 (or whatever) frames directly and they would work with other components.

> diff --git a/libavutil/hwcontext_nvtegra.h b/libavutil/hwcontext_nvtegra.h
> new file mode 100644
> index 0000000000..8a2383d304
> --- /dev/null
> +++ b/libavutil/hwcontext_nvtegra.h
> @@ -0,0 +1,85 @@
> +/*
> + * Copyright (c) 2024 averne <averne381@gmail.com>
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
> + */
> +
> +#ifndef AVUTIL_HWCONTEXT_NVTEGRA_H
> +#define AVUTIL_HWCONTEXT_NVTEGRA_H
> +
> +#include <stdint.h>
> +
> +#include "hwcontext.h"
> +#include "buffer.h"
> +#include "frame.h"
> +#include "pixfmt.h"
> +
> +#include "nvtegra.h"
> +
> +/*
> + * Encode a hardware revision into a version number
> + */
> +#define AV_NVTEGRA_ENCODE_REV(maj, min) (((maj & 0xff) << 8) | (min & 0xff))
> +
> +/*
> + * Decode a version number
> + */
> +static inline void av_nvtegra_decode_rev(int rev, int *maj, int *min) {
> +    *maj = (rev >> 8) & 0xff;
> +    *min = (rev >> 0) & 0xff;
> +}
> +
> +/**
> + * @file
> + * API-specific header for AV_HWDEVICE_TYPE_NVTEGRA.
> + *
> + * For user-allocated pools, AVHWFramesContext.pool must return AVBufferRefs
> + * with the data pointer set to an AVNVTegraMap.
> + */
> +
> +typedef struct AVNVTegraDeviceContext {
> +    /*
> +     * Hardware multimedia engines
> +     */
> +    AVNVTegraChannel nvdec_channel, nvenc_channel, nvjpg_channel, vic_channel;

Does a user need to supply all of these when making a device?

> +
> +    /*
> +     * Hardware revisions for associated engines, or 0 if invalid
> +     */
> +    int nvdec_version, nvenc_version, nvjpg_version, vic_version;

Why does a user setting up a device context need to supply the version numbers for each thing?

> +} AVNVTegraDeviceContext;
> +
> +typedef struct AVNVTegraFrame {
> +    /*
> +     * Reference to an AVNVTegraMap object
> +     */
> +    AVBufferRef *map_ref;
> +} AVNVTegraFrame;

What is the indirection doing here?  Can't it return the buffer inside this structure instead of making an intermediate structure?

> +
> +/*
> + * Helper to retrieve a map object from the corresponding frame
> + */
> +static inline AVNVTegraMap *av_nvtegra_frame_get_fbuf_map(const AVFrame *frame) {
> +    return (AVNVTegraMap *)((AVNVTegraFrame *)frame->buf[0]->data)->map_ref->data;
> +}
> +
> +/*
> + * Converts a pixel format to the equivalent code for the VIC engine
> + */
> +int av_nvtegra_pixfmt_to_vic(enum AVPixelFormat fmt);
> +
> +#endif /* AVUTIL_HWCONTEXT_NVTEGRA_H */

The mix of normal implementation code and weird specific detail for the particular platform is pretty nasty.  It does seem like exporting more of this into a separate library rather than embedding it in ffmpeg would be a good idea.
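
On a smaller point, the revision helpers quoted above pack a major and minor number into 16 bits. As quoted, the macro does not parenthesize its arguments, so an argument such as `0x100 | 2` would be masked incorrectly, because `&` binds tighter than `|`. The sketch below shows the same encoding with parenthesized arguments; the names NVTEGRA_ENCODE_REV and nvtegra_decode_rev are hypothetical stand-ins, not the patch's identifiers.

```c
/* Hypothetical stand-in for the quoted macro, with each argument
 * parenthesized so that compound expressions are masked as a whole. */
#define NVTEGRA_ENCODE_REV(maj, min) ((((maj) & 0xff) << 8) | ((min) & 0xff))

/* Inverse operation: major in bits 15:8, minor in bits 7:0. */
static inline void nvtegra_decode_rev(int rev, int *maj, int *min)
{
    *maj = (rev >> 8) & 0xff;
    *min = (rev >> 0) & 0xff;
}
```

With this form, NVTEGRA_ENCODE_REV(0x100 | 2, 0) evaluates to 0x200, whereas the unparenthesized version would yield 0x10200.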

Thanks,

- Mark

* Re: [FFmpeg-devel] [PATCH 07/16] hwcontext_nvtegra: add dynamic frequency scaling routines
  2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 07/16] hwcontext_nvtegra: add dynamic frequency scaling routines averne
@ 2024-06-05 20:50   ` Mark Thompson
  2024-06-29 19:35     ` averne
  0 siblings, 1 reply; 37+ messages in thread
From: Mark Thompson @ 2024-06-05 20:50 UTC (permalink / raw)
  To: ffmpeg-devel

On 30/05/2024 20:43, averne wrote:
> To save on energy, the clock speed of multimedia engines should be adapted to their workload.
> 
> Signed-off-by: averne <averne381@gmail.com>
> ---
>  libavutil/hwcontext_nvtegra.c | 165 ++++++++++++++++++++++++++++++++++
>  libavutil/hwcontext_nvtegra.h |   7 ++
>  2 files changed, 172 insertions(+)
> 
> ...
> diff --git a/libavutil/hwcontext_nvtegra.h b/libavutil/hwcontext_nvtegra.h
> index 8a2383d304..7c845951d9 100644
> --- a/libavutil/hwcontext_nvtegra.h
> +++ b/libavutil/hwcontext_nvtegra.h
> @@ -82,4 +82,11 @@ static inline AVNVTegraMap *av_nvtegra_frame_get_fbuf_map(const AVFrame *frame)
>   */
>  int av_nvtegra_pixfmt_to_vic(enum AVPixelFormat fmt);
>  
> +/*
> + * Dynamic frequency scaling routines
> + */
> +int av_nvtegra_dfs_init(AVHWDeviceContext *ctx, AVNVTegraChannel *channel, int width, int height, double framerate_hz);
> +int av_nvtegra_dfs_update(AVHWDeviceContext *ctx, AVNVTegraChannel *channel, int bitstream_len, int decode_cycles);
> +int av_nvtegra_dfs_uninit(AVHWDeviceContext *ctx, AVNVTegraChannel *channel);
> +
>  #endif /* AVUTIL_HWCONTEXT_NVTEGRA_H */

This really isn't a sensible thing to have in the public API of ffmpeg.  Why on earth isn't this sort of detail dealt with by the kernel?  (Which can actually see all of the different processes using it, as well.)

Thanks,

- Mark

* Re: [FFmpeg-devel] [PATCH 05/16] avutil: add common code for nvtegra
  2024-06-05 20:29   ` Mark Thompson
@ 2024-06-29 19:35     ` averne
  0 siblings, 0 replies; 37+ messages in thread
From: averne @ 2024-06-29 19:35 UTC (permalink / raw)
  To: ffmpeg-devel

On 05/06/2024 at 22:29, Mark Thompson wrote:
> On 30/05/2024 20:43, averne wrote:
>> This includes a new pixel format for nvtegra hardware frames, and several objects for interaction with hardware blocks.
>> In particular, this contains code for channels (handles to hardware engines), maps (memory-mapped buffers shared with engines), and command buffers (abstraction for building command lists sent to the engines).
>>
>> Signed-off-by: averne <averne381@gmail.com>
>> ---
>>  configure                  |    2 +
>>  libavutil/Makefile         |    4 +
>>  libavutil/nvtegra.c        | 1035 ++++++++++++++++++++++++++++++++++++
>>  libavutil/nvtegra.h        |  258 +++++++++
>>  libavutil/nvtegra_host1x.h |   94 ++++
>>  libavutil/pixdesc.c        |    4 +
>>  libavutil/pixfmt.h         |    8 +
>>  7 files changed, 1405 insertions(+)
>>  create mode 100644 libavutil/nvtegra.c
>>  create mode 100644 libavutil/nvtegra.h
>>  create mode 100644 libavutil/nvtegra_host1x.h
>
> I don't think it is reasonable for all of this to be public API surface of ffmpeg.
>
> A separate library containing the headers and exposing some set of functions like this might make more sense.
>
> If this has to be in ffmpeg then it really needs to all go in one library (libavcodec I guess) so that it's not exposing all this internal detail in the public API.
>
> Thanks,
>
> - Mark

Sorry for the delayed answer.
I'm considering writing a library to abstract all of those platform
details (buffer allocation, channel management, job submission, command
buffers, synchronization, frequency scaling), essentially replacing
all of nvtegra.c and much of the hwcontext_nvtegra.c code. The library
would also handle frame transfers, since that code is somewhat low-level
and several methods can be used depending on the situation (VIC on
Tegra platforms, the GPU copy and 2D engines, or CPU copies on platforms
with unified memory).
This would also shield FFmpeg from API changes, though I don't think
those have ever been an actual problem.
The FFmpeg-side code would remain in charge of building the metadata
sent to the decode hardware, but multiple platforms could be supported
with very minimal #ifdef juggling. I'm especially interested in discrete
GPUs on Linux.
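
To make the transfer-method idea concrete, here is a rough sketch of how
such a library might choose a copy path. Everything in it is hypothetical
(the enum, the PlatformCaps struct, the function name, and the priority
order are assumptions for illustration, not code from the series):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical copy paths, mirroring the ones listed above. */
typedef enum TransferMethod {
    TRANSFER_NONE = -1, /* no usable path on this platform */
    TRANSFER_VIC,       /* VIC engine (Tegra) */
    TRANSFER_GPU_COPY,  /* GPU copy or 2D engine */
    TRANSFER_CPU,       /* CPU copy, needs CPU-visible (unified) memory */
} TransferMethod;

/* Hypothetical description of what the platform offers. */
typedef struct PlatformCaps {
    bool has_vic;
    bool has_gpu_copy;
    bool unified_memory;
} PlatformCaps;

/*
 * Pick a copy path. The priority order (VIC, then GPU, then CPU) is an
 * assumption; vic_constraints_ok stands for checks such as the 256-byte
 * address/pitch alignment done in nvtegra_transfer_data().
 */
static TransferMethod pick_transfer_method(const PlatformCaps *caps,
                                           bool vic_constraints_ok)
{
    assert(caps);
    if (caps->has_vic && vic_constraints_ok)
        return TRANSFER_VIC;
    if (caps->has_gpu_copy)
        return TRANSFER_GPU_COPY;
    if (caps->unified_memory)
        return TRANSFER_CPU;
    return TRANSFER_NONE;
}
```

With this shape, the FFmpeg side would only ask for a transfer and never
need to know which engine performed it.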

* Re: [FFmpeg-devel] [PATCH 06/16] avutil: add nvtegra hwcontext
  2024-06-05 20:47   ` Mark Thompson
@ 2024-06-29 19:35     ` averne
  0 siblings, 0 replies; 37+ messages in thread
From: averne @ 2024-06-29 19:35 UTC (permalink / raw)
  To: ffmpeg-devel

On 05/06/2024 at 22:47, Mark Thompson wrote:
> On 30/05/2024 20:43, averne wrote:
>> This includes hwdevice and hwframes objects.
>> As the multimedia engines work with tiled surfaces (block linear in nvidia jargon), two frame transfer methods are implemented.
>> The first makes use of the VIC to perform the copy. Since some revisions of the VIC (such as the one found in the tegra X1) did not support 10+ bit formats, these go through two separate copy steps for the luma and chroma planes.
>> The second method copies on the CPU, and is used as a fallback if the VIC constraints are not satisfied.
>>
>> Signed-off-by: averne <averne381@gmail.com>
>> ---
>>  libavutil/Makefile             |   7 +-
>>  libavutil/hwcontext.c          |   4 +
>>  libavutil/hwcontext.h          |   1 +
>>  libavutil/hwcontext_internal.h |   1 +
>>  libavutil/hwcontext_nvtegra.c  | 880 +++++++++++++++++++++++++++++++++
>>  libavutil/hwcontext_nvtegra.h  |  85 ++++
>>  6 files changed, 976 insertions(+), 2 deletions(-)
>>  create mode 100644 libavutil/hwcontext_nvtegra.c
>>  create mode 100644 libavutil/hwcontext_nvtegra.h
>>
>> ...
>> +
>> +static int nvtegra_transfer_data(AVHWFramesContext *ctx, AVFrame *dst, const AVFrame *src) {
>> +    const AVFrame *swframe;
>> +    bool from;
>> +    int num_planes, i;
>> +
>> +    from    = !dst->hw_frames_ctx;
>> +    swframe = from ? dst : src;
>> +
>> +    if (swframe->hw_frames_ctx)
>> +        return AVERROR(ENOSYS);
>> +
>> +    num_planes = av_pix_fmt_count_planes(swframe->format);
>> +
>> +    for (i = 0; i < num_planes; ++i) {
>> +        if (((uintptr_t)swframe->data[i] & 0xff) || (swframe->linesize[i] & 0xff)) {
>> +            av_log(ctx, AV_LOG_WARNING, "Frame address/pitch not aligned to 256, "
>> +                                        "falling back to cpu transfer\n");
>> +            return nvtegra_cpu_transfer_data(ctx, dst, src, num_planes, from);
>
> Are you doing something somewhere to avoid this case?  It seems like it should be the normal one (given alignment is typically set significantly lower than 256), so this warning would be very spammy.

Nothing is done on the FFmpeg side. I could print this warning 
just once per frames context.
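
A minimal sketch of what "once per frames context" could look like,
assuming a flag stored in some per-context private state. TransferState
and both function names are hypothetical, and fprintf stands in for
av_log so the snippet compiles on its own:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-frames-context state; not a structure from the patch. */
typedef struct TransferState {
    bool warned_unaligned; /* set after the first fallback warning */
} TransferState;

/* True if both the plane address and its pitch are multiples of 256,
 * i.e. the same (x & 0xff) tests as in nvtegra_transfer_data(). */
static bool plane_is_vic_aligned(const void *data, int linesize)
{
    return !((uintptr_t)data & 0xff) && !(linesize & 0xff);
}

/* Returns true if the VIC path can be used. On the first unaligned
 * plane seen for this context, logs a single warning; later calls
 * fall back silently. */
static bool can_use_vic(TransferState *st, void *const data[],
                        const int linesize[], int num_planes)
{
    assert(st);
    for (int i = 0; i < num_planes; ++i) {
        if (!plane_is_vic_aligned(data[i], linesize[i])) {
            if (!st->warned_unaligned) {
                fprintf(stderr, "Frame address/pitch not aligned to 256, "
                                "falling back to cpu transfer\n");
                st->warned_unaligned = true;
            }
            return false;
        }
    }
    return true;
}
```

The caller would then dispatch to the CPU or VIC transfer depending on
the return value, as the existing function already does.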

>> +        }
>> +    }
>> +
>> +    return nvtegra_vic_transfer_data(ctx, dst, src, num_planes, from);
>> +}
>> +
>> +const HWContextType ff_hwcontext_type_nvtegra = {
>> +    .type                   = AV_HWDEVICE_TYPE_NVTEGRA,
>> +    .name                   = "nvtegra",
>> +
>> +    .device_hwctx_size      = sizeof(NVTegraDevicePriv),
>> +    .device_hwconfig_size   = 0,
>> +    .frames_hwctx_size      = 0,
>> +
>> +    .device_create          = &nvtegra_device_create,
>> +    .device_init            = &nvtegra_device_init,
>> +    .device_uninit          = &nvtegra_device_uninit,
>> +
>> +    .frames_get_constraints = &nvtegra_frames_get_constraints,
>> +    .frames_init            = &nvtegra_frames_init,
>> +    .frames_uninit          = &nvtegra_frames_uninit,
>> +    .frames_get_buffer      = &nvtegra_get_buffer,
>> +
>> +    .transfer_get_formats   = &nvtegra_transfer_get_formats,
>> +    .transfer_data_to       = &nvtegra_transfer_data,
>> +    .transfer_data_from     = &nvtegra_transfer_data,
>> +
>> +    .pix_fmts = (const enum AVPixelFormat[]) {
>> +        AV_PIX_FMT_NVTEGRA,
>> +        AV_PIX_FMT_NONE,
>> +    },
>> +};
>
> What controls whether frames are linear or not?
>
> It seems like the linear case could be exposed directly to the user rather than having to wrap it like this - the decoder could return read-only NV12 (or whatever) frames directly and they would work with other components.

NVDEC can only output in block linear format (tiled layout). I'm not
very keen on exposing that data, considering cache management can be
somewhat tricky. Typically, after a decode operation, the CPU cache
must be invalidated.

>> diff --git a/libavutil/hwcontext_nvtegra.h b/libavutil/hwcontext_nvtegra.h
>> new file mode 100644
>> index 0000000000..8a2383d304
>> --- /dev/null
>> +++ b/libavutil/hwcontext_nvtegra.h
>> @@ -0,0 +1,85 @@
>> +/*
>> + * Copyright (c) 2024 averne <averne381@gmail.com>
>> + *
>> + * This file is part of FFmpeg.
>> + *
>> + * FFmpeg is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * FFmpeg is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along
>> + * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
>> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
>> + */
>> +
>> +#ifndef AVUTIL_HWCONTEXT_NVTEGRA_H
>> +#define AVUTIL_HWCONTEXT_NVTEGRA_H
>> +
>> +#include <stdint.h>
>> +
>> +#include "hwcontext.h"
>> +#include "buffer.h"
>> +#include "frame.h"
>> +#include "pixfmt.h"
>> +
>> +#include "nvtegra.h"
>> +
>> +/*
>> + * Encode a hardware revision into a version number
>> + */
>> +#define AV_NVTEGRA_ENCODE_REV(maj, min) (((maj & 0xff) << 8) | (min & 0xff))
>> +
>> +/*
>> + * Decode a version number
>> + */
>> +static inline void av_nvtegra_decode_rev(int rev, int *maj, int *min) {
>> +    *maj = (rev >> 8) & 0xff;
>> +    *min = (rev >> 0) & 0xff;
>> +}
>> +
>> +/**
>> + * @file
>> + * API-specific header for AV_HWDEVICE_TYPE_NVTEGRA.
>> + *
>> + * For user-allocated pools, AVHWFramesContext.pool must return AVBufferRefs
>> + * with the data pointer set to an AVNVTegraMap.
>> + */
>> +
>> +typedef struct AVNVTegraDeviceContext {
>> +    /*
>> +     * Hardware multimedia engines
>> +     */
>> +    AVNVTegraChannel nvdec_channel, nvenc_channel, nvjpg_channel, vic_channel;
>
> Does a user need to supply all of these when making a device?

These are filled out by the library when creating the hardware device. 

>> +
>> +    /*
>> +     * Hardware revisions for associated engines, or 0 if invalid
>> +     */
>> +    int nvdec_version, nvenc_version, nvjpg_version, vic_version;
>
> Why does a user setting up a device context need to supply the version numbers for each thing?

Same as above: the versions are filled in when creating the context.
Knowledge of the hardware revision is useful when determining hardware
capabilities.
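
As an illustration of how a packed revision might be used for a
capability check, here is a small sketch. The packing matches
AV_NVTEGRA_ENCODE_REV from the header above (with the macro arguments
parenthesized); engine_at_least() and the revision numbers used with it
are hypothetical and do not correspond to real hardware revisions:

```c
#include <assert.h>
#include <stdbool.h>

/* Same packing as AV_NVTEGRA_ENCODE_REV: major in bits 15..8, minor in
 * bits 7..0. Arguments are parenthesized so expressions are safe. */
#define NVTEGRA_ENCODE_REV(maj, min) ((((maj) & 0xff) << 8) | ((min) & 0xff))

/* Same unpacking as av_nvtegra_decode_rev(). */
static void nvtegra_decode_rev(int rev, int *maj, int *min)
{
    assert(maj && min);
    *maj = (rev >> 8) & 0xff;
    *min = (rev >> 0) & 0xff;
}

/* Hypothetical helper: true if the engine is present (version != 0, per
 * the "0 if invalid" comment) and at least revision maj.min. */
static bool engine_at_least(int engine_version, int maj, int min)
{
    return engine_version != 0 &&
           engine_version >= NVTEGRA_ENCODE_REV(maj, min);
}
```

Because the major number sits in the higher byte, a plain integer
comparison orders revisions correctly.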

>> +} AVNVTegraDeviceContext;
>> +
>> +typedef struct AVNVTegraFrame {
>> +    /*
>> +     * Reference to an AVNVTegraMap object
>> +     */
>> +    AVBufferRef *map_ref;
>> +} AVNVTegraFrame;
>
> What is the indirection doing here?  Can't it return the buffer inside this structure instead of making an intermediate structure?

That's an artifact of an optimization for my mpv graphics backend.
Since mapping/unmapping data in the GPU is fairly expensive (GMMU
configuration), the mpv mapper would keep a reference to the frame
object to avoid repeatedly mapping the same memory ranges.
I'll remove that.

>> +
>> +/*
>> + * Helper to retrieve a map object from the corresponding frame
>> + */
>> +static inline AVNVTegraMap *av_nvtegra_frame_get_fbuf_map(const AVFrame *frame) {
>> +    return (AVNVTegraMap *)((AVNVTegraFrame *)frame->buf[0]->data)->map_ref->data;
>> +}
>> +
>> +/*
>> + * Converts a pixel format to the equivalent code for the VIC engine
>> + */
>> +int av_nvtegra_pixfmt_to_vic(enum AVPixelFormat fmt);
>> +
>> +#endif /* AVUTIL_HWCONTEXT_NVTEGRA_H */
>
> The mix of normal implementation code and weird specific detail for the particular platform is pretty nasty.  It does seem like exporting more of this into a separate library rather than embedding it in ffmpeg would be a good idea.
>
> Thanks,
>
> - Mark

* Re: [FFmpeg-devel] [PATCH 07/16] hwcontext_nvtegra: add dynamic frequency scaling routines
  2024-06-05 20:50   ` Mark Thompson
@ 2024-06-29 19:35     ` averne
  0 siblings, 0 replies; 37+ messages in thread
From: averne @ 2024-06-29 19:35 UTC (permalink / raw)
  To: ffmpeg-devel

On 05/06/2024 at 22:50, Mark Thompson wrote:
> On 30/05/2024 20:43, averne wrote:
>> To save on energy, the clock speed of multimedia engines should be adapted to their workload.
>>
>> Signed-off-by: averne <averne381@gmail.com>
>> ---
>>  libavutil/hwcontext_nvtegra.c | 165 ++++++++++++++++++++++++++++++++++
>>  libavutil/hwcontext_nvtegra.h |   7 ++
>>  2 files changed, 172 insertions(+)
>>
>> ...
>> diff --git a/libavutil/hwcontext_nvtegra.h b/libavutil/hwcontext_nvtegra.h
>> index 8a2383d304..7c845951d9 100644
>> --- a/libavutil/hwcontext_nvtegra.h
>> +++ b/libavutil/hwcontext_nvtegra.h
>> @@ -82,4 +82,11 @@ static inline AVNVTegraMap *av_nvtegra_frame_get_fbuf_map(const AVFrame *frame)
>>   */
>>  int av_nvtegra_pixfmt_to_vic(enum AVPixelFormat fmt);
>>  
>> +/*
>> + * Dynamic frequency scaling routines
>> + */
>> +int av_nvtegra_dfs_init(AVHWDeviceContext *ctx, AVNVTegraChannel *channel, int width, int height, double framerate_hz);
>> +int av_nvtegra_dfs_update(AVHWDeviceContext *ctx, AVNVTegraChannel *channel, int bitstream_len, int decode_cycles);
>> +int av_nvtegra_dfs_uninit(AVHWDeviceContext *ctx, AVNVTegraChannel *channel);
>> +
>>  #endif /* AVUTIL_HWCONTEXT_NVTEGRA_H */
>
> This really isn't a sensible thing to have in the public API of ffmpeg.  Why on earth isn't this sort of detail dealt with by the kernel?  (Which can actually see all of the different processes using it, as well.)
>
> Thanks,
>
> - Mark

I completely agree, but this is how nvidia does it, as dumb as it may
seem (at least on Tegra, I don't know about discrete GPUs).
As far as I can tell, the kernel has no mechanism in place to monitor
the occupancy of the decode engine.
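
For context, the kind of userspace estimate involved can be sketched as
follows. This is not the algorithm from the patch: the DfsState struct,
the smoothing factor and the headroom value are all assumptions, and
bitstream_len is ignored here for simplicity. The idea is only that the
clock must cover the average cycles per frame times the frame rate:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scaling state; field names are illustrative only. */
typedef struct DfsState {
    double framerate_hz; /* target presentation rate */
    double avg_cycles;   /* moving average of decode cycles per frame */
    double headroom;     /* safety factor applied to the estimate */
} DfsState;

static void dfs_init(DfsState *s, double framerate_hz)
{
    assert(s && framerate_hz > 0.0);
    s->framerate_hz = framerate_hz;
    s->avg_cycles   = 0.0;
    s->headroom     = 1.2; /* assumed 20% margin */
}

/* Feed the cycle count of one decoded frame and return the engine clock
 * (in Hz) needed to sustain the frame rate: avg cycles/frame * frames/s. */
static double dfs_update(DfsState *s, uint32_t decode_cycles)
{
    const double alpha = 0.25; /* assumed smoothing factor */

    assert(s);
    if (s->avg_cycles == 0.0)
        s->avg_cycles = decode_cycles;
    else
        s->avg_cycles += alpha * (decode_cycles - s->avg_cycles);

    return s->avg_cycles * s->framerate_hz * s->headroom;
}
```

The returned value would then be rounded up to a supported clock rate
and requested from the driver, which is the platform-specific part.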

end of thread (latest activity ~2024-06-29 19:36 UTC)

Thread overview: 37+ messages
2024-05-30 19:43 [FFmpeg-devel] [PATCH 00/16] NVidia Tegra hardware decoding backend averne
2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 01/16] avutil/buffer: add helper to allocate aligned memory averne
2024-05-30 20:38   ` Rémi Denis-Courmont
2024-05-31 21:06     ` averne
2024-05-31 21:44       ` Michael Niedermayer
2024-06-02 18:37         ` averne
2024-06-01  6:59       ` Rémi Denis-Courmont
2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 02/16] configure, avutil: add support for HorizonOS averne
2024-05-30 20:37   ` Rémi Denis-Courmont
2024-05-31 21:06     ` averne
2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 03/16] avutil: add ioctl definitions for tegra devices averne
2024-05-30 20:42   ` Rémi Denis-Courmont
2024-05-31 21:06     ` averne
2024-05-31 21:16       ` Timo Rothenpieler
2024-06-02 18:37         ` averne
2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 04/16] avutil: add hardware definitions for NVDEC, NVJPG and VIC averne
2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 05/16] avutil: add common code for nvtegra averne
2024-05-31  8:32   ` Rémi Denis-Courmont
2024-05-31 21:06     ` averne
2024-06-01  7:29       ` Rémi Denis-Courmont
2024-06-05 20:29   ` Mark Thompson
2024-06-29 19:35     ` averne
2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 06/16] avutil: add nvtegra hwcontext averne
2024-06-05 20:47   ` Mark Thompson
2024-06-29 19:35     ` averne
2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 07/16] hwcontext_nvtegra: add dynamic frequency scaling routines averne
2024-06-05 20:50   ` Mark Thompson
2024-06-29 19:35     ` averne
2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 08/16] nvtegra: add common hardware decoding code averne
2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 09/16] nvtegra: add mpeg1/2 hardware decoding averne
2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 10/16] nvtegra: add mpeg4 " averne
2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 11/16] nvtegra: add vc1 " averne
2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 12/16] nvtegra: add h264 " averne
2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 13/16] nvtegra: add hevc " averne
2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 14/16] nvtegra: add vp8 " averne
2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 15/16] nvtegra: add vp9 " averne
2024-05-30 19:43 ` [FFmpeg-devel] [PATCH 16/16] nvtegra: add mjpeg " averne
