From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 10 Sep 2025 16:16:10 -0000
Message-ID: <175752097135.25.17927811454977336611@463a07221176>
X-MailFrom: code@ffmpeg.org
X-Mailman-Version: 3.3.10
Precedence: list
Reply-To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH] libavfilter/vf_remap_opencl: make kernel handle subsampled chroma (PR #20490)
List-Id: FFmpeg development discussions and patches
From: Matthias Welwarsky via ffmpeg-devel
Cc: Matthias Welwarsky
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

PR #20490 opened by Matthias Welwarsky (thinkfat)
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20490
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20490.patch

This patch fixes handling YUV frames with subsampled chroma. Before this patch, the remap kernel was already aware of multiple planes, but it treated the subsampled chroma incorrectly. The maps were read based on destination coordinates and the resulting source coordinates were also not correctly transformed back to the source image. Also, for both Y and UV, the wrong vector components were used.

With this patch, it is possible to have a decode->remap->encode chain entirely on the GPU. I'm using it to reframe videos from an insta360 camera. On a Lenovo T480 with the built-in UHD 620 GPU, it can achieve about 5x real-time (VAAPI decode -> remap_opencl -> VAAPI encode).
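
To illustrate the coordinate handling described above, here is a minimal sketch in plain C. It is not code from the patch: the type `Vec2` and the helpers `map_scale`, `plane_to_map` and `map_to_plane` are hypothetical names chosen for this example, and it assumes the x/y maps are stored at full (luma) resolution.

```c
/* Hypothetical illustration only; none of these names exist in FFmpeg. */
typedef struct Vec2 {
    float x, y;
} Vec2;

/* Ratio between the full-resolution map and the plane being written:
 * 1.0 per axis for luma, 2.0 per axis for 4:2:0 chroma. */
static Vec2 map_scale(int map_w, int map_h, int plane_w, int plane_h)
{
    Vec2 s = { (float)map_w / (float)plane_w, (float)map_h / (float)plane_h };
    return s;
}

/* Destination pixel of the plane -> texel to fetch from the x/y maps. */
static Vec2 plane_to_map(int px, int py, Vec2 scale)
{
    Vec2 m = { (float)px * scale.x, (float)py * scale.y };
    return m;
}

/* Full-resolution source coordinate read from the maps -> coordinate
 * inside the (possibly subsampled) source plane. */
static Vec2 map_to_plane(Vec2 full_res, Vec2 scale)
{
    Vec2 p = { full_res.x / scale.x, full_res.y / scale.y };
    return p;
}
```

With 1920x1080 maps and a 960x540 chroma plane, the scale is (2, 2): chroma pixel (10, 20) reads map texel (20, 40), and a map entry pointing at full-resolution position (100, 50) becomes plane position (50, 25). For the luma plane the scale is (1, 1) and both steps are the identity.
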

Fixes: #20489
Signed-off-by: Matthias Welwarsky

>>From 427d37c7bf0fb94f60d05c4d3a618d87e09d0bad Mon Sep 17 00:00:00 2001
From: Matthias Welwarsky
Date: Fri, 5 Sep 2025 15:56:44 +0200
Subject: [PATCH] libavfilter/vf_remap_opencl: make kernel handle subsampled chroma

This patch fixes handling YUV frames with subsampled chroma. Before this patch, the remap kernel was already aware of multiple planes, but it treated the subsampled chroma incorrectly. The maps were read based on destination coordinates and the resulting source coordinates were also not correctly transformed back to the source image. Also, for both Y and UV, the wrong vector components were used.

With this patch, it is possible to have a decode->remap->encode chain entirely on the GPU. I'm using it to reframe videos from an insta360 camera. On a Lenovo T480 with the built-in UHD 620 GPU, it can achieve about 5x real-time (VAAPI decode -> remap_opencl -> VAAPI encode).

Fixes: #20489
Signed-off-by: Matthias Welwarsky
---
 libavfilter/opencl/remap.cl   | 124 ++++++++++++++++++++++++----------
 libavfilter/vf_remap_opencl.c |  43 ++++++++++--
 2 files changed, 128 insertions(+), 39 deletions(-)

diff --git a/libavfilter/opencl/remap.cl b/libavfilter/opencl/remap.cl
index fba82d134e..1a18f1afff 100644
--- a/libavfilter/opencl/remap.cl
+++ b/libavfilter/opencl/remap.cl
@@ -22,52 +22,108 @@ const sampler_t linear_sampler = (CLK_NORMALIZED_COORDS_FALSE |
 const sampler_t nearest_sampler = (CLK_NORMALIZED_COORDS_FALSE |
                                    CLK_FILTER_NEAREST);
 
-__kernel void remap_near(__write_only image2d_t dst,
-                         __read_only image2d_t src,
-                         __read_only image2d_t xmapi,
-                         __read_only image2d_t ymapi,
-                         float4 fill_color)
+__kernel void remap_near(
+    __write_only image2d_t dst,
+    __read_only image2d_t src,
+    __read_only image2d_t xmapi,
+    __read_only image2d_t ymapi,
+    float4 fill_color,
+    float4 scale,
+    int4 swizzle)
 {
     int2 p = (int2)(get_global_id(0), get_global_id(1));
-    int2 dimi = get_image_dim(src);
-    float2 dimf = (float2)(dimi.x, dimi.y);
-    float4 val;
-    int2 mi;
-    float m;
 
-    float4 xmap = read_imagef(xmapi, nearest_sampler, p);
-    float4 ymap = read_imagef(ymapi, nearest_sampler, p);
-    float2 pos = (float2)(xmap.x, ymap.x);
-    pos.xy = pos.xy * 65535.f;
+    /* image dimensions */
+    int2 src_dim = get_image_dim(src);
+    int2 dst_dim = get_image_dim(dst);
+    int2 map_dim = get_image_dim(xmapi);
 
-    mi = ((pos >= (float2)(0.f, 0.f)) * (pos < dimf) * (p <= dimi));
-    m = mi.x && mi.y;
-    val = mix(fill_color, read_imagef(src, nearest_sampler, pos), m);
+    float2 src_dimf = (float2)(src_dim.x, src_dim.y);
+    float2 dst_dimf = (float2)(dst_dim.x, dst_dim.y);
+    float2 map_dimf = (float2)(map_dim.x, map_dim.y);
+
+    /* compute map scaling to full-res */
+    float2 map_scale = map_dimf / dst_dimf;
+
+    /* scaled position to fetch from the maps */
+    float2 map_p = (float2)(p.x, p.y) * map_scale;
+
+    /* read mapping coordinates from full-res maps */
+    float4 xmap = read_imagef(xmapi, nearest_sampler, map_p);
+    float4 ymap = read_imagef(ymapi, nearest_sampler, map_p);
+    float2 pos = (float2)(xmap.x, ymap.x) * 65535.f;
+
+    pos /= map_scale;
+
+    /* check bounds */
+    int2 mi = ((pos >= (float2)(0.f,0.f)) * (pos < src_dimf));
+    float m = mi.x && mi.y;
+
+    /* read source and apply swizzle + scale */
+    float4 src_val = read_imagef(src, nearest_sampler, pos);
+
+    float tmp[4];
+    vstore4(src_val, 0, tmp);
+    src_val = (float4)(tmp[swizzle.x] * scale.x,
+                       tmp[swizzle.y] * scale.y,
+                       tmp[swizzle.z] * scale.z,
+                       tmp[swizzle.w] * scale.w);
+
+    /* mix with fill color if out-of-bounds */
+    float4 val = mix(fill_color, src_val, m);
 
     write_imagef(dst, p, val);
 }
 
-__kernel void remap_linear(__write_only image2d_t dst,
-                           __read_only image2d_t src,
-                           __read_only image2d_t xmapi,
-                           __read_only image2d_t ymapi,
-                           float4 fill_color)
+__kernel void remap_linear(
+    __write_only image2d_t dst,
+    __read_only image2d_t src,
+    __read_only image2d_t xmapi,
+    __read_only image2d_t ymapi,
+    float4 fill_color,
+    float4 scale,
+    int4 swizzle)
 {
     int2 p = (int2)(get_global_id(0), get_global_id(1));
-    int2 dimi = get_image_dim(src);
-    float2 dimf = (float2)(dimi.x, dimi.y);
-    float4 val;
-    int2 mi;
-    float m;
 
-    float4 xmap = read_imagef(xmapi, nearest_sampler, p);
-    float4 ymap = read_imagef(ymapi, nearest_sampler, p);
-    float2 pos = (float2)(xmap.x, ymap.x);
-    pos.xy = pos.xy * 65535.f;
+    /* image dimensions */
+    int2 src_dim = get_image_dim(src);
+    int2 dst_dim = get_image_dim(dst);
+    int2 map_dim = get_image_dim(xmapi);
 
-    mi = ((pos >= (float2)(0.f, 0.f)) * (pos < dimf) * (p <= dimi));
-    m = mi.x && mi.y;
-    val = mix(fill_color, read_imagef(src, linear_sampler, pos), m);
+    float2 src_dimf = (float2)(src_dim.x, src_dim.y);
+    float2 dst_dimf = (float2)(dst_dim.x, dst_dim.y);
+    float2 map_dimf = (float2)(map_dim.x, map_dim.y);
+
+    /* compute map scaling to full-res */
+    float2 map_scale = map_dimf / dst_dimf;
+
+    /* scaled position to fetch from the maps */
+    float2 map_p = (float2)(p.x, p.y) * map_scale;
+
+    /* read mapping coordinates from full-res maps */
+    float4 xmap = read_imagef(xmapi, nearest_sampler, map_p);
+    float4 ymap = read_imagef(ymapi, nearest_sampler, map_p);
+    float2 pos = (float2)(xmap.x, ymap.x) * 65535.f;
+
+    pos /= map_scale;
+
+    /* check bounds */
+    int2 mi = ((pos >= (float2)(0.f,0.f)) * (pos < src_dimf));
+    float m = mi.x && mi.y;
+
+    /* read source and apply swizzle + scale */
+    float4 src_val = read_imagef(src, linear_sampler, pos);
+
+    float tmp[4];
+    vstore4(src_val, 0, tmp);
+    src_val = (float4)(tmp[swizzle.x] * scale.x,
+                       tmp[swizzle.y] * scale.y,
+                       tmp[swizzle.z] * scale.z,
+                       tmp[swizzle.w] * scale.w);
+
+    /* mix with fill color if out-of-bounds */
+    float4 val = mix(fill_color, src_val, m);
 
     write_imagef(dst, p, val);
 }
diff --git a/libavfilter/vf_remap_opencl.c b/libavfilter/vf_remap_opencl.c
index bb83944b2f..f6c5211f73 100644
--- a/libavfilter/vf_remap_opencl.c
+++ b/libavfilter/vf_remap_opencl.c
@@ -29,10 +29,18 @@
 #include "opencl_source.h"
 #include "video.h"
 
+#define MAX_PLANES 4
+
+typedef struct PlaneParams {
+    cl_float4 scale;
+    cl_int4 swizzle;
+} PlaneParams;
+
 typedef struct RemapOpenCLContext {
     OpenCLFilterContext ocf;
 
     int nb_planes;
+    PlaneParams plane[MAX_PLANES];
     int interp;
     uint8_t fill_rgba[4];
     cl_float4 cl_fill_color;
@@ -41,6 +49,7 @@ typedef struct RemapOpenCLContext {
 
     cl_kernel kernel;
     cl_command_queue command_queue;
+
     FFFrameSync fs;
 } RemapOpenCLContext;
 
@@ -73,10 +82,9 @@ static int remap_opencl_load(AVFilterContext *avctx,
     cl_int cle;
     const char *source = ff_source_remap_cl;
     const char *kernel = kernels[ctx->interp];
-    const AVPixFmtDescriptor *main_desc;
     int err, main_planes;
-    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(main_format);
-    int is_rgb = !!(desc->flags & AV_PIX_FMT_FLAG_RGB);
+    const AVPixFmtDescriptor *main_desc = av_pix_fmt_desc_get(main_format);
+    int is_rgb = !!(main_desc->flags & AV_PIX_FMT_FLAG_RGB);
     const float scale = 1.f / 255.f;
     uint8_t rgba_map[4];
 
@@ -94,8 +102,6 @@ static int remap_opencl_load(AVFilterContext *avctx,
         ctx->cl_fill_color.s[3] = ctx->fill_rgba[3] * scale;
     }
 
-    main_desc = av_pix_fmt_desc_get(main_format);
-
     main_planes = 0;
     for (int i = 0; i < main_desc->nb_components; i++)
         main_planes = FFMAX(main_planes,
@@ -103,6 +109,26 @@ static int remap_opencl_load(AVFilterContext *avctx,
 
     ctx->nb_planes = main_planes;
 
+    for (int p = 0; p < ctx->nb_planes; p++) {
+        PlaneParams *pp = &ctx->plane[p];
+
+        if (is_rgb) {
+            // RGB plane (single plane)
+            pp->scale = (cl_float4){{1.0f, 1.0f, 1.0f, 1.0f}};
+            pp->swizzle = (cl_int4){{0, 1, 2, 3}};
+        } else {
+            if (p == 0) {
+                // Y plane
+                pp->scale = (cl_float4){{1.0f, 0.0f, 0.0f, 1.0f}};
+                pp->swizzle = (cl_int4){{0, 0, 0, 3}};
+            } else {
+                // UV plane (assume 4:2:0, adjust if necessary)
+                pp->scale = (cl_float4){{1.0f, 1.0f, 0.0f, 1.0f}};
+                pp->swizzle = (cl_int4){{0, 1, 0, 3}};
+            }
+        }
+    }
+
     err = ff_opencl_filter_load_program(avctx, &source, 1);
     if (err < 0)
         goto fail;
@@ -132,6 +158,7 @@ static int remap_opencl_process_frame(FFFrameSync *fs)
     AVFilterContext *avctx = fs->parent;
     AVFilterLink *outlink = avctx->outputs[0];
     RemapOpenCLContext *ctx = avctx->priv;
+
     AVFrame *input_main, *input_xmap, *input_ymap;
     AVFrame *output;
     cl_mem mem;
@@ -199,6 +226,12 @@ static int remap_opencl_process_frame(FFFrameSync *fs)
         CL_SET_KERNEL_ARG(ctx->kernel, kernel_arg, cl_float4, &cl_fill_color);
         kernel_arg++;
 
+        CL_SET_KERNEL_ARG(ctx->kernel, kernel_arg, cl_float4, &(ctx->plane[plane].scale));
+        kernel_arg++;
+
+        CL_SET_KERNEL_ARG(ctx->kernel, kernel_arg, cl_int4, &(ctx->plane[plane].swizzle));
+        kernel_arg++;
+
         err = ff_opencl_filter_work_size_from_image(avctx, global_work,
                                                     output, plane, 0);
         if (err < 0)
-- 
2.49.1

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org