From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from ffbox0-bg.ffmpeg.org (ffbox0-bg.ffmpeg.org [79.124.17.100]) by master.gitmailbox.com (Postfix) with ESMTPS id B39384C39F for ; Fri, 5 Sep 2025 15:05:25 +0000 (UTC) Authentication-Results: ffbox; dkim=fail (body hash mismatch (got b'8ncfMgHbcnyiTp9zu1mGtGpquBf3HMx95H10QL3BLeQ=', expected b'yfgYnnDyB70ML/ouRjxpZA+ONLBXkt4zAKNBHleEa/Q=')) header.d=welwarsky.de header.i=ffmpeg@welwarsky.de header.a=rsa-sha256 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=ffmpeg.org; i=@ffmpeg.org; q=dns/txt; s=mail; t=1757084714; h=to : date : message-id : mime-version : reply-to : subject : list-id : list-archive : list-archive : list-help : list-owner : list-post : list-subscribe : list-unsubscribe : from : cc : content-type : content-transfer-encoding : from; bh=6GMeUXhrd811sgJ2m71CREuw1+9/JHjFrSvOgK+Yi4U=; b=LkRXQy4dHoRsbHDLjBvrtpj2azMrF7yVnRR1WmmDGGmNMbGOJQS9J6Kze3ogURBBYcHwB fRln4vf9CetppoC3lMU+gQHw7e2BVYeSrZRJu+spUUc1UryQun1gwPpZwkhwsnKgRByAJB8 /S6VbcrlUdgDBoHkzCpA7V4VIZYnqfdmREDpahQLvvIxkOPV202M+Nb4Cq8wakN+5AbD31r PiI0bO9Gu6m11Ifgp60EWvBt9+pRVG9RRWbmqt2ScfhTz3VwT6WbKKeEUtK7gvI/vE9sAn7 Qsl4eeEHihroFc3sJSvsh5Oj6CYVp2d97nu9MNtPfocOiiCZ2osRMto47pSg== Received: from [172.19.0.4] (unknown [172.19.0.4]) by ffbox0-bg.ffmpeg.org (Postfix) with ESMTP id A496868DA91; Fri, 5 Sep 2025 18:05:14 +0300 (EEST) ARC-Seal: i=1; cv=none; a=rsa-sha256; d=ffmpeg.org; s=arc; t=1757084713; b=TeeK71JZjBbpNCcxdmSNR+DEZYmcPYT4dr54OKBdkQLGFCEtsUAtF2zd/i2uhWee95CcL CzjgzDFQ2MxG5xwd2szrevsQAbG+hVulMdXDT2+o6lsCrgov84ijKEY5dffi5xt0pHa2gDB EmVJ1/Z6MB8PDDLOH/SdiFbHkzsW7ZC3ZO0Pk9J8dwyE+xbfr04OnAN1LXXk7JfTHsLgNFf McsodlBRaw/dlFCoGJnPm57TTI9P23wvYJjEVm+Dzm2MYbIxcsKtfI/s1pZTyuvi3U4GU+e e0n7n0fHBRnSr2QfR3Xh2SoS4wE6i3elsc2JaAolu7d5vnnNhANJsOnhiUIg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=ffmpeg.org; s=arc; t=1757084713; h=from : sender : reply-to : subject : date : message-id : to : cc : mime-version : content-type : content-transfer-encoding : content-id : content-description : resent-date : resent-from : resent-sender : resent-to : resent-cc : resent-message-id : in-reply-to : references : list-id : list-help : list-unsubscribe : list-subscribe : list-post : list-owner : list-archive; bh=8ncfMgHbcnyiTp9zu1mGtGpquBf3HMx95H10QL3BLeQ=; b=R4GRQpWThOpqWolRqVjDVrMil1v11FqQaHVp6o0CwyW0Q/yl1+3jSxTbE4Ok2QkC4JvIz lIgrLqqCzeixeQW/GoqZrmh8RDgDQkPI5ujuWsq1+Irx1b6GsNsPq6225xwM3jsSRBNRnIs Oi9cVqE6XLA4zZHyocwPps2SWJHKydlLu/o5qQsTa2PDHEQceWOOuhxFTlN7Fjrp1F7flFV 0yD1Tynh8FN4uSqFFDkYuhQHb3+UQ5djnRD1nLJeUGzimNMVOtRcKXcwIAqSOWjVNZf5Af+ g4sfb4LvJXAu9FKOdgMMY7g2Gi5PxwUgpAL5n+6mpJ1rzrYw7clKN22FoMJQ== ARC-Authentication-Results: i=1; ffmpeg.org; dkim=pass header.d=welwarsky.de header.i=ffmpeg@welwarsky.de; arc=none; dmarc=pass header.from=welwarsky.de policy.dmarc=none Authentication-Results: ffmpeg.org; dkim=pass header.d=welwarsky.de header.i=ffmpeg@welwarsky.de; arc=none (Message is not ARC signed); dmarc=pass (Used From Domain Record) header.from=welwarsky.de policy.dmarc=none Received: from mout.kundenserver.de (mout.kundenserver.de [212.227.17.13]) by ffbox0-bg.ffmpeg.org (Postfix) with ESMTPS id 54B52687AED for ; Fri, 5 Sep 2025 18:04:59 +0300 (EEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=welwarsky.de; s=s1-ionos; t=1757084698; x=1757689498; i=ffmpeg@welwarsky.de; bh=yfgYnnDyB70ML/ouRjxpZA+ONLBXkt4zAKNBHleEa/Q=; h=X-UI-Sender-Class:From:To:Cc:Subject:Date:Message-ID: MIME-Version:Content-Transfer-Encoding:cc: content-transfer-encoding:content-type:date:from:message-id: mime-version:reply-to:subject:to; b=KrSyPJl4hHcgB7JKYzWlwhG+PLeSh9gujHJadDDB/60n0CMad9nYXMQ8JKtJO9jb 6NaTs59gMbaCHYcY303PxBsxQmhXv9jJKWVn+tfRo61JfwnekKzZ110a1a5GiWHhn HqV0TBqg/hvKZ2QX/aRdE4H1imBN3iawkoD0OUb0q81YxtyrXT5cqEquBMk+HFjIW OtWAhCkv2kKnz7V/KOZ25lguZSj2dQWSYVhxxiujrNKYoyPkUN/2BBfNuj4/+CkyQ yFNnf/+yTGj0b8k2cCLSC7pqSfKvoijy4p6HhC8med9qQ7rlZPtdxVHpuMtf0NEdH FHmCaFrTr7ycYquRLA== X-UI-Sender-Class: 55c96926-9e95-11ee-ae09-1f7a4046a0f6 Received: from linux.fritz.box ([79.250.184.197]) by mrelayeu.kundenserver.de (mreue106 [213.165.67.113]) with ESMTPSA (Nemesis) id 1Mhl0I-1uGnOM2TsO-00oFKl; Fri, 05 Sep 2025 17:04:58 +0200 To: ffmpeg-devel@ffmpeg.org Date: Fri, 5 Sep 2025 17:02:36 +0200 Message-ID: <20250905150343.84933-1-ffmpeg@welwarsky.de> X-Mailer: git-send-email 2.51.0 MIME-Version: 1.0 X-Provags-ID: V03:K1:Pe3GTPW058Dqi9krPsaKZpsPXvK49p7UsokwqoaQYeJmqY0nro8 lvH+v8WS9l6g6g+aFHxsIxWd/JdZErPVRMOlhXAqRQiEOHYn/klwtgrkYfkVM+uJMVy1iSu XMXCurE/OThbxFjbGPKN4V2emtSlxiAVug1XVYDl6wnx1VMOlCoCoKzCQjnp1OYZjb1ceJu zrA6rCbDIjYfE6zm3pp1A== X-Spam-Flag: NO UI-OutboundReport: notjunk:1;M01:P0:GMsqghs4jRA=;4zeLXhLgPqNPlRkCWT8VFbqL9zt 4XOBI6MzpO/5OWbjUUCi6gtSBPW7EMzxksdhRP+GbS8OMQRfb7IFFvmecyFa2L59GQdlJWMG4 hsNDgANhwyCe6HRuJmfTt/3Ccx/gWkfqWlGP+yZtQ4e93E/OMguQqM7/nyBUss3Ugfd5EmcKk pY1kUP2ppNXx/Gc6RRunudnx5mZi3UgeHFi+IEwmLg/HYs+awca8n/3hraspztL3utLpqqoWJ HWZo3Og5fZyP2y0pxv70LgYJIGUZ3raEEqrOBxhDVYFbOvm6bN6aUrgdnhWz2dVqSSyNBwM4J cco4SFeK8FT+JtEG1hPjNzSPN/pD0U9BnH5F/0OQC3UHM7Scv++9v9qxRs2nn/KSbqBWCzpFa zJSfxsezU2hSQxpqgO9n1483IFCfNrOSuiYN9nirUfL/NtqsHPWfTRPgldvmUIqEw4g8Vg4S8 c7ctsCzvamGP3qs0VkcXPSK0yS5CjG1JJJq66bsfNe/Su4NHqRsMSIW/po2OdBcYB25/uqRzK czgx9AAYkkuiWM5TFLD+2WvpsMbjbRQeZVpdTJ7PBjbsjZOlUNFrWWarNhscFpMoG4TLZQbtP Ogyv+G2lJb3Ha+XVwzayIKjwK3N1QoDewTRr7Mg+DLLPLWEAo6Pfpuolq4+Fo9p+AX8l2Ryw8 xzu9QR6eU5mJxg72TDhWrqTCTDJDsUYD1u41X1ht5NW479lIY736ziLY5IC4J+nLoFkfgtwX0 K73IMJvJCzxkoEUdY4ciWVqosh8qWyo2zN0HM507ys/4kBQYuGkqM7JIQyM9WtunCXMaS2kRw IblfvUeiDFkuJflca2NT9cgl2310iA7/6Dw1CLVn817wPK9vsJ3Q89/gqQ9f0wif2kuqjiHar FQlG1TFg48dI7O3bGbGhR4/R1dCSMEMtIeSHFYu9wvUcs88TPpBHXCNWUjOeU5DPzcjETS8tm 7uurwOZiFIegLwzYkcq3jd1Ua27AMF3WwSNGWF6vJUnIBWNBjVw8UnrmnLnujFdsKPmblbaqY zHRrgqB3v5CSZ79zhLtjpIf95DKgtRaNiud4jzPhBwP4py/ptxkXkcV4jFpwWMake2RjptfdM r31wNCMXL3z+iOrZsMLheVFtcjze+XKOzGPrsbJGE66j4/OG92PYloIwExOlOhKPchCRqYKxC iUF4VtZood4CnZDWEkhbUpFWoJ/oq7o+NI1uxqf91s46fO4dO85AGfKyZsMBAPKUCpfzkeYtj 0kVytIsEmlW9OjKhnXZtqMwds8ySSz7MZ8ejAFrtsx9CLFOGganR4p1XQKpKdEih3eXmPihvn aIZbC41bPZcxz1/ula8ndBqqTFGy2ldhCe1ZTL5uDKgiDM3OuUnMkGAC5yOsfkvSUAYqmkdB0 qPEcA+gGseqyjcQasWf6K1Jf8pz/Bk3kdKeQYG4BikOm/+vL1zQfpJoDrtUq5rdXkJuHEN/sY LUJDgbrzOGp1ExY5/Zl5i2peaZ283cnGjkkAibNm+1Qev8KN+5aK35wRZgLjCejKyIp1uib26 PKFoIXQZG+MMrCPCa5Ql94cboZQAJmg6KAP58vRTms4f7EEI4T2ijcTeYQPJk517xkL79X5Fv u53lCbqp41I9c/jrFaiKNyB7SHI9MHZ0pdVOyD0Okwtov3Z5n6Iv89f09UhTcsqQMAi9Zz4aq EKjn3X6ZOiraMSsPIk/k0NZEqtRk9bKq8xhUi0nBleZ6qsgHu4BHTzvYQD3l63nCZhs6oFf4P 8GUl6ZpzojTj+dvpkQ0QfJMvSX9qePVQCGsH9MukLWDaGjezuJFryeu8= Message-ID-Hash: ARXW5WFMAHY7FDDKM5QRRACQA4KOHHY3 X-Message-ID-Hash: ARXW5WFMAHY7FDDKM5QRRACQA4KOHHY3 X-MailFrom: SRS0=igIk=3Q=welwarsky.de=ffmpeg@ffmpeg.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; header-match-ffmpeg-devel.ffmpeg.org-0; header-match-ffmpeg-devel.ffmpeg.org-1; header-match-ffmpeg-devel.ffmpeg.org-2; header-match-ffmpeg-devel.ffmpeg.org-3; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.10 Precedence: list Reply-To: FFmpeg development discussions and patches Subject: [FFmpeg-devel] [PATCH v2] libavfilter/vf_remap_opencl: make kernel handle subsampled chroma List-Id: FFmpeg development discussions and patches Archived-At: Archived-At: List-Archive: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: From: Matthias Welwarsky via ffmpeg-devel Cc: Matthias Welwarsky Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Archived-At: List-Archive: List-Post: This patch fixes handling YUV frames with subsampled chroma. Before this patch, the remap kernel was already aware of multiple planes, but it treated the subsampled chroma incorrectly. The maps were read based on destination coordinates and the resulting source coordinates were also not correctly transformed back to the source image. Also, for both Y and UV, the wrong vector components were used. With this patch, it is possible to have a decode->remap->encode chain entirely on the GPU. I'm using it to reframe videos from an insta360 camera. On a Lenovo T480 with the built-in UHD 620 GPU, it can achieve about 5x real-time (VAAPI decode -> remap_opencl -> VAAPI encode). Signed-off-by: Matthias Welwarsky --- Resend because first patch was garbled by email client libavfilter/opencl/remap.cl | 128 ++++++++++++++++++++++++---------- libavfilter/vf_remap_opencl.c | 43 ++++++++++-- 2 files changed, 130 insertions(+), 41 deletions(-) diff --git a/libavfilter/opencl/remap.cl b/libavfilter/opencl/remap.cl index fba82d134e..1a18f1afff 100644 --- a/libavfilter/opencl/remap.cl +++ b/libavfilter/opencl/remap.cl @@ -22,52 +22,108 @@ const sampler_t linear_sampler = (CLK_NORMALIZED_COORDS_FALSE | const sampler_t nearest_sampler = (CLK_NORMALIZED_COORDS_FALSE | CLK_FILTER_NEAREST); -__kernel void remap_near(__write_only image2d_t dst, - __read_only image2d_t src, - __read_only image2d_t xmapi, - __read_only image2d_t ymapi, - float4 fill_color) +__kernel void remap_near( + __write_only image2d_t dst, + __read_only image2d_t src, + __read_only image2d_t xmapi, + __read_only image2d_t ymapi, + float4 fill_color, + float4 scale, + int4 swizzle) { int2 p = (int2)(get_global_id(0), get_global_id(1)); - int2 dimi = get_image_dim(src); - float2 dimf = (float2)(dimi.x, dimi.y); - float4 val; - int2 mi; - float m; - float4 xmap = read_imagef(xmapi, nearest_sampler, p); - float4 ymap = read_imagef(ymapi, nearest_sampler, p); - float2 pos = (float2)(xmap.x, ymap.x); - pos.xy = pos.xy * 65535.f; + /* image dimensions */ + int2 src_dim = get_image_dim(src); + int2 dst_dim = get_image_dim(dst); + int2 map_dim = get_image_dim(xmapi); - mi = ((pos >= (float2)(0.f, 0.f)) * (pos < dimf) * (p <= dimi)); - m = mi.x && mi.y; - val = mix(fill_color, read_imagef(src, nearest_sampler, pos), m); + float2 src_dimf = (float2)(src_dim.x, src_dim.y); + float2 dst_dimf = (float2)(dst_dim.x, dst_dim.y); + float2 map_dimf = (float2)(map_dim.x, map_dim.y); + + /* compute map scaling to full-res */ + float2 map_scale = map_dimf / dst_dimf; + + /* scaled position to fetch from the maps */ + float2 map_p = (float2)(p.x, p.y) * map_scale; + + /* read mapping coordinates from full-res maps */ + float4 xmap = read_imagef(xmapi, nearest_sampler, map_p); + float4 ymap = read_imagef(ymapi, nearest_sampler, map_p); + float2 pos = (float2)(xmap.x, ymap.x) * 65535.f; + + pos /= map_scale; + + /* check bounds */ + int2 mi = ((pos >= (float2)(0.f,0.f)) * (pos < src_dimf)); + float m = mi.x && mi.y; + + /* read source and apply swizzle + scale */ + float4 src_val = read_imagef(src, nearest_sampler, pos); + + float tmp[4]; + vstore4(src_val, 0, tmp); + src_val = (float4)(tmp[swizzle.x] * scale.x, + tmp[swizzle.y] * scale.y, + tmp[swizzle.z] * scale.z, + tmp[swizzle.w] * scale.w); + + /* mix with fill color if out-of-bounds */ + float4 val = mix(fill_color, src_val, m); write_imagef(dst, p, val); } -__kernel void remap_linear(__write_only image2d_t dst, - __read_only image2d_t src, - __read_only image2d_t xmapi, - __read_only image2d_t ymapi, - float4 fill_color) +__kernel void remap_linear( + __write_only image2d_t dst, + __read_only image2d_t src, + __read_only image2d_t xmapi, + __read_only image2d_t ymapi, + float4 fill_color, + float4 scale, + int4 swizzle) { int2 p = (int2)(get_global_id(0), get_global_id(1)); - int2 dimi = get_image_dim(src); - float2 dimf = (float2)(dimi.x, dimi.y); - float4 val; - int2 mi; - float m; - - float4 xmap = read_imagef(xmapi, nearest_sampler, p); - float4 ymap = read_imagef(ymapi, nearest_sampler, p); - float2 pos = (float2)(xmap.x, ymap.x); - pos.xy = pos.xy * 65535.f; - - mi = ((pos >= (float2)(0.f, 0.f)) * (pos < dimf) * (p <= dimi)); - m = mi.x && mi.y; - val = mix(fill_color, read_imagef(src, linear_sampler, pos), m); + + /* image dimensions */ + int2 src_dim = get_image_dim(src); + int2 dst_dim = get_image_dim(dst); + int2 map_dim = get_image_dim(xmapi); + + float2 src_dimf = (float2)(src_dim.x, src_dim.y); + float2 dst_dimf = (float2)(dst_dim.x, dst_dim.y); + float2 map_dimf = (float2)(map_dim.x, map_dim.y); + + /* compute map scaling to full-res */ + float2 map_scale = map_dimf / dst_dimf; + + /* scaled position to fetch from the maps */ + float2 map_p = (float2)(p.x, p.y) * map_scale; + + /* read mapping coordinates from full-res maps */ + float4 xmap = read_imagef(xmapi, nearest_sampler, map_p); + float4 ymap = read_imagef(ymapi, nearest_sampler, map_p); + float2 pos = (float2)(xmap.x, ymap.x) * 65535.f; + + pos /= map_scale; + + /* check bounds */ + int2 mi = ((pos >= (float2)(0.f,0.f)) * (pos < src_dimf)); + float m = mi.x && mi.y; + + /* read source and apply swizzle + scale */ + float4 src_val = read_imagef(src, linear_sampler, pos); + + float tmp[4]; + vstore4(src_val, 0, tmp); + src_val = (float4)(tmp[swizzle.x] * scale.x, + tmp[swizzle.y] * scale.y, + tmp[swizzle.z] * scale.z, + tmp[swizzle.w] * scale.w); + + /* mix with fill color if out-of-bounds */ + float4 val = mix(fill_color, src_val, m); write_imagef(dst, p, val); } diff --git a/libavfilter/vf_remap_opencl.c b/libavfilter/vf_remap_opencl.c index bb83944b2f..f6c5211f73 100644 --- a/libavfilter/vf_remap_opencl.c +++ b/libavfilter/vf_remap_opencl.c @@ -29,10 +29,18 @@ #include "opencl_source.h" #include "video.h" +#define MAX_PLANES 4 + +typedef struct PlaneParams { + cl_float4 scale; + cl_int4 swizzle; +} PlaneParams; + typedef struct RemapOpenCLContext { OpenCLFilterContext ocf; int nb_planes; + PlaneParams plane[MAX_PLANES]; int interp; uint8_t fill_rgba[4]; cl_float4 cl_fill_color; @@ -41,6 +49,7 @@ typedef struct RemapOpenCLContext { cl_kernel kernel; cl_command_queue command_queue; + FFFrameSync fs; } RemapOpenCLContext; @@ -73,10 +82,9 @@ static int remap_opencl_load(AVFilterContext *avctx, cl_int cle; const char *source = ff_source_remap_cl; const char *kernel = kernels[ctx->interp]; - const AVPixFmtDescriptor *main_desc; int err, main_planes; - const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(main_format); - int is_rgb = !!(desc->flags & AV_PIX_FMT_FLAG_RGB); + const AVPixFmtDescriptor *main_desc = av_pix_fmt_desc_get(main_format); + int is_rgb = !!(main_desc->flags & AV_PIX_FMT_FLAG_RGB); const float scale = 1.f / 255.f; uint8_t rgba_map[4]; @@ -94,8 +102,6 @@ static int remap_opencl_load(AVFilterContext *avctx, ctx->cl_fill_color.s[3] = ctx->fill_rgba[3] * scale; } - main_desc = av_pix_fmt_desc_get(main_format); - main_planes = 0; for (int i = 0; i < main_desc->nb_components; i++) main_planes = FFMAX(main_planes, @@ -103,6 +109,26 @@ static int remap_opencl_load(AVFilterContext *avctx, ctx->nb_planes = main_planes; + for (int p = 0; p < ctx->nb_planes; p++) { + PlaneParams *pp = &ctx->plane[p]; + + if (is_rgb) { + // RGB plane (single plane) + pp->scale = (cl_float4){{1.0f, 1.0f, 1.0f, 1.0f}}; + pp->swizzle = (cl_int4){{0, 1, 2, 3}}; + } else { + if (p == 0) { + // Y plane + pp->scale = (cl_float4){{1.0f, 0.0f, 0.0f, 1.0f}}; + pp->swizzle = (cl_int4){{0, 0, 0, 3}}; + } else { + // UV plane (assume 4:2:0, adjust if necessary) + pp->scale = (cl_float4){{1.0f, 1.0f, 0.0f, 1.0f}}; + pp->swizzle = (cl_int4){{0, 1, 0, 3}}; + } + } + } + err = ff_opencl_filter_load_program(avctx, &source, 1); if (err < 0) goto fail; @@ -132,6 +158,7 @@ static int remap_opencl_process_frame(FFFrameSync *fs) AVFilterContext *avctx = fs->parent; AVFilterLink *outlink = avctx->outputs[0]; RemapOpenCLContext *ctx = avctx->priv; + AVFrame *input_main, *input_xmap, *input_ymap; AVFrame *output; cl_mem mem; @@ -199,6 +226,12 @@ static int remap_opencl_process_frame(FFFrameSync *fs) CL_SET_KERNEL_ARG(ctx->kernel, kernel_arg, cl_float4, &cl_fill_color); kernel_arg++; + CL_SET_KERNEL_ARG(ctx->kernel, kernel_arg, cl_float4, &(ctx->plane[plane].scale)); + kernel_arg++; + + CL_SET_KERNEL_ARG(ctx->kernel, kernel_arg, cl_int4, &(ctx->plane[plane].swizzle)); + kernel_arg++; + err = ff_opencl_filter_work_size_from_image(avctx, global_work, output, plane, 0); if (err < 0) -- 2.51.0 _______________________________________________ ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org