* [FFmpeg-devel] [PATCH v2 0/5] swscale/yuv2rgb: add yuv42{0,2}p -> gbrp unscaled colorspace converters
@ 2024-08-06 10:51 Ramiro Polla
2024-08-06 10:51 ` [FFmpeg-devel] [PATCH v2 1/5] swscale/yuv2rgb: prepare LOADCHROMA/PUTFUNC macros for multi-planar rgb Ramiro Polla
` (5 more replies)
0 siblings, 6 replies; 8+ messages in thread
From: Ramiro Polla @ 2024-08-06 10:51 UTC
To: ffmpeg-devel
Changes from v1:
- multi-planar rgb for YUV2RGBFUNC no longer uses an array on the stack,
  since that caused an overall 1% slowdown because some variables were no
  longer kept in registers.
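To illustrate the change described above, here is a minimal sketch of the two ways of holding per-plane output pointers. It is not code from the series: the function names and the fill loop are hypothetical, and whether a given compiler actually spills the array variant to memory depends on the compiler and target. The 1% figure above comes from the real converters, not from this toy.

```c
#include <assert.h>
#include <stdint.h>

/* Variant A (v1-like, hypothetical): plane pointers live in a local array.
 * Depending on the compiler, the array may be kept in memory on the stack. */
static void fill_with_array(uint8_t *dst[3], int width, uint8_t value)
{
    uint8_t *p[3] = { dst[0], dst[1], dst[2] };
    for (int x = 0; x < width; x++)
        for (int k = 0; k < 3; k++)
            *p[k]++ = value;
}

/* Variant B (v2-like, hypothetical): one named pointer per plane, which
 * gives the register allocator independent scalar variables to work with. */
static void fill_with_scalars(uint8_t *dst[3], int width, uint8_t value)
{
    uint8_t *p0 = dst[0], *p1 = dst[1], *p2 = dst[2];
    for (int x = 0; x < width; x++) {
        *p0++ = value;
        *p1++ = value;
        *p2++ = value;
    }
}

/* Both variants must produce identical output. */
static void check_fill_variants(void)
{
    uint8_t a[3][4] = { { 0 } }, b[3][4] = { { 0 } };
    uint8_t *pa[3] = { a[0], a[1], a[2] };
    uint8_t *pb[3] = { b[0], b[1], b[2] };

    fill_with_array(pa, 4, 7);
    fill_with_scalars(pb, 4, 7);
    for (int k = 0; k < 3; k++)
        for (int x = 0; x < 4; x++)
            assert(a[k][x] == 7 && b[k][x] == 7);
}
```

Calling check_fill_variants() from a test program confirms that the two variants are functionally equivalent; any speed difference would have to be measured on the real code.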
Ramiro Polla (5):
  swscale/yuv2rgb: prepare LOADCHROMA/PUTFUNC macros for multi-planar
    rgb
  swscale/yuv2rgb: prepare YUV2RGBFUNC macro for multi-planar rgb
  swscale/yuv2rgb: add yuv42{0,2}p -> gbrp unscaled colorspace
    converters
  swscale/x86/yuv2rgb: add ssse3 yuv42{0,2}p -> gbrp unscaled colorspace
    converters
  swscale/aarch64/yuv2rgb: add neon yuv42{0,2}p -> gbrp unscaled
    colorspace converters

 libswscale/aarch64/swscale_unscaled.c |  58 +++
 libswscale/aarch64/yuv2rgb_neon.S     |  73 +++-
 libswscale/x86/yuv2rgb.c              |  39 ++
 libswscale/x86/yuv_2_rgb.asm          |  24 +-
 libswscale/yuv2rgb.c                  | 513 ++++++++++++++------------
 tests/checkasm/sw_yuv2rgb.c           |  60 ++-
 6 files changed, 495 insertions(+), 272 deletions(-)
--
2.30.2
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
* [FFmpeg-devel] [PATCH v2 1/5] swscale/yuv2rgb: prepare LOADCHROMA/PUTFUNC macros for multi-planar rgb
From: Ramiro Polla @ 2024-08-06 10:51 UTC
To: ffmpeg-devel
This will be used in the upcoming yuv42{0,2}p -> gbrp unscaled
colorspace converters.
There is no difference in performance.
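For readers unfamiliar with the technique, here is a minimal sketch of the refactoring this patch applies: instead of passing every buffer name to a macro, only a line suffix is passed and the names are built with the preprocessor's ## operator. The macro and function names (COPY2_EXPLICIT, COPY2_SUFFIX, copy_two_lines_*) are hypothetical; only the dst_##l / py_##l naming pattern is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Old style: every buffer is named explicitly at each call site. */
#define COPY2_EXPLICIT(dst, src, i)          \
    dst[2 * (i)]     = src[2 * (i)];         \
    dst[2 * (i) + 1] = src[2 * (i) + 1];

/* New style: only the line suffix is passed; ## pastes it onto the
 * variable names, so dst_##l becomes dst_1 or dst_2. */
#define COPY2_SUFFIX(l, i)                   \
    dst_##l[2 * (i)]     = py_##l[2 * (i)];  \
    dst_##l[2 * (i) + 1] = py_##l[2 * (i) + 1];

static void copy_two_lines_explicit(uint8_t *dst_1, uint8_t *dst_2,
                                    const uint8_t *py_1, const uint8_t *py_2)
{
    COPY2_EXPLICIT(dst_1, py_1, 0)
    COPY2_EXPLICIT(dst_2, py_2, 0)
}

static void copy_two_lines_suffix(uint8_t *dst_1, uint8_t *dst_2,
                                  const uint8_t *py_1, const uint8_t *py_2)
{
    COPY2_SUFFIX(1, 0)
    COPY2_SUFFIX(2, 0)
}

/* Both styles expand to the same statements, so the output is identical. */
static void check_copy_styles(void)
{
    const uint8_t s1[2] = { 1, 2 }, s2[2] = { 3, 4 };
    uint8_t a1[2] = { 0 }, a2[2] = { 0 }, b1[2] = { 0 }, b2[2] = { 0 };

    copy_two_lines_explicit(a1, a2, s1, s2);
    copy_two_lines_suffix(b1, b2, s1, s2);
    assert(a1[0] == b1[0] && a1[1] == b1[1]);
    assert(a2[0] == b2[0] && a2[1] == b2[1]);
}
```

The point of the suffix form is that a macro body can later refer to additional per-line variables (for example extra destination planes) without touching any call site.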
---
libswscale/yuv2rgb.c | 402 +++++++++++++++++++++----------------------
1 file changed, 201 insertions(+), 201 deletions(-)
diff --git a/libswscale/yuv2rgb.c b/libswscale/yuv2rgb.c
index cfbc54abd0..d77660b3a3 100644
--- a/libswscale/yuv2rgb.c
+++ b/libswscale/yuv2rgb.c
@@ -65,64 +65,64 @@ const int *sws_getCoefficients(int colorspace)
return ff_yuv2rgb_coeffs[colorspace];
}
-#define LOADCHROMA(pu, pv, i) \
- U = pu[i]; \
- V = pv[i]; \
+#define LOADCHROMA(l, i) \
+ U = pu_##l[i]; \
+ V = pv_##l[i]; \
r = (void *)c->table_rV[V+YUVRGB_TABLE_HEADROOM]; \
g = (void *)(c->table_gU[U+YUVRGB_TABLE_HEADROOM] + c->table_gV[V+YUVRGB_TABLE_HEADROOM]); \
b = (void *)c->table_bU[U+YUVRGB_TABLE_HEADROOM];
-#define PUTRGB(dst, src, asrc, i, abase) \
- Y = src[2 * i]; \
- dst[2 * i] = r[Y] + g[Y] + b[Y]; \
- Y = src[2 * i + 1]; \
- dst[2 * i + 1] = r[Y] + g[Y] + b[Y];
-
-#define PUTRGB24(dst, src, asrc, i, abase) \
- Y = src[2 * i]; \
- dst[6 * i + 0] = r[Y]; \
- dst[6 * i + 1] = g[Y]; \
- dst[6 * i + 2] = b[Y]; \
- Y = src[2 * i + 1]; \
- dst[6 * i + 3] = r[Y]; \
- dst[6 * i + 4] = g[Y]; \
- dst[6 * i + 5] = b[Y];
-
-#define PUTBGR24(dst, src, asrc, i, abase) \
- Y = src[2 * i]; \
- dst[6 * i + 0] = b[Y]; \
- dst[6 * i + 1] = g[Y]; \
- dst[6 * i + 2] = r[Y]; \
- Y = src[2 * i + 1]; \
- dst[6 * i + 3] = b[Y]; \
- dst[6 * i + 4] = g[Y]; \
- dst[6 * i + 5] = r[Y];
-
-#define PUTRGBA(dst, ysrc, asrc, i, abase) \
- Y = ysrc[2 * i]; \
- dst[2 * i] = r[Y] + g[Y] + b[Y] + ((uint32_t)(asrc[2 * i]) << abase); \
- Y = ysrc[2 * i + 1]; \
- dst[2 * i + 1] = r[Y] + g[Y] + b[Y] + ((uint32_t)(asrc[2 * i + 1]) << abase);
-
-#define PUTRGB48(dst, src, asrc, i, abase) \
- Y = src[ 2 * i]; \
- dst[12 * i + 0] = dst[12 * i + 1] = r[Y]; \
- dst[12 * i + 2] = dst[12 * i + 3] = g[Y]; \
- dst[12 * i + 4] = dst[12 * i + 5] = b[Y]; \
- Y = src[ 2 * i + 1]; \
- dst[12 * i + 6] = dst[12 * i + 7] = r[Y]; \
- dst[12 * i + 8] = dst[12 * i + 9] = g[Y]; \
- dst[12 * i + 10] = dst[12 * i + 11] = b[Y];
-
-#define PUTBGR48(dst, src, asrc, i, abase) \
- Y = src[2 * i]; \
- dst[12 * i + 0] = dst[12 * i + 1] = b[Y]; \
- dst[12 * i + 2] = dst[12 * i + 3] = g[Y]; \
- dst[12 * i + 4] = dst[12 * i + 5] = r[Y]; \
- Y = src[2 * i + 1]; \
- dst[12 * i + 6] = dst[12 * i + 7] = b[Y]; \
- dst[12 * i + 8] = dst[12 * i + 9] = g[Y]; \
- dst[12 * i + 10] = dst[12 * i + 11] = r[Y];
+#define PUTRGB(l, i, abase) \
+ Y = py_##l[2 * i]; \
+ dst_##l[2 * i] = r[Y] + g[Y] + b[Y]; \
+ Y = py_##l[2 * i + 1]; \
+ dst_##l[2 * i + 1] = r[Y] + g[Y] + b[Y];
+
+#define PUTRGB24(l, i, abase) \
+ Y = py_##l[2 * i]; \
+ dst_##l[6 * i + 0] = r[Y]; \
+ dst_##l[6 * i + 1] = g[Y]; \
+ dst_##l[6 * i + 2] = b[Y]; \
+ Y = py_##l[2 * i + 1]; \
+ dst_##l[6 * i + 3] = r[Y]; \
+ dst_##l[6 * i + 4] = g[Y]; \
+ dst_##l[6 * i + 5] = b[Y];
+
+#define PUTBGR24(l, i, abase) \
+ Y = py_##l[2 * i]; \
+ dst_##l[6 * i + 0] = b[Y]; \
+ dst_##l[6 * i + 1] = g[Y]; \
+ dst_##l[6 * i + 2] = r[Y]; \
+ Y = py_##l[2 * i + 1]; \
+ dst_##l[6 * i + 3] = b[Y]; \
+ dst_##l[6 * i + 4] = g[Y]; \
+ dst_##l[6 * i + 5] = r[Y];
+
+#define PUTRGBA(l, i, abase) \
+ Y = py_##l[2 * i]; \
+ dst_##l[2 * i] = r[Y] + g[Y] + b[Y] + ((uint32_t)(pa_##l[2 * i]) << abase); \
+ Y = py_##l[2 * i + 1]; \
+ dst_##l[2 * i + 1] = r[Y] + g[Y] + b[Y] + ((uint32_t)(pa_##l[2 * i + 1]) << abase);
+
+#define PUTRGB48(l, i, abase) \
+ Y = py_##l[ 2 * i]; \
+ dst_##l[12 * i + 0] = dst_##l[12 * i + 1] = r[Y]; \
+ dst_##l[12 * i + 2] = dst_##l[12 * i + 3] = g[Y]; \
+ dst_##l[12 * i + 4] = dst_##l[12 * i + 5] = b[Y]; \
+ Y = py_##l[ 2 * i + 1]; \
+ dst_##l[12 * i + 6] = dst_##l[12 * i + 7] = r[Y]; \
+ dst_##l[12 * i + 8] = dst_##l[12 * i + 9] = g[Y]; \
+ dst_##l[12 * i + 10] = dst_##l[12 * i + 11] = b[Y];
+
+#define PUTBGR48(l, i, abase) \
+ Y = py_##l[2 * i]; \
+ dst_##l[12 * i + 0] = dst_##l[12 * i + 1] = b[Y]; \
+ dst_##l[12 * i + 2] = dst_##l[12 * i + 3] = g[Y]; \
+ dst_##l[12 * i + 4] = dst_##l[12 * i + 5] = r[Y]; \
+ Y = py_##l[2 * i + 1]; \
+ dst_##l[12 * i + 6] = dst_##l[12 * i + 7] = b[Y]; \
+ dst_##l[12 * i + 8] = dst_##l[12 * i + 9] = g[Y]; \
+ dst_##l[12 * i + 10] = dst_##l[12 * i + 11] = r[Y];
#define YUV2RGBFUNC(func_name, dst_type, alpha, yuv422) \
static int func_name(SwsContext *c, const uint8_t *src[], \
@@ -183,166 +183,166 @@ const int *sws_getCoefficients(int colorspace)
#define YUV420FUNC(func_name, dst_type, alpha, abase, PUTFUNC, dst_delta) \
YUV2RGBFUNC(func_name, dst_type, alpha, 0) \
- LOADCHROMA(pu_1, pv_1, 0); \
- PUTFUNC(dst_1, py_1, pa_1, 0, abase); \
- PUTFUNC(dst_2, py_2, pa_2, 0, abase); \
+ LOADCHROMA(1, 0); \
+ PUTFUNC(1, 0, abase); \
+ PUTFUNC(2, 0, abase); \
\
- LOADCHROMA(pu_1, pv_1, 1); \
- PUTFUNC(dst_2, py_2, pa_2, 1, abase); \
- PUTFUNC(dst_1, py_1, pa_1, 1, abase); \
+ LOADCHROMA(1, 1); \
+ PUTFUNC(2, 1, abase); \
+ PUTFUNC(1, 1, abase); \
\
- LOADCHROMA(pu_1, pv_1, 2); \
- PUTFUNC(dst_1, py_1, pa_1, 2, abase); \
- PUTFUNC(dst_2, py_2, pa_2, 2, abase); \
+ LOADCHROMA(1, 2); \
+ PUTFUNC(1, 2, abase); \
+ PUTFUNC(2, 2, abase); \
\
- LOADCHROMA(pu_1, pv_1, 3); \
- PUTFUNC(dst_2, py_2, pa_2, 3, abase); \
- PUTFUNC(dst_1, py_1, pa_1, 3, abase); \
+ LOADCHROMA(1, 3); \
+ PUTFUNC(2, 3, abase); \
+ PUTFUNC(1, 3, abase); \
ENDYUV2RGBLINE(dst_delta, 0, alpha, 0) \
- LOADCHROMA(pu_1, pv_1, 0); \
- PUTFUNC(dst_1, py_1, pa_1, 0, abase); \
- PUTFUNC(dst_2, py_2, pa_2, 0, abase); \
+ LOADCHROMA(1, 0); \
+ PUTFUNC(1, 0, abase); \
+ PUTFUNC(2, 0, abase); \
\
- LOADCHROMA(pu_1, pv_1, 1); \
- PUTFUNC(dst_2, py_2, pa_2, 1, abase); \
- PUTFUNC(dst_1, py_1, pa_1, 1, abase); \
+ LOADCHROMA(1, 1); \
+ PUTFUNC(2, 1, abase); \
+ PUTFUNC(1, 1, abase); \
ENDYUV2RGBLINE(dst_delta, 1, alpha, 0) \
- LOADCHROMA(pu_1, pv_1, 0); \
- PUTFUNC(dst_1, py_1, pa_1, 0, abase); \
- PUTFUNC(dst_2, py_2, pa_2, 0, abase); \
+ LOADCHROMA(1, 0); \
+ PUTFUNC(1, 0, abase); \
+ PUTFUNC(2, 0, abase); \
ENDYUV2RGBFUNC()
#define YUV422FUNC(func_name, dst_type, alpha, abase, PUTFUNC, dst_delta) \
YUV2RGBFUNC(func_name, dst_type, alpha, 1) \
- LOADCHROMA(pu_1, pv_1, 0); \
- PUTFUNC(dst_1, py_1, pa_1, 0, abase); \
+ LOADCHROMA(1, 0); \
+ PUTFUNC(1, 0, abase); \
\
- LOADCHROMA(pu_2, pv_2, 0); \
- PUTFUNC(dst_2, py_2, pa_2, 0, abase); \
+ LOADCHROMA(2, 0); \
+ PUTFUNC(2, 0, abase); \
\
- LOADCHROMA(pu_2, pv_2, 1); \
- PUTFUNC(dst_2, py_2, pa_2, 1, abase); \
+ LOADCHROMA(2, 1); \
+ PUTFUNC(2, 1, abase); \
\
- LOADCHROMA(pu_1, pv_1, 1); \
- PUTFUNC(dst_1, py_1, pa_1, 1, abase); \
+ LOADCHROMA(1, 1); \
+ PUTFUNC(1, 1, abase); \
\
- LOADCHROMA(pu_1, pv_1, 2); \
- PUTFUNC(dst_1, py_1, pa_1, 2, abase); \
+ LOADCHROMA(1, 2); \
+ PUTFUNC(1, 2, abase); \
\
- LOADCHROMA(pu_2, pv_2, 2); \
- PUTFUNC(dst_2, py_2, pa_2, 2, abase); \
+ LOADCHROMA(2, 2); \
+ PUTFUNC(2, 2, abase); \
\
- LOADCHROMA(pu_2, pv_2, 3); \
- PUTFUNC(dst_2, py_2, pa_2, 3, abase); \
+ LOADCHROMA(2, 3); \
+ PUTFUNC(2, 3, abase); \
\
- LOADCHROMA(pu_1, pv_1, 3); \
- PUTFUNC(dst_1, py_1, pa_1, 3, abase); \
+ LOADCHROMA(1, 3); \
+ PUTFUNC(1, 3, abase); \
ENDYUV2RGBLINE(dst_delta, 0, alpha, 1) \
- LOADCHROMA(pu_1, pv_1, 0); \
- PUTFUNC(dst_1, py_1, pa_1, 0, abase); \
+ LOADCHROMA(1, 0); \
+ PUTFUNC(1, 0, abase); \
\
- LOADCHROMA(pu_2, pv_2, 0); \
- PUTFUNC(dst_2, py_2, pa_2, 0, abase); \
+ LOADCHROMA(2, 0); \
+ PUTFUNC(2, 0, abase); \
\
- LOADCHROMA(pu_2, pv_2, 1); \
- PUTFUNC(dst_2, py_2, pa_2, 1, abase); \
+ LOADCHROMA(2, 1); \
+ PUTFUNC(2, 1, abase); \
\
- LOADCHROMA(pu_1, pv_1, 1); \
- PUTFUNC(dst_1, py_1, pa_1, 1, abase); \
+ LOADCHROMA(1, 1); \
+ PUTFUNC(1, 1, abase); \
ENDYUV2RGBLINE(dst_delta, 1, alpha, 1) \
- LOADCHROMA(pu_1, pv_1, 0); \
- PUTFUNC(dst_1, py_1, pa_1, 0, abase); \
+ LOADCHROMA(1, 0); \
+ PUTFUNC(1, 0, abase); \
\
- LOADCHROMA(pu_2, pv_2, 0); \
- PUTFUNC(dst_2, py_2, pa_2, 0, abase); \
+ LOADCHROMA(2, 0); \
+ PUTFUNC(2, 0, abase); \
ENDYUV2RGBFUNC()
#define YUV420FUNC_DITHER(func_name, dst_type, LOADDITHER, PUTFUNC, dst_delta) \
YUV2RGBFUNC(func_name, dst_type, 0, 0) \
LOADDITHER \
\
- LOADCHROMA(pu_1, pv_1, 0); \
- PUTFUNC(dst_1, py_1, 0, 0); \
- PUTFUNC(dst_2, py_2, 0, 0 + 8); \
+ LOADCHROMA(1, 0); \
+ PUTFUNC(1, 0, 0); \
+ PUTFUNC(2, 0, 0 + 8); \
\
- LOADCHROMA(pu_1, pv_1, 1); \
- PUTFUNC(dst_2, py_2, 1, 2 + 8); \
- PUTFUNC(dst_1, py_1, 1, 2); \
+ LOADCHROMA(1, 1); \
+ PUTFUNC(2, 1, 2 + 8); \
+ PUTFUNC(1, 1, 2); \
\
- LOADCHROMA(pu_1, pv_1, 2); \
- PUTFUNC(dst_1, py_1, 2, 4); \
- PUTFUNC(dst_2, py_2, 2, 4 + 8); \
+ LOADCHROMA(1, 2); \
+ PUTFUNC(1, 2, 4); \
+ PUTFUNC(2, 2, 4 + 8); \
\
- LOADCHROMA(pu_1, pv_1, 3); \
- PUTFUNC(dst_2, py_2, 3, 6 + 8); \
- PUTFUNC(dst_1, py_1, 3, 6); \
+ LOADCHROMA(1, 3); \
+ PUTFUNC(2, 3, 6 + 8); \
+ PUTFUNC(1, 3, 6); \
ENDYUV2RGBLINE(dst_delta, 0, 0, 0) \
LOADDITHER \
\
- LOADCHROMA(pu_1, pv_1, 0); \
- PUTFUNC(dst_1, py_1, 0, 0); \
- PUTFUNC(dst_2, py_2, 0, 0 + 8); \
+ LOADCHROMA(1, 0); \
+ PUTFUNC(1, 0, 0); \
+ PUTFUNC(2, 0, 0 + 8); \
\
- LOADCHROMA(pu_1, pv_1, 1); \
- PUTFUNC(dst_2, py_2, 1, 2 + 8); \
- PUTFUNC(dst_1, py_1, 1, 2); \
+ LOADCHROMA(1, 1); \
+ PUTFUNC(2, 1, 2 + 8); \
+ PUTFUNC(1, 1, 2); \
ENDYUV2RGBLINE(dst_delta, 1, 0, 0) \
LOADDITHER \
\
- LOADCHROMA(pu_1, pv_1, 0); \
- PUTFUNC(dst_1, py_1, 0, 0); \
- PUTFUNC(dst_2, py_2, 0, 0 + 8); \
+ LOADCHROMA(1, 0); \
+ PUTFUNC(1, 0, 0); \
+ PUTFUNC(2, 0, 0 + 8); \
ENDYUV2RGBFUNC()
#define YUV422FUNC_DITHER(func_name, dst_type, LOADDITHER, PUTFUNC, dst_delta) \
YUV2RGBFUNC(func_name, dst_type, 0, 1) \
LOADDITHER \
\
- LOADCHROMA(pu_1, pv_1, 0); \
- PUTFUNC(dst_1, py_1, 0, 0); \
+ LOADCHROMA(1, 0); \
+ PUTFUNC(1, 0, 0); \
\
- LOADCHROMA(pu_2, pv_2, 0); \
- PUTFUNC(dst_2, py_2, 0, 0 + 8); \
+ LOADCHROMA(2, 0); \
+ PUTFUNC(2, 0, 0 + 8); \
\
- LOADCHROMA(pu_2, pv_2, 1); \
- PUTFUNC(dst_2, py_2, 1, 2 + 8); \
+ LOADCHROMA(2, 1); \
+ PUTFUNC(2, 1, 2 + 8); \
\
- LOADCHROMA(pu_1, pv_1, 1); \
- PUTFUNC(dst_1, py_1, 1, 2); \
+ LOADCHROMA(1, 1); \
+ PUTFUNC(1, 1, 2); \
\
- LOADCHROMA(pu_1, pv_1, 2); \
- PUTFUNC(dst_1, py_1, 2, 4); \
+ LOADCHROMA(1, 2); \
+ PUTFUNC(1, 2, 4); \
\
- LOADCHROMA(pu_2, pv_2, 2); \
- PUTFUNC(dst_2, py_2, 2, 4 + 8); \
+ LOADCHROMA(2, 2); \
+ PUTFUNC(2, 2, 4 + 8); \
\
- LOADCHROMA(pu_2, pv_2, 3); \
- PUTFUNC(dst_2, py_2, 3, 6 + 8); \
+ LOADCHROMA(2, 3); \
+ PUTFUNC(2, 3, 6 + 8); \
\
- LOADCHROMA(pu_1, pv_1, 3); \
- PUTFUNC(dst_1, py_1, 3, 6); \
+ LOADCHROMA(1, 3); \
+ PUTFUNC(1, 3, 6); \
ENDYUV2RGBLINE(dst_delta, 0, 0, 1) \
LOADDITHER \
\
- LOADCHROMA(pu_1, pv_1, 0); \
- PUTFUNC(dst_1, py_1, 0, 0); \
+ LOADCHROMA(1, 0); \
+ PUTFUNC(1, 0, 0); \
\
- LOADCHROMA(pu_2, pv_2, 0); \
- PUTFUNC(dst_2, py_2, 0, 0 + 8); \
+ LOADCHROMA(2, 0); \
+ PUTFUNC(2, 0, 0 + 8); \
\
- LOADCHROMA(pu_2, pv_2, 1); \
- PUTFUNC(dst_2, py_2, 1, 2 + 8); \
+ LOADCHROMA(2, 1); \
+ PUTFUNC(2, 1, 2 + 8); \
\
- LOADCHROMA(pu_1, pv_1, 1); \
- PUTFUNC(dst_1, py_1, 1, 2); \
+ LOADCHROMA(1, 1); \
+ PUTFUNC(1, 1, 2); \
ENDYUV2RGBLINE(dst_delta, 1, 0, 1) \
LOADDITHER \
\
- LOADCHROMA(pu_1, pv_1, 0); \
- PUTFUNC(dst_1, py_1, 0, 0); \
+ LOADCHROMA(1, 0); \
+ PUTFUNC(1, 0, 0); \
\
- LOADCHROMA(pu_2, pv_2, 0); \
- PUTFUNC(dst_2, py_2, 0, 0 + 8); \
+ LOADCHROMA(2, 0); \
+ PUTFUNC(2, 0, 0 + 8); \
ENDYUV2RGBFUNC()
#define LOADDITHER16 \
@@ -350,86 +350,86 @@ const int *sws_getCoefficients(int colorspace)
const uint8_t *e16 = ff_dither_2x2_4[y & 1]; \
const uint8_t *f16 = ff_dither_2x2_8[(y & 1)^1];
-#define PUTRGB16(dst, src, i, o) \
- Y = src[2 * i]; \
- dst[2 * i] = r[Y + d16[0 + o]] + \
- g[Y + e16[0 + o]] + \
- b[Y + f16[0 + o]]; \
- Y = src[2 * i + 1]; \
- dst[2 * i + 1] = r[Y + d16[1 + o]] + \
- g[Y + e16[1 + o]] + \
- b[Y + f16[1 + o]];
+#define PUTRGB16(l, i, o) \
+ Y = py_##l[2 * i]; \
+ dst_##l[2 * i] = r[Y + d16[0 + o]] + \
+ g[Y + e16[0 + o]] + \
+ b[Y + f16[0 + o]]; \
+ Y = py_##l[2 * i + 1]; \
+ dst_##l[2 * i + 1] = r[Y + d16[1 + o]] + \
+ g[Y + e16[1 + o]] + \
+ b[Y + f16[1 + o]];
#define LOADDITHER15 \
const uint8_t *d16 = ff_dither_2x2_8[y & 1]; \
const uint8_t *e16 = ff_dither_2x2_8[(y & 1)^1];
-#define PUTRGB15(dst, src, i, o) \
- Y = src[2 * i]; \
- dst[2 * i] = r[Y + d16[0 + o]] + \
- g[Y + d16[1 + o]] + \
- b[Y + e16[0 + o]]; \
- Y = src[2 * i + 1]; \
- dst[2 * i + 1] = r[Y + d16[1 + o]] + \
- g[Y + d16[0 + o]] + \
- b[Y + e16[1 + o]];
+#define PUTRGB15(l, i, o) \
+ Y = py_##l[2 * i]; \
+ dst_##l[2 * i] = r[Y + d16[0 + o]] + \
+ g[Y + d16[1 + o]] + \
+ b[Y + e16[0 + o]]; \
+ Y = py_##l[2 * i + 1]; \
+ dst_##l[2 * i + 1] = r[Y + d16[1 + o]] + \
+ g[Y + d16[0 + o]] + \
+ b[Y + e16[1 + o]];
#define LOADDITHER12 \
const uint8_t *d16 = ff_dither_4x4_16[y & 3];
-#define PUTRGB12(dst, src, i, o) \
- Y = src[2 * i]; \
- dst[2 * i] = r[Y + d16[0 + o]] + \
- g[Y + d16[0 + o]] + \
- b[Y + d16[0 + o]]; \
- Y = src[2 * i + 1]; \
- dst[2 * i + 1] = r[Y + d16[1 + o]] + \
- g[Y + d16[1 + o]] + \
- b[Y + d16[1 + o]];
+#define PUTRGB12(l, i, o) \
+ Y = py_##l[2 * i]; \
+ dst_##l[2 * i] = r[Y + d16[0 + o]] + \
+ g[Y + d16[0 + o]] + \
+ b[Y + d16[0 + o]]; \
+ Y = py_##l[2 * i + 1]; \
+ dst_##l[2 * i + 1] = r[Y + d16[1 + o]] + \
+ g[Y + d16[1 + o]] + \
+ b[Y + d16[1 + o]];
#define LOADDITHER8 \
const uint8_t *d32 = ff_dither_8x8_32[yd & 7]; \
const uint8_t *d64 = ff_dither_8x8_73[yd & 7];
-#define PUTRGB8(dst, src, i, o) \
- Y = src[2 * i]; \
- dst[2 * i] = r[Y + d32[0 + o]] + \
- g[Y + d32[0 + o]] + \
- b[Y + d64[0 + o]]; \
- Y = src[2 * i + 1]; \
- dst[2 * i + 1] = r[Y + d32[1 + o]] + \
- g[Y + d32[1 + o]] + \
- b[Y + d64[1 + o]];
+#define PUTRGB8(l, i, o) \
+ Y = py_##l[2 * i]; \
+ dst_##l[2 * i] = r[Y + d32[0 + o]] + \
+ g[Y + d32[0 + o]] + \
+ b[Y + d64[0 + o]]; \
+ Y = py_##l[2 * i + 1]; \
+ dst_##l[2 * i + 1] = r[Y + d32[1 + o]] + \
+ g[Y + d32[1 + o]] + \
+ b[Y + d64[1 + o]];
#define LOADDITHER4D \
const uint8_t * d64 = ff_dither_8x8_73[yd & 7]; \
const uint8_t *d128 = ff_dither_8x8_220[yd & 7]; \
int acc;
-#define PUTRGB4D(dst, src, i, o) \
- Y = src[2 * i]; \
+#define PUTRGB4D(l, i, o) \
+ Y = py_##l[2 * i]; \
acc = r[Y + d128[0 + o]] + \
g[Y + d64[0 + o]] + \
b[Y + d128[0 + o]]; \
- Y = src[2 * i + 1]; \
+ Y = py_##l[2 * i + 1]; \
acc |= (r[Y + d128[1 + o]] + \
g[Y + d64[1 + o]] + \
b[Y + d128[1 + o]]) << 4; \
- dst[i] = acc;
+ dst_##l[i] = acc;
#define LOADDITHER4DB \
const uint8_t *d64 = ff_dither_8x8_73[yd & 7]; \
const uint8_t *d128 = ff_dither_8x8_220[yd & 7];
-#define PUTRGB4DB(dst, src, i, o) \
- Y = src[2 * i]; \
- dst[2 * i] = r[Y + d128[0 + o]] + \
- g[Y + d64[0 + o]] + \
- b[Y + d128[0 + o]]; \
- Y = src[2 * i + 1]; \
- dst[2 * i + 1] = r[Y + d128[1 + o]] + \
- g[Y + d64[1 + o]] + \
- b[Y + d128[1 + o]];
+#define PUTRGB4DB(l, i, o) \
+ Y = py_##l[2 * i]; \
+ dst_##l[2 * i] = r[Y + d128[0 + o]] + \
+ g[Y + d64[0 + o]] + \
+ b[Y + d128[0 + o]]; \
+ Y = py_##l[2 * i + 1]; \
+ dst_##l[2 * i + 1] = r[Y + d128[1 + o]] + \
+ g[Y + d64[1 + o]] + \
+ b[Y + d128[1 + o]];
YUV2RGBFUNC(yuv2rgb_c_1_ordered_dither, uint8_t, 0, 0)
const uint8_t *d128 = ff_dither_8x8_220[yd & 7];
--
2.30.2
* [FFmpeg-devel] [PATCH v2 2/5] swscale/yuv2rgb: prepare YUV2RGBFUNC macro for multi-planar rgb
From: Ramiro Polla @ 2024-08-06 10:51 UTC
To: ffmpeg-devel
This will be used in the upcoming yuv42{0,2}p -> gbrp unscaled
colorspace converters.
There is no difference in performance.
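A plausible reason the extra parameter costs nothing is that nb_dst_planes is a literal constant in every macro instantiation, so the compiler can fold the `nb_dst_planes > 1` tests and drop the dead branches. The sketch below shows that pattern in isolation; DEFINE_FILL, fill_packed and fill_planar are hypothetical names, and the dead-code elimination is an assumption about typical optimizing compilers, not something this sketch measures.

```c
#include <assert.h>
#include <stdint.h>

/* Generates one function per instantiation; nb_dst_planes is a literal
 * at each use, so "nb_dst_planes > 1" is a compile-time constant. */
#define DEFINE_FILL(func_name, nb_dst_planes)                        \
static void func_name(uint8_t *dst[], int width, uint8_t value)      \
{                                                                    \
    uint8_t *d0 = dst[0];                                            \
    uint8_t *d1 = 0, *d2 = 0;                                        \
    if (nb_dst_planes > 1) {                                         \
        d1 = dst[1];                                                 \
        d2 = dst[2];                                                 \
    }                                                                \
    for (int x = 0; x < width; x++) {                                \
        d0[x] = value;                                               \
        if (nb_dst_planes > 1) {                                     \
            d1[x] = value;                                           \
            d2[x] = value;                                           \
        }                                                            \
    }                                                                \
}

DEFINE_FILL(fill_packed, 1) /* single plane: extra branches are dead */
DEFINE_FILL(fill_planar, 3) /* three planes: extra branches are live */

static void check_fill_planes(void)
{
    uint8_t p0[4] = { 0 }, p1[4] = { 0 }, p2[4] = { 0 };
    uint8_t *planes[3] = { p0, p1, p2 };

    fill_packed(planes, 4, 9);
    assert(p0[3] == 9 && p1[0] == 0 && p2[0] == 0);

    fill_planar(planes, 4, 5);
    assert(p0[0] == 5 && p1[3] == 5 && p2[3] == 5);
}
```

In the packed instantiation only plane 0 is ever touched, which mirrors how the existing single-plane converters pass 1 for nb_dst_planes.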
---
libswscale/yuv2rgb.c | 83 +++++++++++++++++++++++++-------------------
1 file changed, 48 insertions(+), 35 deletions(-)
diff --git a/libswscale/yuv2rgb.c b/libswscale/yuv2rgb.c
index d77660b3a3..31d10235ef 100644
--- a/libswscale/yuv2rgb.c
+++ b/libswscale/yuv2rgb.c
@@ -124,7 +124,7 @@ const int *sws_getCoefficients(int colorspace)
dst_##l[12 * i + 8] = dst_##l[12 * i + 9] = g[Y]; \
dst_##l[12 * i + 10] = dst_##l[12 * i + 11] = r[Y];
-#define YUV2RGBFUNC(func_name, dst_type, alpha, yuv422) \
+#define YUV2RGBFUNC(func_name, dst_type, alpha, yuv422, nb_dst_planes) \
static int func_name(SwsContext *c, const uint8_t *src[], \
int srcStride[], int srcSliceY, int srcSliceH, \
uint8_t *dst[], int dstStride[]) \
@@ -137,6 +137,7 @@ const int *sws_getCoefficients(int colorspace)
(dst_type *)(dst[0] + (yd) * dstStride[0]); \
dst_type *dst_2 = \
(dst_type *)(dst[0] + (yd + 1) * dstStride[0]); \
+ dst_type av_unused *dst1_1, *dst1_2, *dst2_1, *dst2_2; \
dst_type av_unused *r, *g, *b; \
const uint8_t *py_1 = src[0] + y * srcStride[0]; \
const uint8_t *py_2 = py_1 + srcStride[0]; \
@@ -145,6 +146,12 @@ const int *sws_getCoefficients(int colorspace)
const uint8_t av_unused *pu_2, *pv_2; \
const uint8_t av_unused *pa_1, *pa_2; \
unsigned int h_size = c->dstW >> 3; \
+ if (nb_dst_planes > 1) { \
+ dst1_1 = (dst_type *)(dst[1] + (yd) * dstStride[1]); \
+ dst1_2 = (dst_type *)(dst[1] + (yd + 1) * dstStride[1]); \
+ dst2_1 = (dst_type *)(dst[2] + (yd) * dstStride[2]); \
+ dst2_2 = (dst_type *)(dst[2] + (yd + 1) * dstStride[2]); \
+ } \
if (yuv422) { \
pu_2 = pu_1 + srcStride[1]; \
pv_2 = pv_1 + srcStride[2]; \
@@ -156,7 +163,7 @@ const int *sws_getCoefficients(int colorspace)
while (h_size--) { \
int av_unused U, V, Y; \
-#define ENDYUV2RGBLINE(dst_delta, ss, alpha, yuv422) \
+#define ENDYUV2RGBLINE(dst_delta, ss, alpha, yuv422, nb_dst_planes) \
pu_1 += 4 >> ss; \
pv_1 += 4 >> ss; \
if (yuv422) { \
@@ -171,6 +178,12 @@ const int *sws_getCoefficients(int colorspace)
} \
dst_1 += dst_delta >> ss; \
dst_2 += dst_delta >> ss; \
+ if (nb_dst_planes > 1) { \
+ dst1_1 += dst_delta >> ss; \
+ dst1_2 += dst_delta >> ss; \
+ dst2_1 += dst_delta >> ss; \
+ dst2_2 += dst_delta >> ss; \
+ } \
} \
if (c->dstW & (4 >> ss)) { \
int av_unused Y, U, V; \
@@ -181,8 +194,8 @@ const int *sws_getCoefficients(int colorspace)
return srcSliceH; \
}
-#define YUV420FUNC(func_name, dst_type, alpha, abase, PUTFUNC, dst_delta) \
- YUV2RGBFUNC(func_name, dst_type, alpha, 0) \
+#define YUV420FUNC(func_name, dst_type, alpha, abase, PUTFUNC, dst_delta, nb_dst_planes) \
+ YUV2RGBFUNC(func_name, dst_type, alpha, 0, nb_dst_planes) \
LOADCHROMA(1, 0); \
PUTFUNC(1, 0, abase); \
PUTFUNC(2, 0, abase); \
@@ -198,7 +211,7 @@ const int *sws_getCoefficients(int colorspace)
LOADCHROMA(1, 3); \
PUTFUNC(2, 3, abase); \
PUTFUNC(1, 3, abase); \
- ENDYUV2RGBLINE(dst_delta, 0, alpha, 0) \
+ ENDYUV2RGBLINE(dst_delta, 0, alpha, 0, nb_dst_planes) \
LOADCHROMA(1, 0); \
PUTFUNC(1, 0, abase); \
PUTFUNC(2, 0, abase); \
@@ -206,14 +219,14 @@ const int *sws_getCoefficients(int colorspace)
LOADCHROMA(1, 1); \
PUTFUNC(2, 1, abase); \
PUTFUNC(1, 1, abase); \
- ENDYUV2RGBLINE(dst_delta, 1, alpha, 0) \
+ ENDYUV2RGBLINE(dst_delta, 1, alpha, 0, nb_dst_planes) \
LOADCHROMA(1, 0); \
PUTFUNC(1, 0, abase); \
PUTFUNC(2, 0, abase); \
ENDYUV2RGBFUNC()
-#define YUV422FUNC(func_name, dst_type, alpha, abase, PUTFUNC, dst_delta) \
- YUV2RGBFUNC(func_name, dst_type, alpha, 1) \
+#define YUV422FUNC(func_name, dst_type, alpha, abase, PUTFUNC, dst_delta, nb_dst_planes) \
+ YUV2RGBFUNC(func_name, dst_type, alpha, 1, nb_dst_planes) \
LOADCHROMA(1, 0); \
PUTFUNC(1, 0, abase); \
\
@@ -237,7 +250,7 @@ const int *sws_getCoefficients(int colorspace)
\
LOADCHROMA(1, 3); \
PUTFUNC(1, 3, abase); \
- ENDYUV2RGBLINE(dst_delta, 0, alpha, 1) \
+ ENDYUV2RGBLINE(dst_delta, 0, alpha, 1, nb_dst_planes) \
LOADCHROMA(1, 0); \
PUTFUNC(1, 0, abase); \
\
@@ -249,7 +262,7 @@ const int *sws_getCoefficients(int colorspace)
\
LOADCHROMA(1, 1); \
PUTFUNC(1, 1, abase); \
- ENDYUV2RGBLINE(dst_delta, 1, alpha, 1) \
+ ENDYUV2RGBLINE(dst_delta, 1, alpha, 1, nb_dst_planes) \
LOADCHROMA(1, 0); \
PUTFUNC(1, 0, abase); \
\
@@ -258,7 +271,7 @@ const int *sws_getCoefficients(int colorspace)
ENDYUV2RGBFUNC()
#define YUV420FUNC_DITHER(func_name, dst_type, LOADDITHER, PUTFUNC, dst_delta) \
- YUV2RGBFUNC(func_name, dst_type, 0, 0) \
+ YUV2RGBFUNC(func_name, dst_type, 0, 0, 1) \
LOADDITHER \
\
LOADCHROMA(1, 0); \
@@ -276,7 +289,7 @@ const int *sws_getCoefficients(int colorspace)
LOADCHROMA(1, 3); \
PUTFUNC(2, 3, 6 + 8); \
PUTFUNC(1, 3, 6); \
- ENDYUV2RGBLINE(dst_delta, 0, 0, 0) \
+ ENDYUV2RGBLINE(dst_delta, 0, 0, 0, 1) \
LOADDITHER \
\
LOADCHROMA(1, 0); \
@@ -286,7 +299,7 @@ const int *sws_getCoefficients(int colorspace)
LOADCHROMA(1, 1); \
PUTFUNC(2, 1, 2 + 8); \
PUTFUNC(1, 1, 2); \
- ENDYUV2RGBLINE(dst_delta, 1, 0, 0) \
+ ENDYUV2RGBLINE(dst_delta, 1, 0, 0, 1) \
LOADDITHER \
\
LOADCHROMA(1, 0); \
@@ -295,7 +308,7 @@ const int *sws_getCoefficients(int colorspace)
ENDYUV2RGBFUNC()
#define YUV422FUNC_DITHER(func_name, dst_type, LOADDITHER, PUTFUNC, dst_delta) \
- YUV2RGBFUNC(func_name, dst_type, 0, 1) \
+ YUV2RGBFUNC(func_name, dst_type, 0, 1, 1) \
LOADDITHER \
\
LOADCHROMA(1, 0); \
@@ -321,7 +334,7 @@ const int *sws_getCoefficients(int colorspace)
\
LOADCHROMA(1, 3); \
PUTFUNC(1, 3, 6); \
- ENDYUV2RGBLINE(dst_delta, 0, 0, 1) \
+ ENDYUV2RGBLINE(dst_delta, 0, 0, 1, 1) \
LOADDITHER \
\
LOADCHROMA(1, 0); \
@@ -335,7 +348,7 @@ const int *sws_getCoefficients(int colorspace)
\
LOADCHROMA(1, 1); \
PUTFUNC(1, 1, 2); \
- ENDYUV2RGBLINE(dst_delta, 1, 0, 1) \
+ ENDYUV2RGBLINE(dst_delta, 1, 0, 1, 1) \
LOADDITHER \
\
LOADCHROMA(1, 0); \
@@ -431,7 +444,7 @@ const int *sws_getCoefficients(int colorspace)
g[Y + d64[1 + o]] + \
b[Y + d128[1 + o]];
-YUV2RGBFUNC(yuv2rgb_c_1_ordered_dither, uint8_t, 0, 0)
+YUV2RGBFUNC(yuv2rgb_c_1_ordered_dither, uint8_t, 0, 0, 1)
const uint8_t *d128 = ff_dither_8x8_220[yd & 7];
char out_1 = 0, out_2 = 0;
g = c->table_gU[128 + YUVRGB_TABLE_HEADROOM] + c->table_gV[128 + YUVRGB_TABLE_HEADROOM];
@@ -494,18 +507,18 @@ YUV2RGBFUNC(yuv2rgb_c_1_ordered_dither, uint8_t, 0, 0)
ENDYUV2RGBFUNC()
// YUV420
-YUV420FUNC(yuv2rgb_c_48, uint8_t, 0, 0, PUTRGB48, 48)
-YUV420FUNC(yuv2rgb_c_bgr48, uint8_t, 0, 0, PUTBGR48, 48)
-YUV420FUNC(yuv2rgb_c_32, uint32_t, 0, 0, PUTRGB, 8)
+YUV420FUNC(yuv2rgb_c_48, uint8_t, 0, 0, PUTRGB48, 48, 1)
+YUV420FUNC(yuv2rgb_c_bgr48, uint8_t, 0, 0, PUTBGR48, 48, 1)
+YUV420FUNC(yuv2rgb_c_32, uint32_t, 0, 0, PUTRGB, 8, 1)
#if HAVE_BIGENDIAN
-YUV420FUNC(yuva2argb_c, uint32_t, 1, 24, PUTRGBA, 8)
-YUV420FUNC(yuva2rgba_c, uint32_t, 1, 0, PUTRGBA, 8)
+YUV420FUNC(yuva2argb_c, uint32_t, 1, 24, PUTRGBA, 8, 1)
+YUV420FUNC(yuva2rgba_c, uint32_t, 1, 0, PUTRGBA, 8, 1)
#else
-YUV420FUNC(yuva2rgba_c, uint32_t, 1, 24, PUTRGBA, 8)
-YUV420FUNC(yuva2argb_c, uint32_t, 1, 0, PUTRGBA, 8)
+YUV420FUNC(yuva2rgba_c, uint32_t, 1, 24, PUTRGBA, 8, 1)
+YUV420FUNC(yuva2argb_c, uint32_t, 1, 0, PUTRGBA, 8, 1)
#endif
-YUV420FUNC(yuv2rgb_c_24_rgb, uint8_t, 0, 0, PUTRGB24, 24)
-YUV420FUNC(yuv2rgb_c_24_bgr, uint8_t, 0, 0, PUTBGR24, 24)
+YUV420FUNC(yuv2rgb_c_24_rgb, uint8_t, 0, 0, PUTRGB24, 24, 1)
+YUV420FUNC(yuv2rgb_c_24_bgr, uint8_t, 0, 0, PUTBGR24, 24, 1)
YUV420FUNC_DITHER(yuv2rgb_c_16_ordered_dither, uint16_t, LOADDITHER16, PUTRGB16, 8)
YUV420FUNC_DITHER(yuv2rgb_c_15_ordered_dither, uint16_t, LOADDITHER15, PUTRGB15, 8)
YUV420FUNC_DITHER(yuv2rgb_c_12_ordered_dither, uint16_t, LOADDITHER12, PUTRGB12, 8)
@@ -514,18 +527,18 @@ YUV420FUNC_DITHER(yuv2rgb_c_4_ordered_dither, uint8_t, LOADDITHER4D, PUTRGB4D
YUV420FUNC_DITHER(yuv2rgb_c_4b_ordered_dither, uint8_t, LOADDITHER4DB, PUTRGB4DB, 8)
// YUV422
-YUV422FUNC(yuv422p_rgb48_c, uint8_t, 0, 0, PUTRGB48, 48)
-YUV422FUNC(yuv422p_bgr48_c, uint8_t, 0, 0, PUTBGR48, 48)
-YUV422FUNC(yuv422p_rgb32_c, uint32_t, 0, 0, PUTRGB, 8)
+YUV422FUNC(yuv422p_rgb48_c, uint8_t, 0, 0, PUTRGB48, 48, 1)
+YUV422FUNC(yuv422p_bgr48_c, uint8_t, 0, 0, PUTBGR48, 48, 1)
+YUV422FUNC(yuv422p_rgb32_c, uint32_t, 0, 0, PUTRGB, 8, 1)
#if HAVE_BIGENDIAN
-YUV422FUNC(yuva422p_argb_c, uint32_t, 1, 24, PUTRGBA, 8)
-YUV422FUNC(yuva422p_rgba_c, uint32_t, 1, 0, PUTRGBA, 8)
+YUV422FUNC(yuva422p_argb_c, uint32_t, 1, 24, PUTRGBA, 8, 1)
+YUV422FUNC(yuva422p_rgba_c, uint32_t, 1, 0, PUTRGBA, 8, 1)
#else
-YUV422FUNC(yuva422p_rgba_c, uint32_t, 1, 24, PUTRGBA, 8)
-YUV422FUNC(yuva422p_argb_c, uint32_t, 1, 0, PUTRGBA, 8)
+YUV422FUNC(yuva422p_rgba_c, uint32_t, 1, 24, PUTRGBA, 8, 1)
+YUV422FUNC(yuva422p_argb_c, uint32_t, 1, 0, PUTRGBA, 8, 1)
#endif
-YUV422FUNC(yuv422p_rgb24_c, uint8_t, 0, 0, PUTRGB24, 24)
-YUV422FUNC(yuv422p_bgr24_c, uint8_t, 0, 0, PUTBGR24, 24)
+YUV422FUNC(yuv422p_rgb24_c, uint8_t, 0, 0, PUTRGB24, 24, 1)
+YUV422FUNC(yuv422p_bgr24_c, uint8_t, 0, 0, PUTBGR24, 24, 1)
YUV422FUNC_DITHER(yuv422p_bgr16, uint16_t, LOADDITHER16, PUTRGB16, 8)
YUV422FUNC_DITHER(yuv422p_bgr15, uint16_t, LOADDITHER15, PUTRGB15, 8)
YUV422FUNC_DITHER(yuv422p_bgr12, uint16_t, LOADDITHER12, PUTRGB12, 8)
--
2.30.2
* [FFmpeg-devel] [PATCH v2 3/5] swscale/yuv2rgb: add yuv42{0,2}p -> gbrp unscaled colorspace converters
From: Ramiro Polla @ 2024-08-06 10:51 UTC
To: ffmpeg-devel
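As a reading aid for this patch, here is a minimal sketch of the store pattern used for planar GBR output: each pair of luma samples is looked up in the r/g/b tables and written to three separate planes in G, B, R order (plane 0 = G, plane 1 = B, plane 2 = R). The helper put_gbrp_pair and its tiny four-entry tables are hypothetical; in the real code the r, g and b pointers are selected per chroma sample from the context's lookup tables.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert luma pair i and store it to three planes.
 * r, g and b are lookup tables indexed by the luma value. */
static void put_gbrp_pair(uint8_t *dst_g, uint8_t *dst_b, uint8_t *dst_r,
                          const uint8_t *py, int i,
                          const uint8_t *r, const uint8_t *g,
                          const uint8_t *b)
{
    int Y = py[2 * i];
    dst_g[2 * i] = g[Y];
    dst_b[2 * i] = b[Y];
    dst_r[2 * i] = r[Y];
    Y = py[2 * i + 1];
    dst_g[2 * i + 1] = g[Y];
    dst_b[2 * i + 1] = b[Y];
    dst_r[2 * i + 1] = r[Y];
}

static void check_put_gbrp_pair(void)
{
    /* Toy tables covering luma values 0..3 only. */
    const uint8_t r[4] = { 10, 11, 12, 13 };
    const uint8_t g[4] = { 20, 21, 22, 23 };
    const uint8_t b[4] = { 30, 31, 32, 33 };
    const uint8_t py[4] = { 0, 1, 2, 3 };
    uint8_t pg[4] = { 0 }, pb[4] = { 0 }, pr[4] = { 0 };

    put_gbrp_pair(pg, pb, pr, py, 0, r, g, b);
    put_gbrp_pair(pg, pb, pr, py, 1, r, g, b);

    assert(pg[0] == 20 && pg[3] == 23);
    assert(pb[0] == 30 && pb[3] == 33);
    assert(pr[0] == 10 && pr[3] == 13);
}
```

Unlike the packed formats, each plane holds one byte per pixel, so a pair of pixels advances every plane by two bytes.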
---
libswscale/yuv2rgb.c | 16 ++++++++++
tests/checkasm/sw_yuv2rgb.c | 60 +++++++++++++++++++++++++++----------
2 files changed, 60 insertions(+), 16 deletions(-)
diff --git a/libswscale/yuv2rgb.c b/libswscale/yuv2rgb.c
index 31d10235ef..52fe2093e7 100644
--- a/libswscale/yuv2rgb.c
+++ b/libswscale/yuv2rgb.c
@@ -124,6 +124,16 @@ const int *sws_getCoefficients(int colorspace)
dst_##l[12 * i + 8] = dst_##l[12 * i + 9] = g[Y]; \
dst_##l[12 * i + 10] = dst_##l[12 * i + 11] = r[Y];
+#define PUTGBRP(l, i, abase) \
+ Y = py_##l[2 * i]; \
+ dst_##l [2 * i + 0] = g[Y]; \
+ dst1_##l[2 * i + 0] = b[Y]; \
+ dst2_##l[2 * i + 0] = r[Y]; \
+ Y = py_##l[2 * i + 1]; \
+ dst_##l [2 * i + 1] = g[Y]; \
+ dst1_##l[2 * i + 1] = b[Y]; \
+ dst2_##l[2 * i + 1] = r[Y];
+
#define YUV2RGBFUNC(func_name, dst_type, alpha, yuv422, nb_dst_planes) \
static int func_name(SwsContext *c, const uint8_t *src[], \
int srcStride[], int srcSliceY, int srcSliceH, \
@@ -519,6 +529,7 @@ YUV420FUNC(yuva2argb_c, uint32_t, 1, 0, PUTRGBA, 8, 1)
#endif
YUV420FUNC(yuv2rgb_c_24_rgb, uint8_t, 0, 0, PUTRGB24, 24, 1)
YUV420FUNC(yuv2rgb_c_24_bgr, uint8_t, 0, 0, PUTBGR24, 24, 1)
+YUV420FUNC(yuv420p_gbrp_c, uint8_t, 0, 0, PUTGBRP, 8, 3)
YUV420FUNC_DITHER(yuv2rgb_c_16_ordered_dither, uint16_t, LOADDITHER16, PUTRGB16, 8)
YUV420FUNC_DITHER(yuv2rgb_c_15_ordered_dither, uint16_t, LOADDITHER15, PUTRGB15, 8)
YUV420FUNC_DITHER(yuv2rgb_c_12_ordered_dither, uint16_t, LOADDITHER12, PUTRGB12, 8)
@@ -539,6 +550,7 @@ YUV422FUNC(yuva422p_argb_c, uint32_t, 1, 0, PUTRGBA, 8, 1)
#endif
YUV422FUNC(yuv422p_rgb24_c, uint8_t, 0, 0, PUTRGB24, 24, 1)
YUV422FUNC(yuv422p_bgr24_c, uint8_t, 0, 0, PUTBGR24, 24, 1)
+YUV422FUNC(yuv422p_gbrp_c, uint8_t, 0, 0, PUTGBRP, 8, 3)
YUV422FUNC_DITHER(yuv422p_bgr16, uint16_t, LOADDITHER16, PUTRGB16, 8)
YUV422FUNC_DITHER(yuv422p_bgr15, uint16_t, LOADDITHER15, PUTRGB15, 8)
YUV422FUNC_DITHER(yuv422p_bgr12, uint16_t, LOADDITHER12, PUTRGB12, 8)
@@ -604,6 +616,8 @@ SwsFunc ff_yuv2rgb_get_func_ptr(SwsContext *c)
return yuv422p_bgr4_byte;
case AV_PIX_FMT_MONOBLACK:
return yuv2rgb_c_1_ordered_dither;
+ case AV_PIX_FMT_GBRP:
+ return yuv422p_gbrp_c;
}
} else {
switch (c->dstFormat) {
@@ -644,6 +658,8 @@ SwsFunc ff_yuv2rgb_get_func_ptr(SwsContext *c)
return yuv2rgb_c_4b_ordered_dither;
case AV_PIX_FMT_MONOBLACK:
return yuv2rgb_c_1_ordered_dither;
+ case AV_PIX_FMT_GBRP:
+ return yuv420p_gbrp_c;
}
}
return NULL;
diff --git a/tests/checkasm/sw_yuv2rgb.c b/tests/checkasm/sw_yuv2rgb.c
index 02ed9a74d5..5125f83968 100644
--- a/tests/checkasm/sw_yuv2rgb.c
+++ b/tests/checkasm/sw_yuv2rgb.c
@@ -58,6 +58,7 @@ static const int dst_fmts[] = {
// AV_PIX_FMT_RGB4_BYTE,
// AV_PIX_FMT_BGR4_BYTE,
// AV_PIX_FMT_MONOBLACK,
+ AV_PIX_FMT_GBRP,
};
static int cmp_off_by_n(const uint8_t *ref, const uint8_t *test, size_t n, int accuracy)
@@ -116,13 +117,25 @@ static void check_yuv2rgb(int src_pix_fmt)
LOCAL_ALIGNED_8(uint8_t, src_a, [MAX_LINE_SIZE * 2]);
const uint8_t *src[4] = { src_y, src_u, src_v, src_a };
- LOCAL_ALIGNED_8(uint8_t, dst0_, [2 * MAX_LINE_SIZE * 6]);
- uint8_t *dst0[4] = { dst0_ };
- uint8_t *lines0[2] = { dst0_, dst0_ + MAX_LINE_SIZE * 6 };
-
- LOCAL_ALIGNED_8(uint8_t, dst1_, [2 * MAX_LINE_SIZE * 6]);
- uint8_t *dst1[4] = { dst1_ };
- uint8_t *lines1[2] = { dst1_, dst1_ + MAX_LINE_SIZE * 6 };
+ LOCAL_ALIGNED_8(uint8_t, dst0_0, [2 * MAX_LINE_SIZE * 6]);
+ LOCAL_ALIGNED_8(uint8_t, dst0_1, [2 * MAX_LINE_SIZE]);
+ LOCAL_ALIGNED_8(uint8_t, dst0_2, [2 * MAX_LINE_SIZE]);
+ uint8_t *dst0[4] = { dst0_0, dst0_1, dst0_2 };
+ uint8_t *lines0[4][2] = {
+ { dst0_0, dst0_0 + MAX_LINE_SIZE * 6 },
+ { dst0_1, dst0_1 + MAX_LINE_SIZE },
+ { dst0_2, dst0_2 + MAX_LINE_SIZE }
+ };
+
+ LOCAL_ALIGNED_8(uint8_t, dst1_0, [2 * MAX_LINE_SIZE * 6]);
+ LOCAL_ALIGNED_8(uint8_t, dst1_1, [2 * MAX_LINE_SIZE]);
+ LOCAL_ALIGNED_8(uint8_t, dst1_2, [2 * MAX_LINE_SIZE]);
+ uint8_t *dst1[4] = { dst1_0, dst1_1, dst1_2 };
+ uint8_t *lines1[4][2] = {
+ { dst1_0, dst1_0 + MAX_LINE_SIZE * 6 },
+ { dst1_1, dst1_1 + MAX_LINE_SIZE },
+ { dst1_2, dst1_2 + MAX_LINE_SIZE }
+ };
randomize_buffers(src_y, MAX_LINE_SIZE * 2);
randomize_buffers(src_u, MAX_LINE_SIZE);
@@ -145,7 +158,11 @@ static void check_yuv2rgb(int src_pix_fmt)
width >> src_desc->log2_chroma_w,
width,
};
- int dstStride[4] = { MAX_LINE_SIZE * 6 };
+ int dstStride[4] = {
+ MAX_LINE_SIZE * 6,
+ MAX_LINE_SIZE,
+ MAX_LINE_SIZE,
+ };
// override log level to prevent spamming of the message
// "No accelerated colorspace conversion found from %s to %s"
@@ -159,8 +176,14 @@ static void check_yuv2rgb(int src_pix_fmt)
fail();
if (check_func(ctx->convert_unscaled, "%s_%s_%d", src_desc->name, dst_desc->name, width)) {
- memset(dst0_, 0xFF, 2 * MAX_LINE_SIZE * 6);
- memset(dst1_, 0xFF, 2 * MAX_LINE_SIZE * 6);
+ memset(dst0_0, 0xFF, 2 * MAX_LINE_SIZE * 6);
+ memset(dst1_0, 0xFF, 2 * MAX_LINE_SIZE * 6);
+ if (dst_pix_fmt == AV_PIX_FMT_GBRP) {
+ memset(dst0_1, 0xFF, MAX_LINE_SIZE);
+ memset(dst0_2, 0xFF, MAX_LINE_SIZE);
+ memset(dst1_1, 0xFF, MAX_LINE_SIZE);
+ memset(dst1_2, 0xFF, MAX_LINE_SIZE);
+ }
call_ref(ctx, src, srcStride, srcSliceY,
srcSliceH, dst0, dstStride);
@@ -173,19 +196,24 @@ static void check_yuv2rgb(int src_pix_fmt)
dst_pix_fmt == AV_PIX_FMT_BGRA ||
dst_pix_fmt == AV_PIX_FMT_RGB24 ||
dst_pix_fmt == AV_PIX_FMT_BGR24) {
- if (cmp_off_by_n(lines0[0], lines1[0], width * sample_size, 3) ||
- cmp_off_by_n(lines0[1], lines1[1], width * sample_size, 3))
+ if (cmp_off_by_n(lines0[0][0], lines1[0][0], width * sample_size, 3) ||
+ cmp_off_by_n(lines0[0][1], lines1[0][1], width * sample_size, 3))
fail();
} else if (dst_pix_fmt == AV_PIX_FMT_RGB565 ||
dst_pix_fmt == AV_PIX_FMT_BGR565) {
- if (cmp_565_by_n(lines0[0], lines1[0], width, 2) ||
- cmp_565_by_n(lines0[1], lines1[1], width, 2))
+ if (cmp_565_by_n(lines0[0][0], lines1[0][0], width, 2) ||
+ cmp_565_by_n(lines0[0][1], lines1[0][1], width, 2))
fail();
} else if (dst_pix_fmt == AV_PIX_FMT_RGB555 ||
dst_pix_fmt == AV_PIX_FMT_BGR555) {
- if (cmp_555_by_n(lines0[0], lines1[0], width, 2) ||
- cmp_555_by_n(lines0[1], lines1[1], width, 2))
+ if (cmp_555_by_n(lines0[0][0], lines1[0][0], width, 2) ||
+ cmp_555_by_n(lines0[0][1], lines1[0][1], width, 2))
fail();
+ } else if (dst_pix_fmt == AV_PIX_FMT_GBRP) {
+ for (int p = 0; p < 3; p++)
+ for (int l = 0; l < 2; l++)
+ if (cmp_off_by_n(lines0[p][l], lines1[p][l], width, 3))
+ fail();
} else {
fail();
}
--
2.30.2
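
Reader's note on the checkasm change above: the new GBRP branch compares three planes, two lines each, with the same per-byte tolerance used for the packed 24/32-bit formats. A minimal sketch of that comparison follows. Only the signature of cmp_off_by_n appears in this hunk, so the body here is an assumption, and the names cmp_off_by_n_sketch and planes_differ are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed behaviour: return 1 if any of the n bytes differs by more than
 * `accuracy`, 0 otherwise. Mirrors the signature shown in the hunk. */
static int cmp_off_by_n_sketch(const uint8_t *ref, const uint8_t *test,
                               size_t n, int accuracy)
{
    assert(accuracy >= 0);
    for (size_t i = 0; i < n; i++)
        if (abs(ref[i] - test[i]) > accuracy)
            return 1;
    return 0;
}

/* Planar check: every line of every plane must be within tolerance.
 * A planar 8-bit plane holds one byte per pixel, so each line is
 * `width` bytes long (no sample_size multiplier). */
static int planes_differ(uint8_t *ref[3][2], uint8_t *test[3][2],
                         size_t width, int accuracy)
{
    for (int p = 0; p < 3; p++)
        for (int l = 0; l < 2; l++)
            if (cmp_off_by_n_sketch(ref[p][l], test[p][l], width, accuracy))
                return 1;
    return 0;
}
```

The sketch returns a flag instead of calling fail() so it can be exercised outside checkasm.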
* [FFmpeg-devel] [PATCH v2 4/5] swscale/x86/yuv2rgb: add ssse3 yuv42{0, 2}p -> gbrp unscaled colorspace converters
2024-08-06 10:51 [FFmpeg-devel] [PATCH v2 0/5] swscale/yuv2rgb: add yuv42{0, 2}p -> gbrp unscaled colorspace converters Ramiro Polla
` (2 preceding siblings ...)
2024-08-06 10:51 ` [FFmpeg-devel] [PATCH v2 3/5] swscale/yuv2rgb: add yuv42{0, 2}p -> gbrp unscaled colorspace converters Ramiro Polla
@ 2024-08-06 10:51 ` Ramiro Polla
2024-08-06 10:51 ` [FFmpeg-devel] [PATCH v2 5/5] swscale/aarch64/yuv2rgb: add neon " Ramiro Polla
2024-08-15 14:32 ` [FFmpeg-devel] [PATCH v2 0/5] swscale/yuv2rgb: add " Ramiro Polla
5 siblings, 0 replies; 8+ messages in thread
From: Ramiro Polla @ 2024-08-06 10:51 UTC (permalink / raw)
To: ffmpeg-devel
Note: this implementation is limited to x86_64 because of general-purpose
register pressure: the gbrp variant declares eight GPR arguments (GPR_num 8).
checkasm --bench on an Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz:
yuv420p_gbrp_8_c: 118.5
yuv420p_gbrp_8_ssse3: 93.3
yuv420p_gbrp_128_c: 1068.3
yuv420p_gbrp_128_ssse3: 319.3
yuv420p_gbrp_1080_c: 8841.8
yuv420p_gbrp_1080_ssse3: 2211.8
yuv420p_gbrp_1920_c: 15903.8
yuv420p_gbrp_1920_ssse3: 3814.3
yuv422p_gbrp_8_c: 144.8
yuv422p_gbrp_8_ssse3: 93.8
yuv422p_gbrp_128_c: 1395.8
yuv422p_gbrp_128_ssse3: 313.0
yuv422p_gbrp_1080_c: 11551.5
yuv422p_gbrp_1080_ssse3: 2240.8
yuv422p_gbrp_1920_c: 20585.3
yuv422p_gbrp_1920_ssse3: 5249.5
yuva420p_gbrp_8_c: 117.5
yuva420p_gbrp_8_ssse3: 92.0
yuva420p_gbrp_128_c: 1593.0
yuva420p_gbrp_128_ssse3: 319.3
yuva420p_gbrp_1080_c: 8694.5
yuva420p_gbrp_1080_ssse3: 2186.0
yuva420p_gbrp_1920_c: 15946.5
yuva420p_gbrp_1920_ssse3: 3805.3
---
libswscale/x86/yuv2rgb.c | 39 ++++++++++++++++++++++++++++++++++++
libswscale/x86/yuv_2_rgb.asm | 24 +++++++++++++++++++++-
2 files changed, 62 insertions(+), 1 deletion(-)
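
Reader's note on the wrapper in this patch: it passes index = -h_size / 2 together with the biased pointers pu - index, pv - index and py - 2 * index. The apparent intent is that the row loop counts a negative index up to zero while addressing base + index, so one register serves as both loop counter and offset. The scalar sketch below illustrates that addressing pattern only. The loop body is a hypothetical stand-in (walk_row is not an FFmpeg function), and the exact loop shape in the assembly is an assumption, since it is not part of this hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walks one row with a negative index that counts up to zero.
 * pu_index points one past the last chroma sample, py_2index one past
 * the last luma sample; each chroma sample covers two luma samples. */
static void walk_row(ptrdiff_t index, uint8_t *out_y, uint8_t *out_u,
                     const uint8_t *pu_index, const uint8_t *py_2index)
{
    assert(index <= 0);
    for (; index < 0; index++) {
        uint8_t u = pu_index[index];         /* chroma: base + index      */
        *out_y++ = py_2index[2 * index];     /* luma:   base + 2 * index  */
        *out_u++ = u;
        *out_y++ = py_2index[2 * index + 1];
        *out_u++ = u;
    }
}
```

For an 8-pixel row a caller would compute index = -8 / 2 and pass pu - index and py - 2 * index, just as the wrapper does with h_size.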
diff --git a/libswscale/x86/yuv2rgb.c b/libswscale/x86/yuv2rgb.c
index 68e903c6ad..2a4505fa90 100644
--- a/libswscale/x86/yuv2rgb.c
+++ b/libswscale/x86/yuv2rgb.c
@@ -79,6 +79,12 @@ extern void ff_yuva_420_rgb32_ssse3(x86_reg index, uint8_t *image, const uint8_t
extern void ff_yuva_420_bgr32_ssse3(x86_reg index, uint8_t *image, const uint8_t *pu_index,
const uint8_t *pv_index, const uint64_t *pointer_c_dither,
const uint8_t *py_2index, const uint8_t *pa_2index);
+#if ARCH_X86_64
+extern void ff_yuv_420_gbrp24_ssse3(x86_reg index, uint8_t *image, uint8_t *dst_b, uint8_t *dst_r,
+ const uint8_t *pu_index, const uint8_t *pv_index,
+ const uint64_t *pointer_c_dither,
+ const uint8_t *py_2index);
+#endif
static inline int yuv420_rgb15_ssse3(SwsContext *c, const uint8_t *src[],
int srcStride[],
@@ -201,6 +207,35 @@ static inline int yuv420_bgr24_ssse3(SwsContext *c, const uint8_t *src[],
return srcSliceH;
}
+#if ARCH_X86_64
+static inline int yuv420_gbrp_ssse3(SwsContext *c, const uint8_t *src[],
+ int srcStride[],
+ int srcSliceY, int srcSliceH,
+ uint8_t *dst[], int dstStride[])
+{
+ int y, h_size, vshift;
+
+ h_size = (c->dstW + 7) & ~7;
+ if (h_size * 3 > FFABS(dstStride[0]))
+ h_size -= 8;
+
+ vshift = c->srcFormat != AV_PIX_FMT_YUV422P;
+
+ for (y = 0; y < srcSliceH; y++) {
+ uint8_t *dst_g = dst[0] + (y + srcSliceY) * dstStride[0];
+ uint8_t *dst_b = dst[1] + (y + srcSliceY) * dstStride[1];
+ uint8_t *dst_r = dst[2] + (y + srcSliceY) * dstStride[2];
+ const uint8_t *py = src[0] + y * srcStride[0];
+ const uint8_t *pu = src[1] + (y >> vshift) * srcStride[1];
+ const uint8_t *pv = src[2] + (y >> vshift) * srcStride[2];
+ x86_reg index = -h_size / 2;
+
+ ff_yuv_420_gbrp24_ssse3(index, dst_g, dst_b, dst_r, pu - index, pv - index, &(c->redDither), py - 2 * index);
+ }
+ return srcSliceH;
+}
+#endif
+
#endif /* HAVE_X86ASM */
av_cold SwsFunc ff_yuv2rgb_init_x86(SwsContext *c)
@@ -234,6 +269,10 @@ av_cold SwsFunc ff_yuv2rgb_init_x86(SwsContext *c)
return yuv420_rgb16_ssse3;
case AV_PIX_FMT_RGB555:
return yuv420_rgb15_ssse3;
+#if ARCH_X86_64
+ case AV_PIX_FMT_GBRP:
+ return yuv420_gbrp_ssse3;
+#endif
}
}
diff --git a/libswscale/x86/yuv_2_rgb.asm b/libswscale/x86/yuv_2_rgb.asm
index b67ab162d2..eeb1d25942 100644
--- a/libswscale/x86/yuv_2_rgb.asm
+++ b/libswscale/x86/yuv_2_rgb.asm
@@ -32,6 +32,7 @@ mask_dw25 : db 0, 0, 0, 0, -1, -1, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0
rgb24_shuf1: db 0, 1, 6, 7, 12, 13, 2, 3, 8, 9, 14, 15, 4, 5, 10, 11
rgb24_shuf2: db 10, 11, 0, 1, 6, 7, 12, 13, 2, 3, 8, 9, 14, 15, 4, 5
rgb24_shuf3: db 4, 5, 10, 11, 0, 1, 6, 7, 12, 13, 2, 3, 8, 9, 14, 15
+gbrp_shuf : db 0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15
pw_00ff: times 8 dw 255
pb_f8: times 16 db 248
pb_e0: times 16 db 224
@@ -60,8 +61,13 @@ SECTION .text
%define GPR_num 6
%endif
%else
+ %ifidn %2, gbrp
+ %define parameters index, image, dst_b, dst_r, pu_index, pv_index, pointer_c_dither, py_2index
+ %define GPR_num 8
+ %else
%define parameters index, image, pu_index, pv_index, pointer_c_dither, py_2index
%define GPR_num 6
+ %endif
%endif
%define m_green m2
@@ -172,10 +178,22 @@ cglobal %1_420_%2%3, GPR_num, GPR_num, reg_num, parameters
paddsw m2, m6 ; G0 G2 G4 G6 ...
%if %3 == 24 ; PACK RGB24
-%define depth 3
packuswb m0, m3 ; B0 B2 B4 B6 ... B1 B3 B5 B7 ...
packuswb m1, m5 ; R0 R2 R4 R6 ... R1 R3 R5 R7 ...
packuswb m2, m7 ; G0 G2 G4 G6 ... G1 G3 G5 G7 ...
+%ifidn %2, gbrp ; PLANAR GBRP
+%define depth 1
+ mova m4, [gbrp_shuf]
+ pshufb m0, m4
+ pshufb m1, m4
+ pshufb m2, m4
+ movu [imageq], m2
+ movu [dst_bq], m0
+ movu [dst_rq], m1
+ add dst_bq, 8 * depth * time_num
+ add dst_rq, 8 * depth * time_num
+%else
+%define depth 3
mova m3, m_red
mova m6, m_blue
psrldq m_red, 8
@@ -206,6 +224,7 @@ cglobal %1_420_%2%3, GPR_num, GPR_num, reg_num, parameters
movu [imageq], m0
movu [imageq + 16], m1
movu [imageq + 32], m2
+%endif ; PLANAR GBRP
%else ; PACK RGB15/16/32
packuswb m0, m1
packuswb m3, m5
@@ -292,3 +311,6 @@ yuv2rgb_fn yuva, rgb, 32
yuv2rgb_fn yuva, bgr, 32
yuv2rgb_fn yuv, rgb, 15
yuv2rgb_fn yuv, rgb, 16
+%if ARCH_X86_64
+yuv2rgb_fn yuv, gbrp, 24
+%endif
--
2.30.2
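
Reader's note on gbrp_shuf: according to the comments in the assembly, each packuswb result holds the even pixels in its low eight bytes and the odd pixels in its high eight bytes (B0 B2 B4 ... B1 B3 B5 ...). The gbrp_shuf table then interleaves the two halves so that one 16-byte store writes pixels 0..15 in order to a plane. The sketch below is a scalar model of pshufb applied to that table. The function names are hypothetical and nothing here is taken from the FFmpeg sources beyond the table values.

```c
#include <assert.h>
#include <stdint.h>

/* Same byte indices as the gbrp_shuf constant added by the patch. */
static const uint8_t gbrp_shuf_model[16] = {
    0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15
};

/* Scalar model of SSSE3 pshufb: each output byte is selected by the low
 * four bits of the mask byte, or zeroed when the mask's top bit is set. */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                         const uint8_t mask[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0f];
    for (int i = 0; i < 16; i++)
        dst[i] = tmp[i];
}

/* Builds the "evens then odds" layout and checks that the shuffle
 * restores natural pixel order. */
static void gbrp_shuf_self_check(void)
{
    uint8_t packed[16], plane[16];
    for (int i = 0; i < 8; i++) {
        packed[i]     = (uint8_t)(2 * i);     /* pixels 0, 2, 4, ... */
        packed[8 + i] = (uint8_t)(2 * i + 1); /* pixels 1, 3, 5, ... */
    }
    pshufb_model(plane, packed, gbrp_shuf_model);
    for (int i = 0; i < 16; i++)
        assert(plane[i] == i);
}
```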
* [FFmpeg-devel] [PATCH v2 5/5] swscale/aarch64/yuv2rgb: add neon yuv42{0, 2}p -> gbrp unscaled colorspace converters
2024-08-06 10:51 [FFmpeg-devel] [PATCH v2 0/5] swscale/yuv2rgb: add yuv42{0, 2}p -> gbrp unscaled colorspace converters Ramiro Polla
` (3 preceding siblings ...)
2024-08-06 10:51 ` [FFmpeg-devel] [PATCH v2 4/5] swscale/x86/yuv2rgb: add ssse3 " Ramiro Polla
@ 2024-08-06 10:51 ` Ramiro Polla
2024-08-14 12:23 ` Martin Storsjö
2024-08-15 14:32 ` [FFmpeg-devel] [PATCH v2 0/5] swscale/yuv2rgb: add " Ramiro Polla
5 siblings, 1 reply; 8+ messages in thread
From: Ramiro Polla @ 2024-08-06 10:51 UTC (permalink / raw)
To: ffmpeg-devel
checkasm --bench on a Raspberry Pi 5 Model B Rev 1.0:
yuv420p_gbrp_128_c: 1243.0
yuv420p_gbrp_128_neon: 453.5
yuv420p_gbrp_1920_c: 18165.5
yuv420p_gbrp_1920_neon: 6700.0
yuv422p_gbrp_128_c: 1463.5
yuv422p_gbrp_128_neon: 471.5
yuv422p_gbrp_1920_c: 21343.7
yuv422p_gbrp_1920_neon: 6743.5
---
libswscale/aarch64/swscale_unscaled.c | 58 +++++++++++++++++++++
libswscale/aarch64/yuv2rgb_neon.S | 73 ++++++++++++++++++++++-----
2 files changed, 118 insertions(+), 13 deletions(-)
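
Reader's note on the pointer arithmetic in this patch: the C wrapper offsets each destination plane by srcSliceY * dstStride[p], and the assembly turns each line size into a padding value, linesize - width for a gbrp plane and linesize - width * 4 for the packed 32-bit formats. The sketch below restates that arithmetic in plain C. PlaneCursor, plane_cursor and fill_rows are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct PlaneCursor {
    uint8_t *ptr;     /* first byte of the first row in the slice */
    int      padding; /* bytes to skip after writing one row      */
} PlaneCursor;

/* bytes_per_pixel is 1 for one plane of gbrp, 4 for rgba/argb/etc. */
static PlaneCursor plane_cursor(uint8_t *base, int stride, int slice_y,
                                int width, int bytes_per_pixel)
{
    PlaneCursor c;
    assert(width > 0 && bytes_per_pixel > 0);
    c.ptr     = base + (ptrdiff_t)slice_y * stride;
    c.padding = stride - width * bytes_per_pixel;
    return c;
}

/* Writes `height` rows of `width` bytes, advancing by the padding at
 * the end of each row, as the assembly loop does for each plane. */
static void fill_rows(PlaneCursor c, int width, int height, uint8_t value)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            *c.ptr++ = value;
        c.ptr += c.padding;
    }
}
```

Keeping one cursor per plane matches the three independent pointer and padding pairs that the gbrp path carries through its row loop.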
diff --git a/libswscale/aarch64/swscale_unscaled.c b/libswscale/aarch64/swscale_unscaled.c
index b3093bbc9d..5c4f6fee34 100644
--- a/libswscale/aarch64/swscale_unscaled.c
+++ b/libswscale/aarch64/swscale_unscaled.c
@@ -52,11 +52,41 @@ static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[],
c->yuv2rgb_y_coeff); \
} \
+#define DECLARE_FF_YUVX_TO_GBRP_FUNCS(ifmt, ofmt) \
+int ff_##ifmt##_to_##ofmt##_neon(int w, int h, \
+ uint8_t *dst, int linesize, \
+ const uint8_t *srcY, int linesizeY, \
+ const uint8_t *srcU, int linesizeU, \
+ const uint8_t *srcV, int linesizeV, \
+ const int16_t *table, \
+ int y_offset, \
+ int y_coeff, \
+ uint8_t *dst1, int linesize1, \
+ uint8_t *dst2, int linesize2); \
+ \
+static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[], \
+ int srcStride[], int srcSliceY, int srcSliceH, \
+ uint8_t *dst[], int dstStride[]) { \
+ const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE }; \
+ \
+ return ff_##ifmt##_to_##ofmt##_neon(c->srcW, srcSliceH, \
+ dst[0] + srcSliceY * dstStride[0], dstStride[0], \
+ src[0], srcStride[0], \
+ src[1], srcStride[1], \
+ src[2], srcStride[2], \
+ yuv2rgb_table, \
+ c->yuv2rgb_y_offset >> 6, \
+ c->yuv2rgb_y_coeff, \
+ dst[1] + srcSliceY * dstStride[1], dstStride[1], \
+ dst[2] + srcSliceY * dstStride[2], dstStride[2]); \
+} \
+
#define DECLARE_FF_YUVX_TO_ALL_RGBX_FUNCS(yuvx) \
DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, argb) \
DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, rgba) \
DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, abgr) \
DECLARE_FF_YUVX_TO_RGBX_FUNCS(yuvx, bgra) \
+DECLARE_FF_YUVX_TO_GBRP_FUNCS(yuvx, gbrp) \
DECLARE_FF_YUVX_TO_ALL_RGBX_FUNCS(yuv420p)
DECLARE_FF_YUVX_TO_ALL_RGBX_FUNCS(yuv422p)
@@ -83,11 +113,38 @@ static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[],
c->yuv2rgb_y_coeff); \
} \
+#define DECLARE_FF_NVX_TO_GBRP_FUNCS(ifmt, ofmt) \
+int ff_##ifmt##_to_##ofmt##_neon(int w, int h, \
+ uint8_t *dst, int linesize, \
+ const uint8_t *srcY, int linesizeY, \
+ const uint8_t *srcC, int linesizeC, \
+ const int16_t *table, \
+ int y_offset, \
+ int y_coeff, \
+ uint8_t *dst1, int linesize1, \
+ uint8_t *dst2, int linesize2); \
+ \
+static int ifmt##_to_##ofmt##_neon_wrapper(SwsContext *c, const uint8_t *src[], \
+ int srcStride[], int srcSliceY, int srcSliceH, \
+ uint8_t *dst[], int dstStride[]) { \
+ const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE }; \
+ \
+ return ff_##ifmt##_to_##ofmt##_neon(c->srcW, srcSliceH, \
+ dst[0] + srcSliceY * dstStride[0], dstStride[0], \
+ src[0], srcStride[0], src[1], srcStride[1], \
+ yuv2rgb_table, \
+ c->yuv2rgb_y_offset >> 6, \
+ c->yuv2rgb_y_coeff, \
+ dst[1] + srcSliceY * dstStride[1], dstStride[1], \
+ dst[2] + srcSliceY * dstStride[2], dstStride[2]); \
+} \
+
#define DECLARE_FF_NVX_TO_ALL_RGBX_FUNCS(nvx) \
DECLARE_FF_NVX_TO_RGBX_FUNCS(nvx, argb) \
DECLARE_FF_NVX_TO_RGBX_FUNCS(nvx, rgba) \
DECLARE_FF_NVX_TO_RGBX_FUNCS(nvx, abgr) \
DECLARE_FF_NVX_TO_RGBX_FUNCS(nvx, bgra) \
+DECLARE_FF_NVX_TO_GBRP_FUNCS(nvx, gbrp) \
DECLARE_FF_NVX_TO_ALL_RGBX_FUNCS(nv12)
DECLARE_FF_NVX_TO_ALL_RGBX_FUNCS(nv21)
@@ -110,6 +167,7 @@ DECLARE_FF_NVX_TO_ALL_RGBX_FUNCS(nv21)
SET_FF_NVX_TO_RGBX_FUNC(nvx, NVX, rgba, RGBA, accurate_rnd); \
SET_FF_NVX_TO_RGBX_FUNC(nvx, NVX, abgr, ABGR, accurate_rnd); \
SET_FF_NVX_TO_RGBX_FUNC(nvx, NVX, bgra, BGRA, accurate_rnd); \
+ SET_FF_NVX_TO_RGBX_FUNC(nvx, NVX, gbrp, GBRP, accurate_rnd); \
} while (0)
static void get_unscaled_swscale_neon(SwsContext *c) {
diff --git a/libswscale/aarch64/yuv2rgb_neon.S b/libswscale/aarch64/yuv2rgb_neon.S
index 89d69e7f6c..b89eb2c781 100644
--- a/libswscale/aarch64/yuv2rgb_neon.S
+++ b/libswscale/aarch64/yuv2rgb_neon.S
@@ -30,23 +30,43 @@
#endif
.endm
-.macro load_args_nv12
+.macro load_dst1_dst2 dst1 linesize1 dst2 linesize2
+#if defined(__APPLE__)
+#define DST_OFFSET 8
+#else
+#define DST_OFFSET 0
+#endif
+ ldr x10, [sp, #\dst1 - DST_OFFSET]
+ ldr w12, [sp, #\linesize1 - DST_OFFSET]
+ ldr x15, [sp, #\dst2 - DST_OFFSET]
+ ldr w16, [sp, #\linesize2 - DST_OFFSET]
+#undef DST_OFFSET
+ sub w12, w12, w0 // w12 = linesize1 - width (padding1)
+ sub w16, w16, w0 // w16 = linesize2 - width (padding2)
+.endm
+
+.macro load_args_nv12 ofmt
ldr x8, [sp] // table
load_yoff_ycoeff 8, 16 // y_offset, y_coeff
ld1 {v1.1d}, [x8]
dup v0.8h, w10
dup v3.8h, w9
+.ifc \ofmt,gbrp
+ load_dst1_dst2 24, 32, 40, 48
+ sub w3, w3, w0 // w3 = linesize - width (padding)
+.else
sub w3, w3, w0, lsl #2 // w3 = linesize - width * 4 (padding)
+.endif
sub w5, w5, w0 // w5 = linesizeY - width (paddingY)
sub w7, w7, w0 // w7 = linesizeC - width (paddingC)
neg w11, w0
.endm
-.macro load_args_nv21
- load_args_nv12
+.macro load_args_nv21 ofmt
+ load_args_nv12 \ofmt
.endm
-.macro load_args_yuv420p
+.macro load_args_yuv420p ofmt
ldr x13, [sp] // srcV
ldr w14, [sp, #8] // linesizeV
ldr x8, [sp, #16] // table
@@ -54,7 +74,12 @@
ld1 {v1.1d}, [x8]
dup v0.8h, w10
dup v3.8h, w9
+.ifc \ofmt,gbrp
+ load_dst1_dst2 40, 48, 56, 64
+ sub w3, w3, w0 // w3 = linesize - width (padding)
+.else
sub w3, w3, w0, lsl #2 // w3 = linesize - width * 4 (padding)
+.endif
sub w5, w5, w0 // w5 = linesizeY - width (paddingY)
sub w7, w7, w0, lsr #1 // w7 = linesizeU - width / 2 (paddingU)
sub w14, w14, w0, lsr #1 // w14 = linesizeV - width / 2 (paddingV)
@@ -62,7 +87,7 @@
neg w11, w11
.endm
-.macro load_args_yuv422p
+.macro load_args_yuv422p ofmt
ldr x13, [sp] // srcV
ldr w14, [sp, #8] // linesizeV
ldr x8, [sp, #16] // table
@@ -70,7 +95,12 @@
ld1 {v1.1d}, [x8]
dup v0.8h, w10
dup v3.8h, w9
+.ifc \ofmt,gbrp
+ load_dst1_dst2 40, 48, 56, 64
+ sub w3, w3, w0 // w3 = linesize - width (padding)
+.else
sub w3, w3, w0, lsl #2 // w3 = linesize - width * 4 (padding)
+.endif
sub w5, w5, w0 // w5 = linesizeY - width (paddingY)
sub w7, w7, w0, lsr #1 // w7 = linesizeU - width / 2 (paddingU)
sub w14, w14, w0, lsr #1 // w14 = linesizeV - width / 2 (paddingV)
@@ -100,9 +130,9 @@
.endm
.macro increment_nv12
- ands w15, w1, #1
- csel w16, w7, w11, ne // incC = (h & 1) ? paddincC : -width
- add x6, x6, w16, sxtw // srcC += incC
+ ands w17, w1, #1
+ csel w17, w7, w11, ne // incC = (h & 1) ? paddincC : -width
+ add x6, x6, w17, sxtw // srcC += incC
.endm
.macro increment_nv21
@@ -110,10 +140,10 @@
.endm
.macro increment_yuv420p
- ands w15, w1, #1
- csel w16, w7, w11, ne // incU = (h & 1) ? paddincU : -width/2
+ ands w17, w1, #1
+ csel w17, w7, w11, ne // incU = (h & 1) ? paddincU : -width/2
+ add x6, x6, w17, sxtw // srcU += incU
csel w17, w14, w11, ne // incV = (h & 1) ? paddincV : -width/2
- add x6, x6, w16, sxtw // srcU += incU
add x13, x13, w17, sxtw // srcV += incV
.endm
@@ -122,7 +152,7 @@
add x13, x13, w14, sxtw // srcV += incV
.endm
-.macro compute_rgba r1 g1 b1 a1 r2 g2 b2 a2
+.macro compute_rgb r1 g1 b1 r2 g2 b2
add v20.8h, v26.8h, v20.8h // Y1 + R1
add v21.8h, v27.8h, v21.8h // Y2 + R2
add v22.8h, v26.8h, v22.8h // Y1 + G1
@@ -135,13 +165,18 @@
sqrshrun \g2, v23.8h, #1 // clip_u8((Y2 + G1) >> 1)
sqrshrun \b1, v24.8h, #1 // clip_u8((Y1 + B1) >> 1)
sqrshrun \b2, v25.8h, #1 // clip_u8((Y2 + B1) >> 1)
+.endm
+
+.macro compute_rgba r1 g1 b1 a1 r2 g2 b2 a2
+ compute_rgb \r1, \g1, \b1, \r2, \g2, \b2
movi \a1, #255
movi \a2, #255
.endm
.macro declare_func ifmt ofmt
function ff_\ifmt\()_to_\ofmt\()_neon, export=1
- load_args_\ifmt
+ load_args_\ifmt \ofmt
+
mov w9, w1
1:
mov w8, w0 // w8 = width
@@ -185,11 +220,22 @@ function ff_\ifmt\()_to_\ofmt\()_neon, export=1
compute_rgba v6.8b,v5.8b,v4.8b,v7.8b, v18.8b,v17.8b,v16.8b,v19.8b
.endif
+.ifc \ofmt,gbrp
+ compute_rgb v18.8b,v4.8b,v6.8b, v19.8b,v5.8b,v7.8b
+ st1 { v4.8b, v5.8b }, [x2], #16
+ st1 { v6.8b, v7.8b }, [x10], #16
+ st1 { v18.8b, v19.8b }, [x15], #16
+.else
st4 { v4.8b, v5.8b, v6.8b, v7.8b}, [x2], #32
st4 {v16.8b,v17.8b,v18.8b,v19.8b}, [x2], #32
+.endif
subs w8, w8, #16 // width -= 16
b.gt 2b
add x2, x2, w3, sxtw // dst += padding
+.ifc \ofmt,gbrp
+ add x10, x10, w12, sxtw // dst1 += padding1
+ add x15, x15, w16, sxtw // dst2 += padding2
+.endif
add x4, x4, w5, sxtw // srcY += paddingY
increment_\ifmt
subs w1, w1, #1 // height -= 1
@@ -204,6 +250,7 @@ endfunc
declare_func \ifmt, rgba
declare_func \ifmt, abgr
declare_func \ifmt, bgra
+ declare_func \ifmt, gbrp
.endm
declare_rgb_funcs nv12
--
2.30.2
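
Reader's note on DST_OFFSET in load_dst1_dst2: the 8-byte adjustment on Apple platforms is consistent with how stack arguments are laid out there. Under standard AAPCS64 every stack argument occupies an 8-byte slot, while Apple's arm64 ABI packs each argument at its natural alignment, so the two consecutive int arguments y_offset and y_coeff share one slot and everything after them moves down by 8 bytes. The sketch below computes both layouts for the stack-passed arguments. It is a simplified model that only handles 4-byte ints and 8-byte pointers, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum StackAbi { ABI_AAPCS64, ABI_APPLE_ARM64 };

/* Fills offsets[i] with the stack offset of the i-th stack argument.
 * sizes[i] is 4 for an int and 8 for a pointer. */
static void stack_offsets(enum StackAbi abi, const size_t *sizes, size_t n,
                          size_t *offsets)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        assert(sizes[i] == 4 || sizes[i] == 8);
        /* AAPCS64: 8-byte slots. Apple: natural size and alignment. */
        size_t slot = (abi == ABI_APPLE_ARM64) ? sizes[i] : 8;
        off = (off + slot - 1) & ~(slot - 1);
        offsets[i] = off;
        off += slot;
    }
}
```

For the yuv420p and yuv422p entry points the stack-passed arguments are srcV, linesizeV, table, y_offset, y_coeff, dst1, linesize1, dst2, linesize2, which this model places at 40, 48, 56, 64 for dst1 onward under AAPCS64 and at 32, 40, 48, 56 under the Apple layout.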
* Re: [FFmpeg-devel] [PATCH v2 5/5] swscale/aarch64/yuv2rgb: add neon yuv42{0, 2}p -> gbrp unscaled colorspace converters
2024-08-06 10:51 ` [FFmpeg-devel] [PATCH v2 5/5] swscale/aarch64/yuv2rgb: add neon " Ramiro Polla
@ 2024-08-14 12:23 ` Martin Storsjö
0 siblings, 0 replies; 8+ messages in thread
From: Martin Storsjö @ 2024-08-14 12:23 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Tue, 6 Aug 2024, Ramiro Polla wrote:
> checkasm --bench on a Raspberry Pi 5 Model B Rev 1.0:
> yuv420p_gbrp_128_c: 1243.0
> yuv420p_gbrp_128_neon: 453.5
> yuv420p_gbrp_1920_c: 18165.5
> yuv420p_gbrp_1920_neon: 6700.0
> yuv422p_gbrp_128_c: 1463.5
> yuv422p_gbrp_128_neon: 471.5
> yuv422p_gbrp_1920_c: 21343.7
> yuv422p_gbrp_1920_neon: 6743.5
> ---
> libswscale/aarch64/swscale_unscaled.c | 58 +++++++++++++++++++++
> libswscale/aarch64/yuv2rgb_neon.S | 73 ++++++++++++++++++++++-----
> 2 files changed, 118 insertions(+), 13 deletions(-)
This looks reasonable to me, thanks!
// Martin
* Re: [FFmpeg-devel] [PATCH v2 0/5] swscale/yuv2rgb: add yuv42{0, 2}p -> gbrp unscaled colorspace converters
2024-08-06 10:51 [FFmpeg-devel] [PATCH v2 0/5] swscale/yuv2rgb: add yuv42{0, 2}p -> gbrp unscaled colorspace converters Ramiro Polla
` (4 preceding siblings ...)
2024-08-06 10:51 ` [FFmpeg-devel] [PATCH v2 5/5] swscale/aarch64/yuv2rgb: add neon " Ramiro Polla
@ 2024-08-15 14:32 ` Ramiro Polla
5 siblings, 0 replies; 8+ messages in thread
From: Ramiro Polla @ 2024-08-15 14:32 UTC (permalink / raw)
To: ffmpeg-devel
On Tue, Aug 6, 2024 at 12:51 PM Ramiro Polla <ramiro.polla@gmail.com> wrote:
>
> Changes from v1:
> - multi-planar rgb for YUV2RGBFUNC no longer uses an array in the stack,
> since that gave an overall 1% slowdown because some variables would no
> longer be stored in registers.
>
> Ramiro Polla (5):
> swscale/yuv2rgb: prepare LOADCHROMA/PUTFUNC macros for multi-planar
> rgb
> swscale/yuv2rgb: prepare YUV2RGBFUNC macro for multi-planar rgb
> swscale/yuv2rgb: add yuv42{0,2}p -> gbrp unscaled colorspace
> converters
> swscale/x86/yuv2rgb: add ssse3 yuv42{0,2}p -> gbrp unscaled colorspace
> converters
> swscale/aarch64/yuv2rgb: add neon yuv42{0,2}p -> gbrp unscaled
> colorspace converters
>
> libswscale/aarch64/swscale_unscaled.c | 58 +++
> libswscale/aarch64/yuv2rgb_neon.S | 73 +++-
> libswscale/x86/yuv2rgb.c | 39 ++
> libswscale/x86/yuv_2_rgb.asm | 24 +-
> libswscale/yuv2rgb.c | 513 ++++++++++++++------------
> tests/checkasm/sw_yuv2rgb.c | 60 ++-
> 6 files changed, 495 insertions(+), 272 deletions(-)
>
> --
> 2.30.2
>
I'll apply this patchset in a few days if there are no further comments.