* [FFmpeg-devel] [PATCH v1 0/6] swscale: Add dedicated RGB->YUV unscaled functions & aarch64 asm
@ 2023-08-20 15:10 John Cox
2023-08-20 15:10 ` [FFmpeg-devel] [PATCH v1 1/6] fate-filter-fps: Set swscale bitexact for tests that do conversions John Cox
` (5 more replies)
0 siblings, 6 replies; 14+ messages in thread
From: John Cox @ 2023-08-20 15:10 UTC
To: ffmpeg-devel; +Cc: John Cox
This patch set expands the set of dedicated RGB->YUV unscaled functions
to help with encoding camera output on a Pi. Obviously there are other
uses but that was the motivation.
It enforces the general bitexact path for the fate tests that depend on
it.
It renames the existing BGR function as bgr... so we don't end up with
the counterintuitive situation where BGR is handled by rgb... and RGB
would be handled by bgr...
Adds RGB functions
Improves the rounding in the dedicated function as that improves its
score when tested with tests/swscale and fixes it to allow any width
(contrary to the comment, any height was already allowed).
Adds XRGB->YUV functions to complete the set
Adds Aarch64 neon for BGR24 & RGB24
I haven't built fate tests for this as I'm not quite sure what the
appropriate tests would be. The x86 asm doesn't match either the C
template with improved rounding or the previous template (I'm not quite
sure what it does but it produces a different score out of tests/swscale
to either method) so a simple results match isn't going to work.
Regards
John Cox
John Cox (6):
fate-filter-fps: Set swscale bitexact for tests that do conversions
swscale: Rename BGR24->YUV conversion functions as bgr...
swscale: Add explicit rgb24->yv12 conversion
swscale: RGB24->YUV allow odd widths & improve C rounding
swscale: Add unscaled XRGB->YUV420P functions
swscale: Add aarch64 functions for RGB24->YUV420P
libswscale/aarch64/rgb2rgb.c | 8 +
libswscale/aarch64/rgb2rgb_neon.S | 356 ++++++++++++++++++++++++++++++
libswscale/bayer_template.c | 2 +-
libswscale/rgb2rgb.c | 25 +++
libswscale/rgb2rgb.h | 23 ++
libswscale/rgb2rgb_template.c | 174 +++++++++++++--
libswscale/swscale_unscaled.c | 114 +++++++++-
libswscale/x86/rgb2rgb_template.c | 13 +-
tests/fate/filter-video.mak | 4 +-
9 files changed, 694 insertions(+), 25 deletions(-)
--
2.39.2
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
* [FFmpeg-devel] [PATCH v1 1/6] fate-filter-fps: Set swscale bitexact for tests that do conversions
2023-08-20 15:10 [FFmpeg-devel] [PATCH v1 0/6] swscale: Add dedicated RGB->YUV unscaled functions & aarch64 asm John Cox
@ 2023-08-20 15:10 ` John Cox
2023-08-20 15:10 ` [FFmpeg-devel] [PATCH v1 2/6] swscale: Rename BGR24->YUV conversion functions as bgr John Cox
` (4 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: John Cox @ 2023-08-20 15:10 UTC
To: ffmpeg-devel; +Cc: John Cox
The general -bitexact flag doesn't affect swscale, so also set the swscale
bitexact option to get correct CRCs in all circumstances.
Signed-off-by: John Cox <jc@kynesim.co.uk>
---
tests/fate/filter-video.mak | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tests/fate/filter-video.mak b/tests/fate/filter-video.mak
index 789ec6414c..811a96d124 100644
--- a/tests/fate/filter-video.mak
+++ b/tests/fate/filter-video.mak
@@ -391,8 +391,8 @@ fate-filter-fps-start-drop: CMD = framecrc -lavfi testsrc2=r=7:d=3.5,fps=3:start
fate-filter-fps-start-fill: CMD = framecrc -lavfi testsrc2=r=7:d=1.5,setpts=PTS+14,fps=3:start_time=1.5
FATE_FILTER_SAMPLES-$(call FILTERDEMDEC, FPS SCALE, MOV, QTRLE) += fate-filter-fps-cfr fate-filter-fps
-fate-filter-fps-cfr: CMD = framecrc -auto_conversion_filters -i $(TARGET_SAMPLES)/qtrle/apple-animation-variable-fps-bug.mov -r 30 -vsync cfr -pix_fmt yuv420p
-fate-filter-fps: CMD = framecrc -auto_conversion_filters -i $(TARGET_SAMPLES)/qtrle/apple-animation-variable-fps-bug.mov -vf fps=30 -pix_fmt yuv420p
+fate-filter-fps-cfr: CMD = framecrc -auto_conversion_filters -i $(TARGET_SAMPLES)/qtrle/apple-animation-variable-fps-bug.mov -r 30 -vsync cfr -vf scale=sws_flags=bitexact -pix_fmt yuv420p
+fate-filter-fps: CMD = framecrc -auto_conversion_filters -i $(TARGET_SAMPLES)/qtrle/apple-animation-variable-fps-bug.mov -vf fps=30,scale=sws_flags=bitexact -pix_fmt yuv420p
FATE_FILTER_ALPHAEXTRACT_ALPHAMERGE := $(addprefix fate-filter-alphaextract_alphamerge_, rgb yuv)
FATE_FILTER_VSYNTH_PGMYUV-$(call ALLYES, SCALE_FILTER FORMAT_FILTER SPLIT_FILTER ALPHAEXTRACT_FILTER ALPHAMERGE_FILTER) += $(FATE_FILTER_ALPHAEXTRACT_ALPHAMERGE)
--
2.39.2
* [FFmpeg-devel] [PATCH v1 2/6] swscale: Rename BGR24->YUV conversion functions as bgr...
2023-08-20 15:10 [FFmpeg-devel] [PATCH v1 0/6] swscale: Add dedicated RGB->YUV unscaled functions & aarch64 asm John Cox
2023-08-20 15:10 ` [FFmpeg-devel] [PATCH v1 1/6] fate-filter-fps: Set swscale bitexact for tests that do conversions John Cox
@ 2023-08-20 15:10 ` John Cox
2023-08-20 15:10 ` [FFmpeg-devel] [PATCH v1 3/6] swscale: Add explicit rgb24->yv12 conversion John Cox
` (3 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: John Cox @ 2023-08-20 15:10 UTC
To: ffmpeg-devel; +Cc: John Cox
Rename the swscale functions that convert BGR24 frames to YUV as
bgr24toyv12 rather than rgb24toyv12. The old name is confusing and
would be even more confusing with the addition of RGB24 converters.
Signed-off-by: John Cox <jc@kynesim.co.uk>
---
libswscale/bayer_template.c | 2 +-
libswscale/rgb2rgb.c | 2 +-
libswscale/rgb2rgb.h | 4 ++--
libswscale/rgb2rgb_template.c | 4 ++--
libswscale/swscale_unscaled.c | 2 +-
libswscale/x86/rgb2rgb_template.c | 8 ++++----
6 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/libswscale/bayer_template.c b/libswscale/bayer_template.c
index 46b5a4984d..06d917c97f 100644
--- a/libswscale/bayer_template.c
+++ b/libswscale/bayer_template.c
@@ -188,7 +188,7 @@
* invoke ff_rgb24toyv12 for 2x2 pixels
*/
#define rgb24toyv12_2x2(src, dstY, dstU, dstV, luma_stride, src_stride, rgb2yuv) \
- ff_rgb24toyv12(src, dstY, dstV, dstU, 2, 2, luma_stride, 0, src_stride, rgb2yuv)
+ ff_bgr24toyv12(src, dstY, dstV, dstU, 2, 2, luma_stride, 0, src_stride, rgb2yuv)
static void BAYER_RENAME(rgb24_copy)(const uint8_t *src, int src_stride, uint8_t *dst, int dst_stride, int width)
{
diff --git a/libswscale/rgb2rgb.c b/libswscale/rgb2rgb.c
index e98fdac8ea..8707917800 100644
--- a/libswscale/rgb2rgb.c
+++ b/libswscale/rgb2rgb.c
@@ -78,7 +78,7 @@ void (*yuy2toyv12)(const uint8_t *src, uint8_t *ydst,
uint8_t *udst, uint8_t *vdst,
int width, int height,
int lumStride, int chromStride, int srcStride);
-void (*ff_rgb24toyv12)(const uint8_t *src, uint8_t *ydst,
+void (*ff_bgr24toyv12)(const uint8_t *src, uint8_t *ydst,
uint8_t *udst, uint8_t *vdst,
int width, int height,
int lumStride, int chromStride, int srcStride,
diff --git a/libswscale/rgb2rgb.h b/libswscale/rgb2rgb.h
index f3951d523e..305b830920 100644
--- a/libswscale/rgb2rgb.h
+++ b/libswscale/rgb2rgb.h
@@ -76,7 +76,7 @@ void rgb15tobgr15(const uint8_t *src, uint8_t *dst, int src_size);
void rgb12tobgr12(const uint8_t *src, uint8_t *dst, int src_size);
void rgb12to15(const uint8_t *src, uint8_t *dst, int src_size);
-void ff_rgb24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
+void ff_bgr24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
uint8_t *vdst, int width, int height, int lumStride,
int chromStride, int srcStride, int32_t *rgb2yuv);
@@ -124,7 +124,7 @@ extern void (*yuv422ptouyvy)(const uint8_t *ysrc, const uint8_t *usrc, const uin
* Chrominance data is only taken from every second line, others are ignored.
* FIXME: Write high quality version.
*/
-extern void (*ff_rgb24toyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
+extern void (*ff_bgr24toyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
int width, int height,
int lumStride, int chromStride, int srcStride,
int32_t *rgb2yuv);
diff --git a/libswscale/rgb2rgb_template.c b/libswscale/rgb2rgb_template.c
index 42c69801ba..8ef4a2cf5d 100644
--- a/libswscale/rgb2rgb_template.c
+++ b/libswscale/rgb2rgb_template.c
@@ -646,7 +646,7 @@ static inline void uyvytoyv12_c(const uint8_t *src, uint8_t *ydst,
* others are ignored in the C version.
* FIXME: Write HQ version.
*/
-void ff_rgb24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
+void ff_bgr24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
uint8_t *vdst, int width, int height, int lumStride,
int chromStride, int srcStride, int32_t *rgb2yuv)
{
@@ -979,7 +979,7 @@ static av_cold void rgb2rgb_init_c(void)
yuv422ptouyvy = yuv422ptouyvy_c;
yuy2toyv12 = yuy2toyv12_c;
planar2x = planar2x_c;
- ff_rgb24toyv12 = ff_rgb24toyv12_c;
+ ff_bgr24toyv12 = ff_bgr24toyv12_c;
interleaveBytes = interleaveBytes_c;
deinterleaveBytes = deinterleaveBytes_c;
vu9_to_vu12 = vu9_to_vu12_c;
diff --git a/libswscale/swscale_unscaled.c b/libswscale/swscale_unscaled.c
index 9af2e7ecc3..32e0d7f63c 100644
--- a/libswscale/swscale_unscaled.c
+++ b/libswscale/swscale_unscaled.c
@@ -1641,7 +1641,7 @@ static int bgr24ToYv12Wrapper(SwsContext *c, const uint8_t *src[],
int srcStride[], int srcSliceY, int srcSliceH,
uint8_t *dst[], int dstStride[])
{
- ff_rgb24toyv12(
+ ff_bgr24toyv12(
src[0],
dst[0] + srcSliceY * dstStride[0],
dst[1] + (srcSliceY >> 1) * dstStride[1],
diff --git a/libswscale/x86/rgb2rgb_template.c b/libswscale/x86/rgb2rgb_template.c
index 4aba25dd51..dc2b4e205a 100644
--- a/libswscale/x86/rgb2rgb_template.c
+++ b/libswscale/x86/rgb2rgb_template.c
@@ -1544,7 +1544,7 @@ static inline void RENAME(uyvytoyv12)(const uint8_t *src, uint8_t *ydst, uint8_t
* FIXME: Write HQ version.
*/
#if HAVE_7REGS
-static inline void RENAME(rgb24toyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
+static inline void RENAME(bgr24toyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
int width, int height,
int lumStride, int chromStride, int srcStride,
int32_t *rgb2yuv)
@@ -1556,7 +1556,7 @@ static inline void RENAME(rgb24toyv12)(const uint8_t *src, uint8_t *ydst, uint8_
const x86_reg chromWidth= width>>1;
if (height > 2) {
- ff_rgb24toyv12_c(src, ydst, udst, vdst, width, 2, lumStride, chromStride, srcStride, rgb2yuv);
+ ff_bgr24toyv12_c(src, ydst, udst, vdst, width, 2, lumStride, chromStride, srcStride, rgb2yuv);
src += 2*srcStride;
ydst += 2*lumStride;
udst += chromStride;
@@ -1737,7 +1737,7 @@ static inline void RENAME(rgb24toyv12)(const uint8_t *src, uint8_t *ydst, uint8_
SFENCE" \n\t"
:::"memory");
- ff_rgb24toyv12_c(src, ydst, udst, vdst, width, height-y, lumStride, chromStride, srcStride, rgb2yuv);
+ ff_bgr24toyv12_c(src, ydst, udst, vdst, width, height-y, lumStride, chromStride, srcStride, rgb2yuv);
}
#endif /* HAVE_7REGS */
#endif /* !COMPILE_TEMPLATE_SSE2 */
@@ -2434,7 +2434,7 @@ static av_cold void RENAME(rgb2rgb_init)(void)
planar2x = RENAME(planar2x);
#if HAVE_7REGS
- ff_rgb24toyv12 = RENAME(rgb24toyv12);
+ ff_bgr24toyv12 = RENAME(bgr24toyv12);
#endif /* HAVE_7REGS */
yuyvtoyuv420 = RENAME(yuyvtoyuv420);
--
2.39.2
* [FFmpeg-devel] [PATCH v1 3/6] swscale: Add explicit rgb24->yv12 conversion
2023-08-20 15:10 [FFmpeg-devel] [PATCH v1 0/6] swscale: Add dedicated RGB->YUV unscaled functions & aarch64 asm John Cox
2023-08-20 15:10 ` [FFmpeg-devel] [PATCH v1 1/6] fate-filter-fps: Set swscale bitexact for tests that do conversions John Cox
2023-08-20 15:10 ` [FFmpeg-devel] [PATCH v1 2/6] swscale: Rename BGR24->YUV conversion functions as bgr John Cox
@ 2023-08-20 15:10 ` John Cox
2023-08-20 17:16 ` Michael Niedermayer
2023-08-20 15:10 ` [FFmpeg-devel] [PATCH v1 4/6] swscale: RGB24->YUV allow odd widths & improve C rounding John Cox
` (2 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: John Cox @ 2023-08-20 15:10 UTC
To: ffmpeg-devel; +Cc: John Cox
Add an rgb24->yuv420p conversion. It uses the same code as the existing
bgr24->yuv converter but permutes the conversion array to swap the R & B
coefficients.
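
The permutation idea can be sketched in isolation. This is a minimal
illustration only, not the patch's code: the enum names, the table names
and luma_sum() are hypothetical stand-ins for the RY_IDX..BV_IDX constants
and the x_bgr/x_rgb tables in the diff below, and only the luma sum is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the RY_IDX..BV_IDX constants used in the patch;
 * the real values in libswscale may differ. */
enum { RY, GY, BY, RU, GU, BU, RV, GV, BV, COEFF_COUNT };

/* Identity order: the kernel's byte order matches the memory layout (B, G, R). */
static const uint8_t perm_bgr[COEFF_COUNT] = { RY, GY, BY, RU, GU, BU, RV, GV, BV };
/* Swapped order: the same kernel run over R, G, B memory. */
static const uint8_t perm_rgb[COEFF_COUNT] = { BY, GY, RY, BU, GU, RU, BV, GV, RV };

/* Weighted luma sum for one pixel. The kernel always calls byte 0 "b" and
 * byte 2 "r"; the permutation decides which coefficient each one receives. */
static int32_t luma_sum(const uint8_t px[3], const int32_t *coeffs,
                        const uint8_t *perm)
{
    assert(coeffs && perm);
    int32_t ry = coeffs[perm[0]], gy = coeffs[perm[1]], by = coeffs[perm[2]];
    int32_t b = px[0], g = px[1], r = px[2];
    return ry * r + gy * g + by * b;
}
```

With the swapped table, the byte the kernel calls "r" (byte 2, which holds
blue in RGB24 memory) is multiplied by the blue coefficient, so the same
colour gives the same sum in either byte order.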
Signed-off-by: John Cox <jc@kynesim.co.uk>
---
libswscale/rgb2rgb.c | 5 +++++
libswscale/rgb2rgb.h | 7 +++++++
libswscale/rgb2rgb_template.c | 38 ++++++++++++++++++++++++++++++-----
libswscale/swscale_unscaled.c | 24 +++++++++++++++++++++-
4 files changed, 68 insertions(+), 6 deletions(-)
diff --git a/libswscale/rgb2rgb.c b/libswscale/rgb2rgb.c
index 8707917800..de90e5193f 100644
--- a/libswscale/rgb2rgb.c
+++ b/libswscale/rgb2rgb.c
@@ -83,6 +83,11 @@ void (*ff_bgr24toyv12)(const uint8_t *src, uint8_t *ydst,
int width, int height,
int lumStride, int chromStride, int srcStride,
int32_t *rgb2yuv);
+void (*ff_rgb24toyv12)(const uint8_t *src, uint8_t *ydst,
+ uint8_t *udst, uint8_t *vdst,
+ int width, int height,
+ int lumStride, int chromStride, int srcStride,
+ int32_t *rgb2yuv);
void (*planar2x)(const uint8_t *src, uint8_t *dst, int width, int height,
int srcStride, int dstStride);
void (*interleaveBytes)(const uint8_t *src1, const uint8_t *src2, uint8_t *dst,
diff --git a/libswscale/rgb2rgb.h b/libswscale/rgb2rgb.h
index 305b830920..f7a76a92ba 100644
--- a/libswscale/rgb2rgb.h
+++ b/libswscale/rgb2rgb.h
@@ -79,6 +79,9 @@ void rgb12to15(const uint8_t *src, uint8_t *dst, int src_size);
void ff_bgr24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
uint8_t *vdst, int width, int height, int lumStride,
int chromStride, int srcStride, int32_t *rgb2yuv);
+void ff_rgb24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
+ uint8_t *vdst, int width, int height, int lumStride,
+ int chromStride, int srcStride, int32_t *rgb2yuv);
/**
* Height should be a multiple of 2 and width should be a multiple of 16.
@@ -128,6 +131,10 @@ extern void (*ff_bgr24toyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
int width, int height,
int lumStride, int chromStride, int srcStride,
int32_t *rgb2yuv);
+extern void (*ff_rgb24toyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
+ int width, int height,
+ int lumStride, int chromStride, int srcStride,
+ int32_t *rgb2yuv);
extern void (*planar2x)(const uint8_t *src, uint8_t *dst, int width, int height,
int srcStride, int dstStride);
diff --git a/libswscale/rgb2rgb_template.c b/libswscale/rgb2rgb_template.c
index 8ef4a2cf5d..e57bfa6545 100644
--- a/libswscale/rgb2rgb_template.c
+++ b/libswscale/rgb2rgb_template.c
@@ -646,13 +646,14 @@ static inline void uyvytoyv12_c(const uint8_t *src, uint8_t *ydst,
* others are ignored in the C version.
* FIXME: Write HQ version.
*/
-void ff_bgr24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
+static void rgb24toyv12_x(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
uint8_t *vdst, int width, int height, int lumStride,
- int chromStride, int srcStride, int32_t *rgb2yuv)
+ int chromStride, int srcStride, int32_t *rgb2yuv,
+ const uint8_t x[9])
{
- int32_t ry = rgb2yuv[RY_IDX], gy = rgb2yuv[GY_IDX], by = rgb2yuv[BY_IDX];
- int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
- int32_t rv = rgb2yuv[RV_IDX], gv = rgb2yuv[GV_IDX], bv = rgb2yuv[BV_IDX];
+ int32_t ry = rgb2yuv[x[0]], gy = rgb2yuv[x[1]], by = rgb2yuv[x[2]];
+ int32_t ru = rgb2yuv[x[3]], gu = rgb2yuv[x[4]], bu = rgb2yuv[x[5]];
+ int32_t rv = rgb2yuv[x[6]], gv = rgb2yuv[x[7]], bv = rgb2yuv[x[8]];
int y;
const int chromWidth = width >> 1;
@@ -707,6 +708,32 @@ void ff_bgr24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
}
}
+static const uint8_t x_bgr[9] = {
+ RY_IDX, GY_IDX, BY_IDX,
+ RU_IDX, GU_IDX, BU_IDX,
+ RV_IDX, GV_IDX, BV_IDX,
+};
+
+static const uint8_t x_rgb[9] = {
+ BY_IDX, GY_IDX, RY_IDX,
+ BU_IDX, GU_IDX, RU_IDX,
+ BV_IDX, GV_IDX, RV_IDX,
+};
+
+void ff_bgr24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
+ uint8_t *vdst, int width, int height, int lumStride,
+ int chromStride, int srcStride, int32_t *rgb2yuv)
+{
+ rgb24toyv12_x(src, ydst, udst, vdst, width, height, lumStride, chromStride, srcStride, rgb2yuv, x_bgr);
+}
+
+void ff_rgb24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
+ uint8_t *vdst, int width, int height, int lumStride,
+ int chromStride, int srcStride, int32_t *rgb2yuv)
+{
+ rgb24toyv12_x(src, ydst, udst, vdst, width, height, lumStride, chromStride, srcStride, rgb2yuv, x_rgb);
+}
+
static void interleaveBytes_c(const uint8_t *src1, const uint8_t *src2,
uint8_t *dest, int width, int height,
int src1Stride, int src2Stride, int dstStride)
@@ -979,6 +1006,7 @@ static av_cold void rgb2rgb_init_c(void)
yuv422ptouyvy = yuv422ptouyvy_c;
yuy2toyv12 = yuy2toyv12_c;
planar2x = planar2x_c;
+ ff_rgb24toyv12 = ff_rgb24toyv12_c;
ff_bgr24toyv12 = ff_bgr24toyv12_c;
interleaveBytes = interleaveBytes_c;
deinterleaveBytes = deinterleaveBytes_c;
diff --git a/libswscale/swscale_unscaled.c b/libswscale/swscale_unscaled.c
index 32e0d7f63c..751bdcb2e4 100644
--- a/libswscale/swscale_unscaled.c
+++ b/libswscale/swscale_unscaled.c
@@ -1654,6 +1654,23 @@ static int bgr24ToYv12Wrapper(SwsContext *c, const uint8_t *src[],
return srcSliceH;
}
+static int rgb24ToYv12Wrapper(SwsContext *c, const uint8_t *src[],
+ int srcStride[], int srcSliceY, int srcSliceH,
+ uint8_t *dst[], int dstStride[])
+{
+ ff_rgb24toyv12(
+ src[0],
+ dst[0] + srcSliceY * dstStride[0],
+ dst[1] + (srcSliceY >> 1) * dstStride[1],
+ dst[2] + (srcSliceY >> 1) * dstStride[2],
+ c->srcW, srcSliceH,
+ dstStride[0], dstStride[1], srcStride[0],
+ c->input_rgb2yuv_table);
+ if (dst[3])
+ fillPlane(dst[3], dstStride[3], c->srcW, srcSliceH, srcSliceY, 255);
+ return srcSliceH;
+}
+
static int yvu9ToYv12Wrapper(SwsContext *c, const uint8_t *src[],
int srcStride[], int srcSliceY, int srcSliceH,
uint8_t *dst[], int dstStride[])
@@ -2035,8 +2052,13 @@ void ff_get_unscaled_swscale(SwsContext *c)
/* bgr24toYV12 */
if (srcFormat == AV_PIX_FMT_BGR24 &&
(dstFormat == AV_PIX_FMT_YUV420P || dstFormat == AV_PIX_FMT_YUVA420P) &&
- !(flags & SWS_ACCURATE_RND) && !(dstW&1))
+ !(flags & (SWS_ACCURATE_RND | SWS_BITEXACT)) && !(dstW&1))
c->convert_unscaled = bgr24ToYv12Wrapper;
+ /* rgb24toYV12 */
+ if (srcFormat == AV_PIX_FMT_RGB24 &&
+ (dstFormat == AV_PIX_FMT_YUV420P || dstFormat == AV_PIX_FMT_YUVA420P) &&
+ !(flags & (SWS_ACCURATE_RND | SWS_BITEXACT)) && !(dstW&1))
+ c->convert_unscaled = rgb24ToYv12Wrapper;
/* RGB/BGR -> RGB/BGR (no dither needed forms) */
if (isAnyRGB(srcFormat) && isAnyRGB(dstFormat) && findRgbConvFn(c)
--
2.39.2
* [FFmpeg-devel] [PATCH v1 4/6] swscale: RGB24->YUV allow odd widths & improve C rounding
2023-08-20 15:10 [FFmpeg-devel] [PATCH v1 0/6] swscale: Add dedicated RGB->YUV unscaled functions & aarch64 asm John Cox
` (2 preceding siblings ...)
2023-08-20 15:10 ` [FFmpeg-devel] [PATCH v1 3/6] swscale: Add explicit rgb24->yv12 conversion John Cox
@ 2023-08-20 15:10 ` John Cox
2023-08-20 15:10 ` [FFmpeg-devel] [PATCH v1 5/6] swscale: Add unscaled XRGB->YUV420P functions John Cox
2023-08-20 15:10 ` [FFmpeg-devel] [PATCH v1 6/6] swscale: Add aarch64 functions for RGB24->YUV420P John Cox
5 siblings, 0 replies; 14+ messages in thread
From: John Cox @ 2023-08-20 15:10 UTC
To: ffmpeg-devel; +Cc: John Cox
Allow odd widths for conversion; it costs very little and simplifies
setup slightly. The x86 asm falls back to the C code if the width is odd.
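
As a rough sketch of the odd-width handling (an illustration under stated
assumptions, not the patch's code): process pixels in pairs, then give a
trailing unpaired pixel its own luma and chroma sample. The function names
and the per-pixel luma_src/chroma_src inputs are hypothetical; the real
kernel computes those values from R, G and B.

```c
#include <assert.h>
#include <stdint.h>

/* Chroma samples needed per row for 4:2:0 output of the given width. */
static int chroma_samples(int width)
{
    assert(width > 0);
    return (width + 1) >> 1;
}

/* Copy luma for every pixel and take chroma from the first pixel of each
 * pair. Returns the number of chroma samples written. */
static int subsample_row(const uint8_t *luma_src, const uint8_t *chroma_src,
                         uint8_t *ydst, uint8_t *cdst, int width)
{
    const int pairs = width >> 1;
    int i;

    for (i = 0; i < pairs; i++) {
        cdst[i]         = chroma_src[2 * i];
        ydst[2 * i]     = luma_src[2 * i];
        ydst[2 * i + 1] = luma_src[2 * i + 1];
    }
    if (width & 1) {
        /* The trailing pixel has no partner: it still needs one luma
         * sample and one chroma sample. */
        cdst[i]     = chroma_src[2 * i];
        ydst[2 * i] = luma_src[2 * i];
        i++;
    }
    return i;
}
```

The caller has to size each chroma row for chroma_samples(width) bytes rather
than width / 2 once odd widths are accepted.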
Round to nearest rather than just down. This reduces the Y error
reported by tests/swscale from 3 to 1. x86 asm doesn't mirror the C so
exact correspondence isn't an issue there.
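
The rounding change can be illustrated on its own. In this sketch the
function names are hypothetical, and the shift is passed as a parameter
rather than taken from RGB2YUV_SHIFT. The folded constant follows the ky/kc
definitions in the diff below: ((offset << 1) + 1) << (shift - 1) equals
(offset << shift) + (1 << (shift - 1)), i.e. the offset plus half an LSB.

```c
#include <assert.h>
#include <stdint.h>

/* Old form: shift first (rounds down), then add the offset. */
static int32_t scale_truncate(int32_t sum, int shift, int32_t offset)
{
    return (sum >> shift) + offset;
}

/* New form: fold the offset and half an LSB into one constant, then shift
 * once, which rounds to nearest. */
static int32_t scale_round(int32_t sum, int shift, int32_t offset)
{
    assert(shift >= 1);
    const int32_t k = ((offset << 1) + 1) << (shift - 1);
    return (sum + k) >> shift;
}
```

With a shift of 15 and a luma offset of 16, a weighted sum worth about 3.61
becomes 19 with the old form and 20 with the new one, while a sum worth
about 3.31 becomes 19 with both.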
Signed-off-by: John Cox <jc@kynesim.co.uk>
---
libswscale/rgb2rgb_template.c | 42 ++++++++++++++++++-------------
libswscale/swscale_unscaled.c | 5 ++--
libswscale/x86/rgb2rgb_template.c | 5 ++++
3 files changed, 32 insertions(+), 20 deletions(-)
diff --git a/libswscale/rgb2rgb_template.c b/libswscale/rgb2rgb_template.c
index e57bfa6545..5503e58a29 100644
--- a/libswscale/rgb2rgb_template.c
+++ b/libswscale/rgb2rgb_template.c
@@ -656,6 +656,8 @@ static void rgb24toyv12_x(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
int32_t rv = rgb2yuv[x[6]], gv = rgb2yuv[x[7]], bv = rgb2yuv[x[8]];
int y;
const int chromWidth = width >> 1;
+ const int32_t ky = ((16 << 1) + 1) << (RGB2YUV_SHIFT - 1);
+ const int32_t kc = ((128 << 1) + 1) << (RGB2YUV_SHIFT - 1);
for (y = 0; y < height; y += 2) {
int i;
@@ -664,9 +666,9 @@ static void rgb24toyv12_x(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
unsigned int g = src[6 * i + 1];
unsigned int r = src[6 * i + 2];
- unsigned int Y = ((ry * r + gy * g + by * b) >> RGB2YUV_SHIFT) + 16;
- unsigned int V = ((rv * r + gv * g + bv * b) >> RGB2YUV_SHIFT) + 128;
- unsigned int U = ((ru * r + gu * g + bu * b) >> RGB2YUV_SHIFT) + 128;
+ unsigned int Y = (ry * r + gy * g + by * b + ky) >> RGB2YUV_SHIFT;
+ unsigned int V = (rv * r + gv * g + bv * b + kc) >> RGB2YUV_SHIFT;
+ unsigned int U = (ru * r + gu * g + bu * b + kc) >> RGB2YUV_SHIFT;
udst[i] = U;
vdst[i] = V;
@@ -676,30 +678,36 @@ static void rgb24toyv12_x(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
g = src[6 * i + 4];
r = src[6 * i + 5];
- Y = ((ry * r + gy * g + by * b) >> RGB2YUV_SHIFT) + 16;
+ Y = ((ry * r + gy * g + by * b + ky) >> RGB2YUV_SHIFT);
ydst[2 * i + 1] = Y;
}
- ydst += lumStride;
- src += srcStride;
-
- if (y+1 == height)
- break;
-
- for (i = 0; i < chromWidth; i++) {
+ if ((width & 1) != 0) {
unsigned int b = src[6 * i + 0];
unsigned int g = src[6 * i + 1];
unsigned int r = src[6 * i + 2];
- unsigned int Y = ((ry * r + gy * g + by * b) >> RGB2YUV_SHIFT) + 16;
+ unsigned int Y = (ry * r + gy * g + by * b + ky) >> RGB2YUV_SHIFT;
+ unsigned int V = (rv * r + gv * g + bv * b + kc) >> RGB2YUV_SHIFT;
+ unsigned int U = (ru * r + gu * g + bu * b + kc) >> RGB2YUV_SHIFT;
+ udst[i] = U;
+ vdst[i] = V;
ydst[2 * i] = Y;
+ }
+ ydst += lumStride;
+ src += srcStride;
- b = src[6 * i + 3];
- g = src[6 * i + 4];
- r = src[6 * i + 5];
+ if (y+1 == height)
+ break;
- Y = ((ry * r + gy * g + by * b) >> RGB2YUV_SHIFT) + 16;
- ydst[2 * i + 1] = Y;
+ for (i = 0; i < width; i++) {
+ unsigned int b = src[3 * i + 0];
+ unsigned int g = src[3 * i + 1];
+ unsigned int r = src[3 * i + 2];
+
+ unsigned int Y = (ry * r + gy * g + by * b + ky) >> RGB2YUV_SHIFT;
+
+ ydst[i] = Y;
}
udst += chromStride;
vdst += chromStride;
diff --git a/libswscale/swscale_unscaled.c b/libswscale/swscale_unscaled.c
index 751bdcb2e4..e10f967755 100644
--- a/libswscale/swscale_unscaled.c
+++ b/libswscale/swscale_unscaled.c
@@ -1994,7 +1994,6 @@ void ff_get_unscaled_swscale(SwsContext *c)
const enum AVPixelFormat dstFormat = c->dstFormat;
const int flags = c->flags;
const int dstH = c->dstH;
- const int dstW = c->dstW;
int needsDither;
needsDither = isAnyRGB(dstFormat) &&
@@ -2052,12 +2051,12 @@ void ff_get_unscaled_swscale(SwsContext *c)
/* bgr24toYV12 */
if (srcFormat == AV_PIX_FMT_BGR24 &&
(dstFormat == AV_PIX_FMT_YUV420P || dstFormat == AV_PIX_FMT_YUVA420P) &&
- !(flags & (SWS_ACCURATE_RND | SWS_BITEXACT)) && !(dstW&1))
+ !(flags & (SWS_ACCURATE_RND | SWS_BITEXACT)))
c->convert_unscaled = bgr24ToYv12Wrapper;
/* rgb24toYV12 */
if (srcFormat == AV_PIX_FMT_RGB24 &&
(dstFormat == AV_PIX_FMT_YUV420P || dstFormat == AV_PIX_FMT_YUVA420P) &&
- !(flags & (SWS_ACCURATE_RND | SWS_BITEXACT)) && !(dstW&1))
+ !(flags & (SWS_ACCURATE_RND | SWS_BITEXACT)))
c->convert_unscaled = rgb24ToYv12Wrapper;
/* RGB/BGR -> RGB/BGR (no dither needed forms) */
diff --git a/libswscale/x86/rgb2rgb_template.c b/libswscale/x86/rgb2rgb_template.c
index dc2b4e205a..f90527aa08 100644
--- a/libswscale/x86/rgb2rgb_template.c
+++ b/libswscale/x86/rgb2rgb_template.c
@@ -1555,6 +1555,11 @@ static inline void RENAME(bgr24toyv12)(const uint8_t *src, uint8_t *ydst, uint8_
int y;
const x86_reg chromWidth= width>>1;
+ if ((width & 1) != 0) {
+ ff_bgr24toyv12_c(src, ydst, udst, vdst, width, height, lumStride, chromStride, srcStride, rgb2yuv);
+ return;
+ }
+
if (height > 2) {
ff_bgr24toyv12_c(src, ydst, udst, vdst, width, 2, lumStride, chromStride, srcStride, rgb2yuv);
src += 2*srcStride;
--
2.39.2
* [FFmpeg-devel] [PATCH v1 5/6] swscale: Add unscaled XRGB->YUV420P functions
2023-08-20 15:10 [FFmpeg-devel] [PATCH v1 0/6] swscale: Add dedicated RGB->YUV unscaled functions & aarch64 asm John Cox
` (3 preceding siblings ...)
2023-08-20 15:10 ` [FFmpeg-devel] [PATCH v1 4/6] swscale: RGB24->YUV allow odd widths & improve C rounding John Cox
@ 2023-08-20 15:10 ` John Cox
2023-08-20 15:10 ` [FFmpeg-devel] [PATCH v1 6/6] swscale: Add aarch64 functions for RGB24->YUV420P John Cox
5 siblings, 0 replies; 14+ messages in thread
From: John Cox @ 2023-08-20 15:10 UTC
To: ffmpeg-devel; +Cc: John Cox
Add simple C functions for converting the XRGB family (RGBX, BGRX, XRGB,
XBGR) to YUV420P. Same logic as the RGB24 functions but ignoring the
unused X byte of each 4-byte pixel.
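
A minimal sketch of how one 4-byte-per-pixel reader can serve both X
positions, assuming (as the comment in the diff below states) that the C
code does plain byte loads. read_3of4() is a hypothetical helper, not a
function from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Gather the three colour bytes of pixel i from a packed 32-bit-per-pixel
 * row. Bytes 0..2 of each 4-byte pixel are read; byte 3 is skipped. */
static void read_3of4(const uint8_t *row, int i, uint8_t out[3])
{
    assert(row && i >= 0);
    out[0] = row[4 * i + 0];
    out[1] = row[4 * i + 1];
    out[2] = row[4 * i + 2];
}
```

For RGBX or BGRX the row pointer is passed as-is; for XRGB or XBGR the caller
passes row + 1, so the leading X byte of each pixel falls into the skipped
slot instead.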
Signed-off-by: John Cox <jc@kynesim.co.uk>
---
libswscale/rgb2rgb.c | 20 +++++++
libswscale/rgb2rgb.h | 16 +++++
libswscale/rgb2rgb_template.c | 106 ++++++++++++++++++++++++++++++++++
libswscale/swscale_unscaled.c | 89 ++++++++++++++++++++++++++++
4 files changed, 231 insertions(+)
diff --git a/libswscale/rgb2rgb.c b/libswscale/rgb2rgb.c
index de90e5193f..b976341e70 100644
--- a/libswscale/rgb2rgb.c
+++ b/libswscale/rgb2rgb.c
@@ -88,6 +88,26 @@ void (*ff_rgb24toyv12)(const uint8_t *src, uint8_t *ydst,
int width, int height,
int lumStride, int chromStride, int srcStride,
int32_t *rgb2yuv);
+void (*ff_rgbxtoyv12)(const uint8_t *src, uint8_t *ydst,
+ uint8_t *udst, uint8_t *vdst,
+ int width, int height,
+ int lumStride, int chromStride, int srcStride,
+ int32_t *rgb2yuv);
+void (*ff_bgrxtoyv12)(const uint8_t *src, uint8_t *ydst,
+ uint8_t *udst, uint8_t *vdst,
+ int width, int height,
+ int lumStride, int chromStride, int srcStride,
+ int32_t *rgb2yuv);
+void (*ff_xrgbtoyv12)(const uint8_t *src, uint8_t *ydst,
+ uint8_t *udst, uint8_t *vdst,
+ int width, int height,
+ int lumStride, int chromStride, int srcStride,
+ int32_t *rgb2yuv);
+void (*ff_xbgrtoyv12)(const uint8_t *src, uint8_t *ydst,
+ uint8_t *udst, uint8_t *vdst,
+ int width, int height,
+ int lumStride, int chromStride, int srcStride,
+ int32_t *rgb2yuv);
void (*planar2x)(const uint8_t *src, uint8_t *dst, int width, int height,
int srcStride, int dstStride);
void (*interleaveBytes)(const uint8_t *src1, const uint8_t *src2, uint8_t *dst,
diff --git a/libswscale/rgb2rgb.h b/libswscale/rgb2rgb.h
index f7a76a92ba..0015b1568a 100644
--- a/libswscale/rgb2rgb.h
+++ b/libswscale/rgb2rgb.h
@@ -135,6 +135,22 @@ extern void (*ff_rgb24toyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
int width, int height,
int lumStride, int chromStride, int srcStride,
int32_t *rgb2yuv);
+extern void (*ff_rgbxtoyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
+ int width, int height,
+ int lumStride, int chromStride, int srcStride,
+ int32_t *rgb2yuv);
+extern void (*ff_bgrxtoyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
+ int width, int height,
+ int lumStride, int chromStride, int srcStride,
+ int32_t *rgb2yuv);
+extern void (*ff_xrgbtoyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
+ int width, int height,
+ int lumStride, int chromStride, int srcStride,
+ int32_t *rgb2yuv);
+extern void (*ff_xbgrtoyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
+ int width, int height,
+ int lumStride, int chromStride, int srcStride,
+ int32_t *rgb2yuv);
extern void (*planar2x)(const uint8_t *src, uint8_t *dst, int width, int height,
int srcStride, int dstStride);
diff --git a/libswscale/rgb2rgb_template.c b/libswscale/rgb2rgb_template.c
index 5503e58a29..22326807c5 100644
--- a/libswscale/rgb2rgb_template.c
+++ b/libswscale/rgb2rgb_template.c
@@ -742,6 +742,108 @@ void ff_rgb24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
rgb24toyv12_x(src, ydst, udst, vdst, width, height, lumStride, chromStride, srcStride, rgb2yuv, x_rgb);
}
+static void rgbxtoyv12_x(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
+ uint8_t *vdst, int width, int height, int lumStride,
+ int chromStride, int srcStride, int32_t *rgb2yuv,
+ const uint8_t x[9])
+{
+ int32_t ry = rgb2yuv[x[0]], gy = rgb2yuv[x[1]], by = rgb2yuv[x[2]];
+ int32_t ru = rgb2yuv[x[3]], gu = rgb2yuv[x[4]], bu = rgb2yuv[x[5]];
+ int32_t rv = rgb2yuv[x[6]], gv = rgb2yuv[x[7]], bv = rgb2yuv[x[8]];
+ int y;
+ const int chromWidth = width >> 1;
+ // Constants with both rounding and offset
+ const int32_t ky = ((16 << 1) + 1) << (RGB2YUV_SHIFT - 1);
+ const int32_t kc = ((128 << 1) + 1) << (RGB2YUV_SHIFT - 1);
+
+ for (y = 0; y < height; y += 2) {
+ int i;
+ for (i = 0; i < chromWidth; i++) {
+ unsigned int b = src[8 * i + 0];
+ unsigned int g = src[8 * i + 1];
+ unsigned int r = src[8 * i + 2];
+
+ unsigned int Y = (ry * r + gy * g + by * b + ky) >> RGB2YUV_SHIFT;
+ unsigned int V = (rv * r + gv * g + bv * b + kc) >> RGB2YUV_SHIFT;
+ unsigned int U = (ru * r + gu * g + bu * b + kc) >> RGB2YUV_SHIFT;
+
+ udst[i] = U;
+ vdst[i] = V;
+ ydst[2 * i] = Y;
+
+ b = src[8 * i + 4];
+ g = src[8 * i + 5];
+ r = src[8 * i + 6];
+
+ Y = (ry * r + gy * g + by * b + ky) >> RGB2YUV_SHIFT;
+ ydst[2 * i + 1] = Y;
+ }
+ if ((width & 1) != 0) {
+ unsigned int b = src[8 * i + 0];
+ unsigned int g = src[8 * i + 1];
+ unsigned int r = src[8 * i + 2];
+
+ unsigned int Y = (ry * r + gy * g + by * b + ky) >> RGB2YUV_SHIFT;
+ unsigned int V = (rv * r + gv * g + bv * b + kc) >> RGB2YUV_SHIFT;
+ unsigned int U = (ru * r + gu * g + bu * b + kc) >> RGB2YUV_SHIFT;
+
+ udst[i] = U;
+ vdst[i] = V;
+ ydst[2 * i] = Y;
+ }
+ ydst += lumStride;
+ src += srcStride;
+
+ if (y+1 == height)
+ break;
+
+ for (i = 0; i < width; i++) {
+ unsigned int b = src[4 * i + 0];
+ unsigned int g = src[4 * i + 1];
+ unsigned int r = src[4 * i + 2];
+
+ unsigned int Y = (ry * r + gy * g + by * b + ky) >> RGB2YUV_SHIFT;
+
+ ydst[i] = Y;
+ }
+ udst += chromStride;
+ vdst += chromStride;
+ ydst += lumStride;
+ src += srcStride;
+ }
+}
+
+static void ff_rgbxtoyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
+ uint8_t *vdst, int width, int height, int lumStride,
+ int chromStride, int srcStride, int32_t *rgb2yuv)
+{
+ rgbxtoyv12_x(src, ydst, udst, vdst, width, height, lumStride, chromStride, srcStride, rgb2yuv, x_rgb);
+}
+
+static void ff_bgrxtoyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
+ uint8_t *vdst, int width, int height, int lumStride,
+ int chromStride, int srcStride, int32_t *rgb2yuv)
+{
+ rgbxtoyv12_x(src, ydst, udst, vdst, width, height, lumStride, chromStride, srcStride, rgb2yuv, x_bgr);
+}
+
+// As the general code does no SIMD-like ops simply adding 1 to the src address
+// will fix the ignored alpha position
+static void ff_xrgbtoyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
+ uint8_t *vdst, int width, int height, int lumStride,
+ int chromStride, int srcStride, int32_t *rgb2yuv)
+{
+ rgbxtoyv12_x(src + 1, ydst, udst, vdst, width, height, lumStride, chromStride, srcStride, rgb2yuv, x_rgb);
+}
+
+static void ff_xbgrtoyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
+ uint8_t *vdst, int width, int height, int lumStride,
+ int chromStride, int srcStride, int32_t *rgb2yuv)
+{
+ rgbxtoyv12_x(src + 1, ydst, udst, vdst, width, height, lumStride, chromStride, srcStride, rgb2yuv, x_bgr);
+}
+
+
static void interleaveBytes_c(const uint8_t *src1, const uint8_t *src2,
uint8_t *dest, int width, int height,
int src1Stride, int src2Stride, int dstStride)
@@ -1016,6 +1118,10 @@ static av_cold void rgb2rgb_init_c(void)
planar2x = planar2x_c;
ff_rgb24toyv12 = ff_rgb24toyv12_c;
ff_bgr24toyv12 = ff_bgr24toyv12_c;
+ ff_rgbxtoyv12 = ff_rgbxtoyv12_c;
+ ff_bgrxtoyv12 = ff_bgrxtoyv12_c;
+ ff_xrgbtoyv12 = ff_xrgbtoyv12_c;
+ ff_xbgrtoyv12 = ff_xbgrtoyv12_c;
interleaveBytes = interleaveBytes_c;
deinterleaveBytes = deinterleaveBytes_c;
vu9_to_vu12 = vu9_to_vu12_c;
diff --git a/libswscale/swscale_unscaled.c b/libswscale/swscale_unscaled.c
index e10f967755..ff682d367c 100644
--- a/libswscale/swscale_unscaled.c
+++ b/libswscale/swscale_unscaled.c
@@ -1671,6 +1671,74 @@ static int rgb24ToYv12Wrapper(SwsContext *c, const uint8_t *src[],
return srcSliceH;
}
+static int bgrxToYv12Wrapper(SwsContext *c, const uint8_t *src[],
+ int srcStride[], int srcSliceY, int srcSliceH,
+ uint8_t *dst[], int dstStride[])
+{
+ ff_bgrxtoyv12(
+ src[0],
+ dst[0] + srcSliceY * dstStride[0],
+ dst[1] + (srcSliceY >> 1) * dstStride[1],
+ dst[2] + (srcSliceY >> 1) * dstStride[2],
+ c->srcW, srcSliceH,
+ dstStride[0], dstStride[1], srcStride[0],
+ c->input_rgb2yuv_table);
+ if (dst[3])
+ fillPlane(dst[3], dstStride[3], c->srcW, srcSliceH, srcSliceY, 255);
+ return srcSliceH;
+}
+
+static int rgbxToYv12Wrapper(SwsContext *c, const uint8_t *src[],
+ int srcStride[], int srcSliceY, int srcSliceH,
+ uint8_t *dst[], int dstStride[])
+{
+ ff_rgbxtoyv12(
+ src[0],
+ dst[0] + srcSliceY * dstStride[0],
+ dst[1] + (srcSliceY >> 1) * dstStride[1],
+ dst[2] + (srcSliceY >> 1) * dstStride[2],
+ c->srcW, srcSliceH,
+ dstStride[0], dstStride[1], srcStride[0],
+ c->input_rgb2yuv_table);
+ if (dst[3])
+ fillPlane(dst[3], dstStride[3], c->srcW, srcSliceH, srcSliceY, 255);
+ return srcSliceH;
+}
+
+static int xbgrToYv12Wrapper(SwsContext *c, const uint8_t *src[],
+ int srcStride[], int srcSliceY, int srcSliceH,
+ uint8_t *dst[], int dstStride[])
+{
+ ff_xbgrtoyv12(
+ src[0],
+ dst[0] + srcSliceY * dstStride[0],
+ dst[1] + (srcSliceY >> 1) * dstStride[1],
+ dst[2] + (srcSliceY >> 1) * dstStride[2],
+ c->srcW, srcSliceH,
+ dstStride[0], dstStride[1], srcStride[0],
+ c->input_rgb2yuv_table);
+ if (dst[3])
+ fillPlane(dst[3], dstStride[3], c->srcW, srcSliceH, srcSliceY, 255);
+ return srcSliceH;
+}
+
+static int xrgbToYv12Wrapper(SwsContext *c, const uint8_t *src[],
+ int srcStride[], int srcSliceY, int srcSliceH,
+ uint8_t *dst[], int dstStride[])
+{
+ ff_xrgbtoyv12(
+ src[0],
+ dst[0] + srcSliceY * dstStride[0],
+ dst[1] + (srcSliceY >> 1) * dstStride[1],
+ dst[2] + (srcSliceY >> 1) * dstStride[2],
+ c->srcW, srcSliceH,
+ dstStride[0], dstStride[1], srcStride[0],
+ c->input_rgb2yuv_table);
+ if (dst[3])
+ fillPlane(dst[3], dstStride[3], c->srcW, srcSliceH, srcSliceY, 255);
+ return srcSliceH;
+}
+
static int yvu9ToYv12Wrapper(SwsContext *c, const uint8_t *src[],
int srcStride[], int srcSliceY, int srcSliceH,
uint8_t *dst[], int dstStride[])
@@ -2059,6 +2127,27 @@ void ff_get_unscaled_swscale(SwsContext *c)
!(flags & (SWS_ACCURATE_RND | SWS_BITEXACT)))
c->convert_unscaled = rgb24ToYv12Wrapper;
+ /* bgrxtoYV12 */
+ if (((srcFormat == AV_PIX_FMT_BGRA && dstFormat == AV_PIX_FMT_YUV420P) ||
+ (srcFormat == AV_PIX_FMT_BGR0 && (dstFormat == AV_PIX_FMT_YUV420P || dstFormat == AV_PIX_FMT_YUVA420P))) &&
+ !(flags & (SWS_ACCURATE_RND | SWS_BITEXACT)))
+ c->convert_unscaled = bgrxToYv12Wrapper;
+ /* rgbxtoYV12 */
+ if (((srcFormat == AV_PIX_FMT_RGBA && dstFormat == AV_PIX_FMT_YUV420P) ||
+ (srcFormat == AV_PIX_FMT_RGB0 && (dstFormat == AV_PIX_FMT_YUV420P || dstFormat == AV_PIX_FMT_YUVA420P))) &&
+ !(flags & (SWS_ACCURATE_RND | SWS_BITEXACT)))
+ c->convert_unscaled = rgbxToYv12Wrapper;
+ /* xbgrtoYV12 */
+ if (((srcFormat == AV_PIX_FMT_ABGR && dstFormat == AV_PIX_FMT_YUV420P) ||
+ (srcFormat == AV_PIX_FMT_0BGR && (dstFormat == AV_PIX_FMT_YUV420P || dstFormat == AV_PIX_FMT_YUVA420P))) &&
+ !(flags & (SWS_ACCURATE_RND | SWS_BITEXACT)))
+ c->convert_unscaled = xbgrToYv12Wrapper;
+ /* xrgbtoYV12 */
+ if (((srcFormat == AV_PIX_FMT_ARGB && dstFormat == AV_PIX_FMT_YUV420P) ||
+ (srcFormat == AV_PIX_FMT_0RGB && (dstFormat == AV_PIX_FMT_YUV420P || dstFormat == AV_PIX_FMT_YUVA420P))) &&
+ !(flags & (SWS_ACCURATE_RND | SWS_BITEXACT)))
+ c->convert_unscaled = xrgbToYv12Wrapper;
+
/* RGB/BGR -> RGB/BGR (no dither needed forms) */
if (isAnyRGB(srcFormat) && isAnyRGB(dstFormat) && findRgbConvFn(c)
&& (!needsDither || (c->flags&(SWS_FAST_BILINEAR|SWS_POINT))))
--
2.39.2
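
The `src + 1` adjustment used by the XRGB/XBGR entry points above can be illustrated with a small standalone sketch. This is not the patch's code: `luma_rgbx` and `luma_xrgb` are hypothetical helpers, and the coefficients are illustrative BT.601 limited-range values scaled by 2^15, assumed here rather than taken from `input_rgb2yuv_table`. The point is only that a scalar reader which touches bytes 0..2 of each 4-byte pixel handles XRGB as RGBX viewed one byte later.

```c
#include <stdint.h>

/* Hypothetical helper: luma of one pixel whose R, G, B bytes sit at
 * p[0], p[1], p[2]. The fourth byte of the pixel is never read.
 * Coefficients are illustrative BT.601 limited-range values scaled by
 * 2^15 (an assumption for this sketch, not the table used by swscale). */
static uint8_t luma_rgbx(const uint8_t *p)
{
    const int ry = 8414, gy = 16519, by = 3208;
    const int ky = (16 << 15) + (1 << 14); /* +16 offset plus 0.5 for rounding */

    return (uint8_t)((ry * p[0] + gy * p[1] + by * p[2] + ky) >> 15);
}

/* XRGB is RGBX viewed one byte later, so skipping the leading
 * padding byte is enough for a byte-at-a-time reader. */
static uint8_t luma_xrgb(const uint8_t *p)
{
    return luma_rgbx(p + 1);
}
```

Because only bytes 1..3 of each pixel are read, the shifted pointer never reads past the end of the last pixel in a row.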
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 14+ messages in thread
* [FFmpeg-devel] [PATCH v1 6/6] swscale: Add aarch64 functions for RGB24->YUV420P
2023-08-20 15:10 [FFmpeg-devel] [PATCH v1 0/6] swscale: Add dedicated RGB->YUV unscaled functions & aarch64 asm John Cox
` (4 preceding siblings ...)
2023-08-20 15:10 ` [FFmpeg-devel] [PATCH v1 5/6] swscale: Add unscaled XRGB->YUV420P functions John Cox
@ 2023-08-20 15:10 ` John Cox
5 siblings, 0 replies; 14+ messages in thread
From: John Cox @ 2023-08-20 15:10 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: John Cox
Neon RGB24->YUV420P and BGR24->YUV420P functions. Works on 16 pixel
blocks and can do any width or height, though for widths less than 32 or
so the C is likely faster.
Signed-off-by: John Cox <jc@kynesim.co.uk>
---
libswscale/aarch64/rgb2rgb.c | 8 +
libswscale/aarch64/rgb2rgb_neon.S | 356 ++++++++++++++++++++++++++++++
2 files changed, 364 insertions(+)
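
As a reading aid for the tail handling in the asm below, here is a rough C sketch of how a line width is split into full 16-pixel blocks plus a remainder whose partial loads and stores are selected by the low four bits of the width (8, 4, 2, 1). The names `struct split`, `split_width` and `covered` are hypothetical and exist only for this illustration; the asm itself computes the tail as one more 16-wide vector with partial loads and stores.

```c
/* Hypothetical illustration only: which stages run for a given width. */
struct split {
    int blocks16; /* number of full 16-pixel blocks */
    int tail8;    /* 1 if an 8-pixel partial load/store is needed */
    int tail4;    /* 1 if a 4-pixel partial load/store is needed */
    int tail2;    /* 1 if a 2-pixel partial load/store is needed */
    int tail1;    /* 1 if a final single pixel is needed */
};

static struct split split_width(int width)
{
    struct split s = {0, 0, 0, 0, 0};
    int tail;

    if (width <= 0)
        return s; /* nothing to do */

    tail       = width & 15;
    s.blocks16 = width >> 4;
    s.tail8    = (tail >> 3) & 1;
    s.tail4    = (tail >> 2) & 1;
    s.tail2    = (tail >> 1) & 1;
    s.tail1    = tail & 1;
    return s;
}

/* Total number of pixels covered by the split; equals the width
 * for any positive width. */
static int covered(struct split s)
{
    return 16 * s.blocks16 + 8 * s.tail8 + 4 * s.tail4 + 2 * s.tail2 + s.tail1;
}
```

For example, a width of 37 gives two full blocks and a 5-pixel tail handled as 4 + 1.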
diff --git a/libswscale/aarch64/rgb2rgb.c b/libswscale/aarch64/rgb2rgb.c
index a9bf6ff9e0..b2d68c1df3 100644
--- a/libswscale/aarch64/rgb2rgb.c
+++ b/libswscale/aarch64/rgb2rgb.c
@@ -30,6 +30,12 @@
void ff_interleave_bytes_neon(const uint8_t *src1, const uint8_t *src2,
uint8_t *dest, int width, int height,
int src1Stride, int src2Stride, int dstStride);
+void ff_bgr24toyv12_neon(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
+ uint8_t *vdst, int width, int height, int lumStride,
+ int chromStride, int srcStride, int32_t *rgb2yuv);
+void ff_rgb24toyv12_neon(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
+ uint8_t *vdst, int width, int height, int lumStride,
+ int chromStride, int srcStride, int32_t *rgb2yuv);
av_cold void rgb2rgb_init_aarch64(void)
{
@@ -37,5 +43,7 @@ av_cold void rgb2rgb_init_aarch64(void)
if (have_neon(cpu_flags)) {
interleaveBytes = ff_interleave_bytes_neon;
+ ff_rgb24toyv12 = ff_rgb24toyv12_neon;
+ ff_bgr24toyv12 = ff_bgr24toyv12_neon;
}
}
diff --git a/libswscale/aarch64/rgb2rgb_neon.S b/libswscale/aarch64/rgb2rgb_neon.S
index d81110ec57..b15e69a3bd 100644
--- a/libswscale/aarch64/rgb2rgb_neon.S
+++ b/libswscale/aarch64/rgb2rgb_neon.S
@@ -77,3 +77,359 @@ function ff_interleave_bytes_neon, export=1
0:
ret
endfunc
+
+// Expand rgb2 into r0+r1/g0+g1/b0+b1
+.macro XRGB3Y r0, g0, b0, r1, g1, b1, r2, g2, b2
+ uxtl \r0\().8h, \r2\().8b
+ uxtl \g0\().8h, \g2\().8b
+ uxtl \b0\().8h, \b2\().8b
+
+ uxtl2 \r1\().8h, \r2\().16b
+ uxtl2 \g1\().8h, \g2\().16b
+ uxtl2 \b1\().8h, \b2\().16b
+.endm
+
+// Expand rgb2 into r0+r1/g0+g1/b0+b1
+// and pick every other el to put back into rgb2 for chroma
+.macro XRGB3YC r0, g0, b0, r1, g1, b1, r2, g2, b2
+ XRGB3Y \r0, \g0, \b0, \r1, \g1, \b1, \r2, \g2, \b2
+
+ bic \r2\().8h, #0xff, LSL #8
+ bic \g2\().8h, #0xff, LSL #8
+ bic \b2\().8h, #0xff, LSL #8
+.endm
+
+.macro SMLAL3 d0, d1, s0, s1, s2, c0, c1, c2
+ smull \d0\().4s, \s0\().4h, \c0
+ smlal \d0\().4s, \s1\().4h, \c1
+ smlal \d0\().4s, \s2\().4h, \c2
+ smull2 \d1\().4s, \s0\().8h, \c0
+ smlal2 \d1\().4s, \s1\().8h, \c1
+ smlal2 \d1\().4s, \s2\().8h, \c2
+.endm
+
+// d0 may be s0
+// s0, s2 corrupted
+.macro SHRN_Y d0, s0, s1, s2, s3, k128h
+ shrn \s0\().4h, \s0\().4s, #12
+ shrn2 \s0\().8h, \s1\().4s, #12
+ add \s0\().8h, \s0\().8h, \k128h\().8h // +128 (>> 3 = 16)
+ sqrshrun \d0\().8b, \s0\().8h, #3
+ shrn \s2\().4h, \s2\().4s, #12
+ shrn2 \s2\().8h, \s3\().4s, #12
+ add \s2\().8h, \s2\().8h, \k128h\().8h
+ sqrshrun2 \d0\().16b, \s2\().8h, #3
+.endm
+
+.macro SHRN_C d0, s0, s1, k128b
+ shrn \s0\().4h, \s0\().4s, #14
+ shrn2 \s0\().8h, \s1\().4s, #14
+ sqrshrn \s0\().8b, \s0\().8h, #1
+ add \d0\().8b, \s0\().8b, \k128b\().8b // +128
+.endm
+
+.macro STB2V s0, n, a
+ st1 {\s0\().b}[(\n+0)], [\a], #1
+ st1 {\s0\().b}[(\n+1)], [\a], #1
+.endm
+
+.macro STB4V s0, n, a
+ STB2V \s0, (\n+0), \a
+ STB2V \s0, (\n+2), \a
+.endm
+
+
+// void ff_bgr24toyv12_neon(
+// const uint8_t *src, // x0
+// uint8_t *ydst, // x1
+// uint8_t *udst, // x2
+// uint8_t *vdst, // x3
+// int width, // w4
+// int height, // w5
+// int lumStride, // w6
+// int chromStride, // w7
+// int srcStr, // [sp, #0]
+// int32_t *rgb2yuv); // [sp, #8]
+
+function ff_bgr24toyv12_neon, export=1
+ ldr x15, [sp, #8]
+ ld3 {v3.s, v4.s, v5.s}[0], [x15], #12
+ ld3 {v3.s, v4.s, v5.s}[1], [x15], #12
+ ld3 {v3.s, v4.s, v5.s}[2], [x15]
+ mov v6.16b, v3.16b
+ mov v3.16b, v5.16b
+ mov v5.16b, v6.16b
+ b 99f
+endfunc
+
+// void ff_rgb24toyv12_neon(
+// const uint8_t *src, // x0
+// uint8_t *ydst, // x1
+// uint8_t *udst, // x2
+// uint8_t *vdst, // x3
+// int width, // w4
+// int height, // w5
+// int lumStride, // w6
+// int chromStride, // w7
+// int srcStr, // [sp, #0]
+// int32_t *rgb2yuv); // [sp, #8] (including Mac)
+
+// regs
+// v0-2 Src bytes - reused as chroma src
+// v3-5 Coeffs (packed very inefficiently - could be squashed)
+// v6 128h
+// v7 128b
+// v8-15 Reserved
+// v16-18 Lo Src expanded as H
+// v19 -
+// v20-22 Hi Src expanded as H
+// v23 -
+// v24 U out
+// v25 U tmp
+// v26 Y out
+// v27-29 Y tmp
+// v30 V out
+// v31 V tmp
+
+function ff_rgb24toyv12_neon, export=1
+ ldr x15, [sp, #8]
+ ld3 {v3.s, v4.s, v5.s}[0], [x15], #12
+ ld3 {v3.s, v4.s, v5.s}[1], [x15], #12
+ ld3 {v3.s, v4.s, v5.s}[2], [x15]
+
+99:
+ ldr w14, [sp, #0]
+ movi v7.8b, #128
+ uxtl v6.8h, v7.8b
+ // Ensure if nothing to do then we do nothing
+ cmp w4, #0
+ b.le 90f
+ cmp w5, #0
+ b.le 90f
+ // If w % 16 != 0 then -16 so we do main loop 1 fewer times with
+ // the remainder done in the tail
+ tst w4, #15
+ b.eq 1f
+ sub w4, w4, #16
+1:
+
+// -------------------- Even line body - YUV
+11:
+ subs w9, w4, #0
+ mov x10, x0
+ mov x11, x1
+ mov x12, x2
+ mov x13, x3
+ b.lt 12f
+
+ ld3 {v0.16b, v1.16b, v2.16b}, [x10], #48
+ subs w9, w9, #16
+ b.le 13f
+
+10:
+ XRGB3YC v16, v17, v18, v20, v21, v22, v0, v1, v2
+
+ // Testing shows it is faster to stack the smull/smlal ops together
+ // rather than interleave them between channels and indeed even the
+ // shift/add sections seem happier not interleaved
+
+ // Y0
+ SMLAL3 v26, v27, v16, v17, v18, v3.h[0], v4.h[0], v5.h[0]
+ // Y1
+ SMLAL3 v28, v29, v20, v21, v22, v3.h[0], v4.h[0], v5.h[0]
+ SHRN_Y v26, v26, v27, v28, v29, v6
+
+ // U
+ // Vector subscript *2 as we loaded into S but are only using H
+ SMLAL3 v24, v25, v0, v1, v2, v3.h[2], v4.h[2], v5.h[2]
+
+ // V
+ SMLAL3 v30, v31, v0, v1, v2, v3.h[4], v4.h[4], v5.h[4]
+
+ ld3 {v0.16b, v1.16b, v2.16b}, [x10], #48
+
+ SHRN_C v24, v24, v25, v7
+ SHRN_C v30, v30, v31, v7
+
+ subs w9, w9, #16
+
+ st1 {v26.16b}, [x11], #16
+ st1 {v24.8b}, [x12], #8
+ st1 {v30.8b}, [x13], #8
+
+ b.gt 10b
+
+// -------------------- Even line tail - YUV
+// If width % 16 == 0 then simply runs once with preloaded RGB
+// If other then deals with preload & then does remaining tail
+
+13:
+ // Body is simple copy of main loop body minus preload
+
+ XRGB3YC v16, v17, v18, v20, v21, v22, v0, v1, v2
+ // Y0
+ SMLAL3 v26, v27, v16, v17, v18, v3.h[0], v4.h[0], v5.h[0]
+ // Y1
+ SMLAL3 v28, v29, v20, v21, v22, v3.h[0], v4.h[0], v5.h[0]
+ SHRN_Y v26, v26, v27, v28, v29, v6
+ // U
+ SMLAL3 v24, v25, v0, v1, v2, v3.h[2], v4.h[2], v5.h[2]
+ // V
+ SMLAL3 v30, v31, v0, v1, v2, v3.h[4], v4.h[4], v5.h[4]
+
+ cmp w9, #-16
+
+ SHRN_C v24, v24, v25, v7
+ SHRN_C v30, v30, v31, v7
+
+ // Here:
+ // w9 == 0 width % 16 == 0, tail done
+ // w9 > -16 1st tail done (16 pels), remainder still to go
+ // w9 == -16 shouldn't happen
+ // w9 > -32 2nd tail done
+ // w9 <= -32 shouldn't happen
+
+ b.lt 2f
+ st1 {v26.16b}, [x11], #16
+ st1 {v24.8b}, [x12], #8
+ st1 {v30.8b}, [x13], #8
+ cbz w9, 3f
+
+12:
+ sub w9, w9, #16
+
+ tbz w9, #3, 1f
+ ld3 {v0.8b, v1.8b, v2.8b}, [x10], #24
+1: tbz w9, #2, 1f
+ ld3 {v0.b, v1.b, v2.b}[8], [x10], #3
+ ld3 {v0.b, v1.b, v2.b}[9], [x10], #3
+ ld3 {v0.b, v1.b, v2.b}[10], [x10], #3
+ ld3 {v0.b, v1.b, v2.b}[11], [x10], #3
+1: tbz w9, #1, 1f
+ ld3 {v0.b, v1.b, v2.b}[12], [x10], #3
+ ld3 {v0.b, v1.b, v2.b}[13], [x10], #3
+1: tbz w9, #0, 13b
+ ld3 {v0.b, v1.b, v2.b}[14], [x10], #3
+ b 13b
+
+2:
+ tbz w9, #3, 1f
+ st1 {v26.8b}, [x11], #8
+ STB4V v24, 0, x12
+ STB4V v30, 0, x13
+1: tbz w9, #2, 1f
+ STB4V v26, 8, x11
+ STB2V v24, 4, x12
+ STB2V v30, 4, x13
+1: tbz w9, #1, 1f
+ STB2V v26, 12, x11
+ st1 {v24.b}[6], [x12], #1
+ st1 {v30.b}[6], [x13], #1
+1: tbz w9, #0, 1f
+ st1 {v26.b}[14], [x11]
+ st1 {v24.b}[7], [x12]
+ st1 {v30.b}[7], [x13]
+1:
+3:
+
+// -------------------- Odd line body - Y only
+
+ subs w5, w5, #1
+ b.eq 90f
+
+ subs w9, w4, #0
+ add x0, x0, w14, sxtw
+ add x1, x1, w6, sxtw
+ mov x10, x0
+ mov x11, x1
+ b.lt 12f
+
+ ld3 {v0.16b, v1.16b, v2.16b}, [x10], #48
+ subs w9, w9, #16
+ b.le 13f
+
+10:
+ XRGB3Y v16, v17, v18, v20, v21, v22, v0, v1, v2
+ // Y0
+ SMLAL3 v26, v27, v16, v17, v18, v3.h[0], v4.h[0], v5.h[0]
+ // Y1
+ SMLAL3 v28, v29, v20, v21, v22, v3.h[0], v4.h[0], v5.h[0]
+
+ ld3 {v0.16b, v1.16b, v2.16b}, [x10], #48
+
+ SHRN_Y v26, v26, v27, v28, v29, v6
+
+ subs w9, w9, #16
+
+ st1 {v26.16b}, [x11], #16
+
+ b.gt 10b
+
+// -------------------- Odd line tail - Y
+// If width % 16 == 0 then simply runs once with preloaded RGB
+// If other then deals with preload & then does remaining tail
+
+13:
+ // Body is simple copy of main loop body minus preload
+
+ XRGB3Y v16, v17, v18, v20, v21, v22, v0, v1, v2
+ // Y0
+ SMLAL3 v26, v27, v16, v17, v18, v3.h[0], v4.h[0], v5.h[0]
+ // Y1
+ SMLAL3 v28, v29, v20, v21, v22, v3.h[0], v4.h[0], v5.h[0]
+
+ cmp w9, #-16
+
+ SHRN_Y v26, v26, v27, v28, v29, v6
+
+ // Here:
+ // w9 == 0 width % 16 == 0, tail done
+ // w9 > -16 1st tail done (16 pels), remainder still to go
+ // w9 == -16 shouldn't happen
+ // w9 > -32 2nd tail done
+ // w9 <= -32 shouldn't happen
+
+ b.lt 2f
+ st1 {v26.16b}, [x11], #16
+ cbz w9, 3f
+
+12:
+ sub w9, w9, #16
+
+ tbz w9, #3, 1f
+ ld3 {v0.8b, v1.8b, v2.8b}, [x10], #24
+1: tbz w9, #2, 1f
+ ld3 {v0.b, v1.b, v2.b}[8], [x10], #3
+ ld3 {v0.b, v1.b, v2.b}[9], [x10], #3
+ ld3 {v0.b, v1.b, v2.b}[10], [x10], #3
+ ld3 {v0.b, v1.b, v2.b}[11], [x10], #3
+1: tbz w9, #1, 1f
+ ld3 {v0.b, v1.b, v2.b}[12], [x10], #3
+ ld3 {v0.b, v1.b, v2.b}[13], [x10], #3
+1: tbz w9, #0, 13b
+ ld3 {v0.b, v1.b, v2.b}[14], [x10], #3
+ b 13b
+
+2:
+ tbz w9, #3, 1f
+ st1 {v26.8b}, [x11], #8
+1: tbz w9, #2, 1f
+ STB4V v26, 8, x11
+1: tbz w9, #1, 1f
+ STB2V v26, 12, x11
+1: tbz w9, #0, 1f
+ st1 {v26.b}[14], [x11]
+1:
+3:
+
+// ------------------- Loop to start
+
+ add x0, x0, w14, sxtw
+ add x1, x1, w6, sxtw
+ add x2, x2, w7, sxtw
+ add x3, x3, w7, sxtw
+ subs w5, w5, #1
+ b.gt 11b
+90:
+ ret
+endfunc
--
2.39.2
* Re: [FFmpeg-devel] [PATCH v1 3/6] swscale: Add explicit rgb24->yv12 conversion
2023-08-20 15:10 ` [FFmpeg-devel] [PATCH v1 3/6] swscale: Add explicit rgb24->yv12 conversion John Cox
@ 2023-08-20 17:16 ` Michael Niedermayer
2023-08-20 17:45 ` Michael Niedermayer
2023-08-20 18:09 ` John Cox
0 siblings, 2 replies; 14+ messages in thread
From: Michael Niedermayer @ 2023-08-20 17:16 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Sun, Aug 20, 2023 at 03:10:19PM +0000, John Cox wrote:
> Add a rgb24->yuv420p conversion. Uses the same code as the existing
> bgr24->yuv converter but permutes the conversion array to swap R & B
> coefficients.
>
> Signed-off-by: John Cox <jc@kynesim.co.uk>
> ---
> libswscale/rgb2rgb.c | 5 +++++
> libswscale/rgb2rgb.h | 7 +++++++
> libswscale/rgb2rgb_template.c | 38 ++++++++++++++++++++++++++++++-----
> libswscale/swscale_unscaled.c | 24 +++++++++++++++++++++-
> 4 files changed, 68 insertions(+), 6 deletions(-)
>
> diff --git a/libswscale/rgb2rgb.c b/libswscale/rgb2rgb.c
> index 8707917800..de90e5193f 100644
> --- a/libswscale/rgb2rgb.c
> +++ b/libswscale/rgb2rgb.c
> @@ -83,6 +83,11 @@ void (*ff_bgr24toyv12)(const uint8_t *src, uint8_t *ydst,
> int width, int height,
> int lumStride, int chromStride, int srcStride,
> int32_t *rgb2yuv);
> +void (*ff_rgb24toyv12)(const uint8_t *src, uint8_t *ydst,
> + uint8_t *udst, uint8_t *vdst,
> + int width, int height,
> + int lumStride, int chromStride, int srcStride,
> + int32_t *rgb2yuv);
> void (*planar2x)(const uint8_t *src, uint8_t *dst, int width, int height,
> int srcStride, int dstStride);
> void (*interleaveBytes)(const uint8_t *src1, const uint8_t *src2, uint8_t *dst,
> diff --git a/libswscale/rgb2rgb.h b/libswscale/rgb2rgb.h
> index 305b830920..f7a76a92ba 100644
> --- a/libswscale/rgb2rgb.h
> +++ b/libswscale/rgb2rgb.h
> @@ -79,6 +79,9 @@ void rgb12to15(const uint8_t *src, uint8_t *dst, int src_size);
> void ff_bgr24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
> uint8_t *vdst, int width, int height, int lumStride,
> int chromStride, int srcStride, int32_t *rgb2yuv);
> +void ff_rgb24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
> + uint8_t *vdst, int width, int height, int lumStride,
> + int chromStride, int srcStride, int32_t *rgb2yuv);
>
> /**
> * Height should be a multiple of 2 and width should be a multiple of 16.
> @@ -128,6 +131,10 @@ extern void (*ff_bgr24toyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
> int width, int height,
> int lumStride, int chromStride, int srcStride,
> int32_t *rgb2yuv);
> +extern void (*ff_rgb24toyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
> + int width, int height,
> + int lumStride, int chromStride, int srcStride,
> + int32_t *rgb2yuv);
> extern void (*planar2x)(const uint8_t *src, uint8_t *dst, int width, int height,
> int srcStride, int dstStride);
>
> diff --git a/libswscale/rgb2rgb_template.c b/libswscale/rgb2rgb_template.c
> index 8ef4a2cf5d..e57bfa6545 100644
> --- a/libswscale/rgb2rgb_template.c
> +++ b/libswscale/rgb2rgb_template.c
> @@ -646,13 +646,14 @@ static inline void uyvytoyv12_c(const uint8_t *src, uint8_t *ydst,
> * others are ignored in the C version.
> * FIXME: Write HQ version.
> */
> -void ff_bgr24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
> +static void rgb24toyv12_x(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
this probably should be inline
also i see now "FIXME: Write HQ version." above here. Do you really want to
add a low quality rgb24toyv12 ?
(it is visible on the diagonal border (cyan / red)) in
./ffmpeg -f lavfi -i testsrc=size=5632x3168 -pix_fmt yuv420p -vframes 1 -qscale 1 -strict -1 new.jpg
also on smaller sizes but for some reason it's clearer on the big one zoomed in 400% with gimp
(the gimp test was done with the whole patchset not after this patch)
[...]
> diff --git a/libswscale/swscale_unscaled.c b/libswscale/swscale_unscaled.c
> index 32e0d7f63c..751bdcb2e4 100644
> --- a/libswscale/swscale_unscaled.c
> +++ b/libswscale/swscale_unscaled.c
> @@ -1654,6 +1654,23 @@ static int bgr24ToYv12Wrapper(SwsContext *c, const uint8_t *src[],
> return srcSliceH;
> }
>
> +static int rgb24ToYv12Wrapper(SwsContext *c, const uint8_t *src[],
> + int srcStride[], int srcSliceY, int srcSliceH,
> + uint8_t *dst[], int dstStride[])
> +{
> + ff_rgb24toyv12(
> + src[0],
> + dst[0] + srcSliceY * dstStride[0],
> + dst[1] + (srcSliceY >> 1) * dstStride[1],
> + dst[2] + (srcSliceY >> 1) * dstStride[2],
> + c->srcW, srcSliceH,
> + dstStride[0], dstStride[1], srcStride[0],
> + c->input_rgb2yuv_table);
> + if (dst[3])
> + fillPlane(dst[3], dstStride[3], c->srcW, srcSliceH, srcSliceY, 255);
> + return srcSliceH;
> +}
> +
> static int yvu9ToYv12Wrapper(SwsContext *c, const uint8_t *src[],
> int srcStride[], int srcSliceY, int srcSliceH,
> uint8_t *dst[], int dstStride[])
> @@ -2035,8 +2052,13 @@ void ff_get_unscaled_swscale(SwsContext *c)
> /* bgr24toYV12 */
> if (srcFormat == AV_PIX_FMT_BGR24 &&
> (dstFormat == AV_PIX_FMT_YUV420P || dstFormat == AV_PIX_FMT_YUVA420P) &&
> - !(flags & SWS_ACCURATE_RND) && !(dstW&1))
> + !(flags & (SWS_ACCURATE_RND | SWS_BITEXACT)) && !(dstW&1))
this doesn't belong in this patch
thx
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Freedom in capitalist society always remains about the same as it was in
ancient Greek republics: Freedom for slave owners. -- Vladimir Lenin
* Re: [FFmpeg-devel] [PATCH v1 3/6] swscale: Add explicit rgb24->yv12 conversion
2023-08-20 17:16 ` Michael Niedermayer
@ 2023-08-20 17:45 ` Michael Niedermayer
2023-08-20 18:28 ` John Cox
2023-08-20 18:09 ` John Cox
1 sibling, 1 reply; 14+ messages in thread
From: Michael Niedermayer @ 2023-08-20 17:45 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Sun, Aug 20, 2023 at 07:16:14PM +0200, Michael Niedermayer wrote:
> On Sun, Aug 20, 2023 at 03:10:19PM +0000, John Cox wrote:
> > Add a rgb24->yuv420p conversion. Uses the same code as the existing
> > bgr24->yuv converter but permutes the conversion array to swap R & B
> > coefficients.
> >
> > Signed-off-by: John Cox <jc@kynesim.co.uk>
> > ---
> > libswscale/rgb2rgb.c | 5 +++++
> > libswscale/rgb2rgb.h | 7 +++++++
> > libswscale/rgb2rgb_template.c | 38 ++++++++++++++++++++++++++++++-----
> > libswscale/swscale_unscaled.c | 24 +++++++++++++++++++++-
> > 4 files changed, 68 insertions(+), 6 deletions(-)
> >
> > diff --git a/libswscale/rgb2rgb.c b/libswscale/rgb2rgb.c
> > index 8707917800..de90e5193f 100644
> > --- a/libswscale/rgb2rgb.c
> > +++ b/libswscale/rgb2rgb.c
> > @@ -83,6 +83,11 @@ void (*ff_bgr24toyv12)(const uint8_t *src, uint8_t *ydst,
> > int width, int height,
> > int lumStride, int chromStride, int srcStride,
> > int32_t *rgb2yuv);
> > +void (*ff_rgb24toyv12)(const uint8_t *src, uint8_t *ydst,
> > + uint8_t *udst, uint8_t *vdst,
> > + int width, int height,
> > + int lumStride, int chromStride, int srcStride,
> > + int32_t *rgb2yuv);
> > void (*planar2x)(const uint8_t *src, uint8_t *dst, int width, int height,
> > int srcStride, int dstStride);
> > void (*interleaveBytes)(const uint8_t *src1, const uint8_t *src2, uint8_t *dst,
> > diff --git a/libswscale/rgb2rgb.h b/libswscale/rgb2rgb.h
> > index 305b830920..f7a76a92ba 100644
> > --- a/libswscale/rgb2rgb.h
> > +++ b/libswscale/rgb2rgb.h
> > @@ -79,6 +79,9 @@ void rgb12to15(const uint8_t *src, uint8_t *dst, int src_size);
> > void ff_bgr24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
> > uint8_t *vdst, int width, int height, int lumStride,
> > int chromStride, int srcStride, int32_t *rgb2yuv);
> > +void ff_rgb24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
> > + uint8_t *vdst, int width, int height, int lumStride,
> > + int chromStride, int srcStride, int32_t *rgb2yuv);
> >
> > /**
> > * Height should be a multiple of 2 and width should be a multiple of 16.
> > @@ -128,6 +131,10 @@ extern void (*ff_bgr24toyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
> > int width, int height,
> > int lumStride, int chromStride, int srcStride,
> > int32_t *rgb2yuv);
> > +extern void (*ff_rgb24toyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
> > + int width, int height,
> > + int lumStride, int chromStride, int srcStride,
> > + int32_t *rgb2yuv);
> > extern void (*planar2x)(const uint8_t *src, uint8_t *dst, int width, int height,
> > int srcStride, int dstStride);
> >
> > diff --git a/libswscale/rgb2rgb_template.c b/libswscale/rgb2rgb_template.c
> > index 8ef4a2cf5d..e57bfa6545 100644
> > --- a/libswscale/rgb2rgb_template.c
> > +++ b/libswscale/rgb2rgb_template.c
>
>
> > @@ -646,13 +646,14 @@ static inline void uyvytoyv12_c(const uint8_t *src, uint8_t *ydst,
> > * others are ignored in the C version.
> > * FIXME: Write HQ version.
> > */
> > -void ff_bgr24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
> > +static void rgb24toyv12_x(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
>
> this probably should be inline
>
> also i see now "FIXME: Write HQ version." above here. Do you really want to
> add a low quality rgb24toyv12 ?
> (it is vissible on the diagonal border (cyan / red )) in
> ./ffmpeg -f lavfi -i testsrc=size=5632x3168 -pix_fmt yuv420p -vframes 1 -qscale 1 -strict -1 new.jpg
>
> also on smaller sizes but for some reason its clearer on the big one zoomed in 400% with gimp
> (the gimp test was done with the whole patchset not after this patch)
Also the reason why it's LQ and looks like it does is because
1. half the RGB samples are ignored in computing the chroma samples
2. the chroma sample locations are ignored; the locations for yuv420 are reasonably standard
this needs some simple filter to get from a few RGB samples to the RGB sample co-located
with the UV sample before RGB->UV
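
A minimal sketch of the kind of pre-filter described above, assuming a plain 2x2 box average over packed RGB24. The helper name `avg2x2_rgb24` is hypothetical and this is not code from the patch set. A box average corresponds to a chroma sample sited at the centre of the 2x2 block; other sitings would need different weights.

```c
#include <stdint.h>

/* Hypothetical helper: average a 2x2 block of packed RGB24 pixels,
 * channel by channel, with rounding. row0 and row1 point at the left
 * pixel of the block on two consecutive lines; out receives the
 * averaged pixel in the same channel order as the input. */
static void avg2x2_rgb24(const uint8_t *row0, const uint8_t *row1,
                         uint8_t out[3])
{
    int c;

    for (c = 0; c < 3; c++)
        out[c] = (uint8_t)((row0[c] + row0[3 + c] +
                            row1[c] + row1[3 + c] + 2) >> 2);
}
```

The averaged pixel would then go through the usual RGB->UV matrix, so all four RGB samples contribute to each chroma sample instead of one.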
thx
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Everything should be made as simple as possible, but not simpler.
-- Albert Einstein
* Re: [FFmpeg-devel] [PATCH v1 3/6] swscale: Add explicit rgb24->yv12 conversion
2023-08-20 17:16 ` Michael Niedermayer
2023-08-20 17:45 ` Michael Niedermayer
@ 2023-08-20 18:09 ` John Cox
1 sibling, 0 replies; 14+ messages in thread
From: John Cox @ 2023-08-20 18:09 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Sun, 20 Aug 2023 19:16:14 +0200, you wrote:
>On Sun, Aug 20, 2023 at 03:10:19PM +0000, John Cox wrote:
>> Add a rgb24->yuv420p conversion. Uses the same code as the existing
>> bgr24->yuv converter but permutes the conversion array to swap R & B
>> coefficients.
>>
>> Signed-off-by: John Cox <jc@kynesim.co.uk>
>> ---
>> libswscale/rgb2rgb.c | 5 +++++
>> libswscale/rgb2rgb.h | 7 +++++++
>> libswscale/rgb2rgb_template.c | 38 ++++++++++++++++++++++++++++++-----
>> libswscale/swscale_unscaled.c | 24 +++++++++++++++++++++-
>> 4 files changed, 68 insertions(+), 6 deletions(-)
>>
>> diff --git a/libswscale/rgb2rgb.c b/libswscale/rgb2rgb.c
>> index 8707917800..de90e5193f 100644
>> --- a/libswscale/rgb2rgb.c
>> +++ b/libswscale/rgb2rgb.c
>> @@ -83,6 +83,11 @@ void (*ff_bgr24toyv12)(const uint8_t *src, uint8_t *ydst,
>> int width, int height,
>> int lumStride, int chromStride, int srcStride,
>> int32_t *rgb2yuv);
>> +void (*ff_rgb24toyv12)(const uint8_t *src, uint8_t *ydst,
>> + uint8_t *udst, uint8_t *vdst,
>> + int width, int height,
>> + int lumStride, int chromStride, int srcStride,
>> + int32_t *rgb2yuv);
>> void (*planar2x)(const uint8_t *src, uint8_t *dst, int width, int height,
>> int srcStride, int dstStride);
>> void (*interleaveBytes)(const uint8_t *src1, const uint8_t *src2, uint8_t *dst,
>> diff --git a/libswscale/rgb2rgb.h b/libswscale/rgb2rgb.h
>> index 305b830920..f7a76a92ba 100644
>> --- a/libswscale/rgb2rgb.h
>> +++ b/libswscale/rgb2rgb.h
>> @@ -79,6 +79,9 @@ void rgb12to15(const uint8_t *src, uint8_t *dst, int src_size);
>> void ff_bgr24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
>> uint8_t *vdst, int width, int height, int lumStride,
>> int chromStride, int srcStride, int32_t *rgb2yuv);
>> +void ff_rgb24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
>> + uint8_t *vdst, int width, int height, int lumStride,
>> + int chromStride, int srcStride, int32_t *rgb2yuv);
>>
>> /**
>> * Height should be a multiple of 2 and width should be a multiple of 16.
>> @@ -128,6 +131,10 @@ extern void (*ff_bgr24toyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
>> int width, int height,
>> int lumStride, int chromStride, int srcStride,
>> int32_t *rgb2yuv);
>> +extern void (*ff_rgb24toyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
>> + int width, int height,
>> + int lumStride, int chromStride, int srcStride,
>> + int32_t *rgb2yuv);
>> extern void (*planar2x)(const uint8_t *src, uint8_t *dst, int width, int height,
>> int srcStride, int dstStride);
>>
>> diff --git a/libswscale/rgb2rgb_template.c b/libswscale/rgb2rgb_template.c
>> index 8ef4a2cf5d..e57bfa6545 100644
>> --- a/libswscale/rgb2rgb_template.c
>> +++ b/libswscale/rgb2rgb_template.c
>
>
>> @@ -646,13 +646,14 @@ static inline void uyvytoyv12_c(const uint8_t *src, uint8_t *ydst,
>> * others are ignored in the C version.
>> * FIXME: Write HQ version.
>> */
>> -void ff_bgr24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
>> +static void rgb24toyv12_x(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
>
>this probably should be inline
Could do, and I will if you deem it important, but the only bit that
inline is going to help is the matrix coefficient loading and that
happens once outside the main loops.
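
To illustrate the point about coefficient loading, here is a rough sketch of a shared worker that picks its coefficient order once, before the pixel loop. It is not the patch's `rgb24toyv12_x`: the function name `luma_line`, the `swap_rb` flag and the 2^15-scaled BT.601 coefficients are all assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical shared worker: coef[] holds the luma coefficients in
 * R, G, B order, scaled by 2^15. swap_rb selects BGR24 input (1) or
 * RGB24 input (0). */
static void luma_line(const uint8_t *src, uint8_t *ydst, int width,
                      const int32_t coef[3], int swap_rb)
{
    /* The permutation is resolved once here, outside the loop, so the
     * per-pixel work is identical for both byte orders. */
    const int32_t c0 = swap_rb ? coef[2] : coef[0];
    const int32_t c1 = coef[1];
    const int32_t c2 = swap_rb ? coef[0] : coef[2];
    const int32_t k  = (16 << 15) + (1 << 14); /* +16 offset, 0.5 rounding */
    int i;

    for (i = 0; i < width; i++)
        ydst[i] = (uint8_t)((c0 * src[3 * i + 0] +
                             c1 * src[3 * i + 1] +
                             c2 * src[3 * i + 2] + k) >> 15);
}
```

With this shape, inlining the worker into its two callers would only let the compiler fold the three coefficient selections, which are not in the hot loop.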
>also i see now "FIXME: Write HQ version." above here. Do you really want to
>add a low quality rgb24toyv12 ?
>(it is vissible on the diagonal border (cyan / red )) in
> ./ffmpeg -f lavfi -i testsrc=size=5632x3168 -pix_fmt yuv420p -vframes 1 -qscale 1 -strict -1 new.jpg
>
> also on smaller sizes but for some reason its clearer on the big one zoomed in 400% with gimp
>(the gimp test was done with the whole patchset not after this patch)
On the whole, yes. In the encode path on the Pi that I'm writing this
for, speed is more important than quality; the existing path is too slow
to be usable. And honestly, comparing the general (bitexact), presumably
HQ, output against the new code using your example above (Windows photo
viewer, zoomed in so that individual pixels are clearly visible), I grant
that the new output is slightly muckier, but not by a huge amount. Sharp
chroma transitions in 420 are always nasty.
>[...]
>> diff --git a/libswscale/swscale_unscaled.c b/libswscale/swscale_unscaled.c
>> index 32e0d7f63c..751bdcb2e4 100644
>> --- a/libswscale/swscale_unscaled.c
>> +++ b/libswscale/swscale_unscaled.c
>> @@ -1654,6 +1654,23 @@ static int bgr24ToYv12Wrapper(SwsContext *c, const uint8_t *src[],
>> return srcSliceH;
>> }
>>
>> +static int rgb24ToYv12Wrapper(SwsContext *c, const uint8_t *src[],
>> + int srcStride[], int srcSliceY, int srcSliceH,
>> + uint8_t *dst[], int dstStride[])
>> +{
>> + ff_rgb24toyv12(
>> + src[0],
>> + dst[0] + srcSliceY * dstStride[0],
>> + dst[1] + (srcSliceY >> 1) * dstStride[1],
>> + dst[2] + (srcSliceY >> 1) * dstStride[2],
>> + c->srcW, srcSliceH,
>> + dstStride[0], dstStride[1], srcStride[0],
>> + c->input_rgb2yuv_table);
>> + if (dst[3])
>> + fillPlane(dst[3], dstStride[3], c->srcW, srcSliceH, srcSliceY, 255);
>> + return srcSliceH;
>> +}
>> +
>> static int yvu9ToYv12Wrapper(SwsContext *c, const uint8_t *src[],
>> int srcStride[], int srcSliceY, int srcSliceH,
>> uint8_t *dst[], int dstStride[])
>
>> @@ -2035,8 +2052,13 @@ void ff_get_unscaled_swscale(SwsContext *c)
>> /* bgr24toYV12 */
>> if (srcFormat == AV_PIX_FMT_BGR24 &&
>> (dstFormat == AV_PIX_FMT_YUV420P || dstFormat == AV_PIX_FMT_YUVA420P) &&
>> - !(flags & SWS_ACCURATE_RND) && !(dstW&1))
>> + !(flags & (SWS_ACCURATE_RND | SWS_BITEXACT)) && !(dstW&1))
>
>this doesnt belong in this patch
So should it go in its own patch, or attached to some other patch?
Ta
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [FFmpeg-devel] [PATCH v1 3/6] swscale: Add explicit rgb24->yv12 conversion
2023-08-20 17:45 ` Michael Niedermayer
@ 2023-08-20 18:28 ` John Cox
2023-08-21 19:15 ` Michael Niedermayer
0 siblings, 1 reply; 14+ messages in thread
From: John Cox @ 2023-08-20 18:28 UTC (permalink / raw)
To: FFmpeg development discussions and patches
Cc: FFmpeg development discussions and patches
On Sun, 20 Aug 2023 19:45:11 +0200, you wrote:
>On Sun, Aug 20, 2023 at 07:16:14PM +0200, Michael Niedermayer wrote:
>> On Sun, Aug 20, 2023 at 03:10:19PM +0000, John Cox wrote:
>> > Add a rgb24->yuv420p conversion. Uses the same code as the existing
>> > bgr24->yuv converter but permutes the conversion array to swap R & B
>> > coefficients.
>> >
>> > Signed-off-by: John Cox <jc@kynesim.co.uk>
>> > ---
>> > libswscale/rgb2rgb.c | 5 +++++
>> > libswscale/rgb2rgb.h | 7 +++++++
>> > libswscale/rgb2rgb_template.c | 38 ++++++++++++++++++++++++++++++-----
>> > libswscale/swscale_unscaled.c | 24 +++++++++++++++++++++-
>> > 4 files changed, 68 insertions(+), 6 deletions(-)
>> >
>> > diff --git a/libswscale/rgb2rgb.c b/libswscale/rgb2rgb.c
>> > index 8707917800..de90e5193f 100644
>> > --- a/libswscale/rgb2rgb.c
>> > +++ b/libswscale/rgb2rgb.c
>> > @@ -83,6 +83,11 @@ void (*ff_bgr24toyv12)(const uint8_t *src, uint8_t *ydst,
>> > int width, int height,
>> > int lumStride, int chromStride, int srcStride,
>> > int32_t *rgb2yuv);
>> > +void (*ff_rgb24toyv12)(const uint8_t *src, uint8_t *ydst,
>> > + uint8_t *udst, uint8_t *vdst,
>> > + int width, int height,
>> > + int lumStride, int chromStride, int srcStride,
>> > + int32_t *rgb2yuv);
>> > void (*planar2x)(const uint8_t *src, uint8_t *dst, int width, int height,
>> > int srcStride, int dstStride);
>> > void (*interleaveBytes)(const uint8_t *src1, const uint8_t *src2, uint8_t *dst,
>> > diff --git a/libswscale/rgb2rgb.h b/libswscale/rgb2rgb.h
>> > index 305b830920..f7a76a92ba 100644
>> > --- a/libswscale/rgb2rgb.h
>> > +++ b/libswscale/rgb2rgb.h
>> > @@ -79,6 +79,9 @@ void rgb12to15(const uint8_t *src, uint8_t *dst, int src_size);
>> > void ff_bgr24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
>> > uint8_t *vdst, int width, int height, int lumStride,
>> > int chromStride, int srcStride, int32_t *rgb2yuv);
>> > +void ff_rgb24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
>> > + uint8_t *vdst, int width, int height, int lumStride,
>> > + int chromStride, int srcStride, int32_t *rgb2yuv);
>> >
>> > /**
>> > * Height should be a multiple of 2 and width should be a multiple of 16.
>> > @@ -128,6 +131,10 @@ extern void (*ff_bgr24toyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
>> > int width, int height,
>> > int lumStride, int chromStride, int srcStride,
>> > int32_t *rgb2yuv);
>> > +extern void (*ff_rgb24toyv12)(const uint8_t *src, uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
>> > + int width, int height,
>> > + int lumStride, int chromStride, int srcStride,
>> > + int32_t *rgb2yuv);
>> > extern void (*planar2x)(const uint8_t *src, uint8_t *dst, int width, int height,
>> > int srcStride, int dstStride);
>> >
>> > diff --git a/libswscale/rgb2rgb_template.c b/libswscale/rgb2rgb_template.c
>> > index 8ef4a2cf5d..e57bfa6545 100644
>> > --- a/libswscale/rgb2rgb_template.c
>> > +++ b/libswscale/rgb2rgb_template.c
>>
>>
>> > @@ -646,13 +646,14 @@ static inline void uyvytoyv12_c(const uint8_t *src, uint8_t *ydst,
>> > * others are ignored in the C version.
>> > * FIXME: Write HQ version.
>> > */
>> > -void ff_bgr24toyv12_c(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
>> > +static void rgb24toyv12_x(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
>>
>> this probably should be inline
>>
>> also i see now "FIXME: Write HQ version." above here. Do you really want to
>> add a low quality rgb24toyv12 ?
>> (it is vissible on the diagonal border (cyan / red )) in
>> ./ffmpeg -f lavfi -i testsrc=size=5632x3168 -pix_fmt yuv420p -vframes 1 -qscale 1 -strict -1 new.jpg
>>
>> also on smaller sizes but for some reason its clearer on the big one zoomed in 400% with gimp
>> (the gimp test was done with the whole patchset not after this patch)
>
>Also the reason why it's LQ and looks like it does is because
>1. half the RGB samples are ignored in computing the chroma samples
I thought it was a bit light but it is what the existing code did
>2. the chroma sample locations are ignored, the locations for yuv420 are reasonably standard
As I recall MPEG-1 has chroma at (0.5, 0.5), MPEG-II defaults to (0.5,
0), H.265 defaults to (0,0). Printing out dst_h_chr_pos, dst_v_chr_pos
in the setup of your example yields -513, 128 which I'm guessing means
(unset, 0.5) - am I looking at the correct vars?
>this needs some simple filter to get from a few RGB samples to the RGB sample co-located
>with the UV sample before RGB->UV
I can get to simple bilinear without adding so much complexity that I
lose the speed I need - would that be OK?
Ta
>thx
>
>[...]
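
For readers following the chr_pos question above, here is a minimal sketch of how the printed values could be read. It assumes, in line with the guess above, that the positions are in units of 1/256 of a luma sample and that -513 means "not set". The names CHR_POS_UNSET and chr_pos_to_frac are made up for illustration and are not swscale API.

```c
/* Assumed sentinel for "position not set" (taken from the value printed above). */
#define CHR_POS_UNSET (-513)

/*
 * Convert a chroma position given in 1/256 luma-sample units to a fraction.
 * Returns 1 and stores the fraction in *frac when the position is set,
 * returns 0 and leaves *frac untouched when it is unset.
 */
static int chr_pos_to_frac(int pos, double *frac)
{
    if (pos == CHR_POS_UNSET)
        return 0;
    *frac = pos / 256.0;
    return 1;
}
```

Under these assumptions dst_h_chr_pos = -513 reads as unset and dst_v_chr_pos = 128 reads as 0.5, which matches the "(unset, 0.5)" interpretation.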
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [FFmpeg-devel] [PATCH v1 3/6] swscale: Add explicit rgb24->yv12 conversion
2023-08-20 18:28 ` John Cox
@ 2023-08-21 19:15 ` Michael Niedermayer
2023-08-22 14:24 ` John Cox
0 siblings, 1 reply; 14+ messages in thread
From: Michael Niedermayer @ 2023-08-21 19:15 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Sun, Aug 20, 2023 at 07:28:40PM +0100, John Cox wrote:
> On Sun, 20 Aug 2023 19:45:11 +0200, you wrote:
>
> >On Sun, Aug 20, 2023 at 07:16:14PM +0200, Michael Niedermayer wrote:
> >> On Sun, Aug 20, 2023 at 03:10:19PM +0000, John Cox wrote:
> >> [...]
> >
> >Also the reason why its LQ and looks like it does is because
> >1. half the RGB samples are ignored in computing the chroma samples
>
> I thought it was a bit light but it is what the existing code did
>
> >2. the chroma sample locations are ignored, the locations for yuv420 are reaonable standard
>
> As I recall MPEG-1 has chroma at (0.5, 0.5), MPEG-II defaults to (0.5,
> 0),
yes

> H.265 defaults to (0,0).

hmm
  When the value of chroma_format_idc is equal to 1, the nominal vertical and horizontal relative locations of luma and
  chroma samples in pictures are shown in Figure 6-1. Alternative chroma sample relative locations may be indicated in
  video usability information (see Annex E).

    X X X X X X
    O   O   O   ...
    X X X X X X

    X X X X X X
    O   O   O
    X X X X X X

    X X X X X X
    O   O   O
    X X X X X X
    . .
    : ´.
    X Location of luma sample
    O Location of chroma sample

  Figure 6-1 – Nominal vertical and horizontal locations of 4:2:0 luma and chroma samples in a picture

> Printing out dst_h_chr_pos, dst_v_chr_pos
> in the setup of your example yields -513, 128 which I'm guessing means
> (unset, 0.5) - am I looking at the correct vars?
>
> >this needs some simple filter to get from a few RGB samples to the RGB sample co-located
> >with ths UV sample before RGB->UV
>
> I can get to simple bilinear without adding so much complexity that I
> lose the speed I need - would that be OK?
Not sure "simple bilinear" is 100% clearly defined.
I think it could mean 3 things:

1 2 1
  C
1 2 1

or

  1
  C
  1

or

1 2 1
3 6 3
  C
3 6 3
1 2 1

I think the 6 and 12 tap cases would produce OK results, the 2 tap not.
Also maybe there are more fine-tuned filters for this specific case, I don't
know / didn't look.
Testing these probably would not be a bad idea before implementation.

I think users in 2023 expect the default to be better than what the
existing code was doing by default,
so feel free to replace the existing "identical" code too.
[...]
thx
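
The three candidate filters above can be sketched in C as follows. This is only an illustration under stated assumptions: it works on a single 8-bit channel plane (one of R, G or B), takes C to sit in the column of luma sample 2*cx and vertically midway between rows 2*cy and 2*cy+1, and replicates edge samples. All function names are hypothetical and none of this is taken from the patch set.

```c
#include <stdint.h>

/* Clamp a coordinate into [0, max]. */
static int clampi(int v, int max)
{
    return v < 0 ? 0 : v > max ? max : v;
}

/* Fetch one sample of a single-channel plane, replicating the edges. */
static int px(const uint8_t *p, int stride, int w, int h, int x, int y)
{
    return p[clampi(y, h - 1) * stride + clampi(x, w - 1)];
}

/* 2-tap: the samples directly above and below C, weights 1 1 (sum 2). */
static int chroma_src_2tap(const uint8_t *p, int stride, int w, int h,
                           int cx, int cy)
{
    int x = 2 * cx, y = 2 * cy;
    return (px(p, stride, w, h, x, y) + px(p, stride, w, h, x, y + 1) + 1) >> 1;
}

/* 6-tap: weights 1 2 1 on the rows above and below C (sum 8). */
static int chroma_src_6tap(const uint8_t *p, int stride, int w, int h,
                           int cx, int cy)
{
    int x = 2 * cx, y = 2 * cy, sum = 0;
    for (int dy = 0; dy <= 1; dy++)
        sum += px(p, stride, w, h, x - 1, y + dy)
             + 2 * px(p, stride, w, h, x, y + dy)
             + px(p, stride, w, h, x + 1, y + dy);
    return (sum + 4) >> 3;
}

/* 12-tap: 1 2 1 horizontally times 1 3 3 1 vertically (sum 32). */
static int chroma_src_12tap(const uint8_t *p, int stride, int w, int h,
                            int cx, int cy)
{
    static const int wv[4] = { 1, 3, 3, 1 };
    int x = 2 * cx, y = 2 * cy, sum = 0;
    for (int i = 0; i < 4; i++) {
        int yy = y - 1 + i;
        sum += wv[i] * (px(p, stride, w, h, x - 1, yy)
                        + 2 * px(p, stride, w, h, x, yy)
                        + px(p, stride, w, h, x + 1, yy));
    }
    return (sum + 16) >> 5;
}
```

Each function returns the filtered channel value co-located with chroma sample (cx, cy); running it once per channel gives the RGB triple to feed into the existing RGB->UV arithmetic.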
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
The day soldiers stop bringing you their problems is the day you have stopped
leading them. They have either lost confidence that you can help or concluded
you do not care. Either case is a failure of leadership. - Colin Powell
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [FFmpeg-devel] [PATCH v1 3/6] swscale: Add explicit rgb24->yv12 conversion
2023-08-21 19:15 ` Michael Niedermayer
@ 2023-08-22 14:24 ` John Cox
2023-08-22 18:03 ` Michael Niedermayer
0 siblings, 1 reply; 14+ messages in thread
From: John Cox @ 2023-08-22 14:24 UTC (permalink / raw)
To: FFmpeg development discussions and patches
Cc: FFmpeg development discussions and patches
On Mon, 21 Aug 2023 21:15:37 +0200, you wrote:
>On Sun, Aug 20, 2023 at 07:28:40PM +0100, John Cox wrote:
>> On Sun, 20 Aug 2023 19:45:11 +0200, you wrote:
>>
>> >On Sun, Aug 20, 2023 at 07:16:14PM +0200, Michael Niedermayer wrote:
>> >> On Sun, Aug 20, 2023 at 03:10:19PM +0000, John Cox wrote:
>> >> [...]
>> >
>> >Also the reason why its LQ and looks like it does is because
>> >1. half the RGB samples are ignored in computing the chroma samples
>>
>> I thought it was a bit light but it is what the existing code did
>>
>> >2. the chroma sample locations are ignored, the locations for yuv420 are reaonable standard
>>
>> As I recall MPEG-1 has chroma at (0.5, 0.5), MPEG-II defaults to (0.5,
>> 0),
>
>yes
>
>
>> H.265 defaults to (0,0).
>
>hmm
> When the value of chroma_format_idc is equal to 1, the nominal vertical and horizontal relative locations of luma and
> chroma samples in pictures are shown in Figure 6-1. Alternative chroma sample relative locations may be indicated in
> video usability information (see Annex E).
>
> X X X X X X
> O O O ...
> X X X X X X
>
> X X X X X X
> O O O
> X X X X X X
>
> X X X X X X
> O O O
> X X X X X X
> . .
> : ´.
> X Location of luma sample
> O Location of chroma sample
>
> Figure 6-1 – Nominal vertical and horizontal locations of 4:2:0 luma and chroma samples in a picture
You are right - I was remembering the special case for BT2020 ("When
chroma_format_idc is equal to 1 (4:2:0 chroma format) and the decoded
video content is intended for interpretation according to Rec. ITU-R
BT.2020-2 or Rec. ITU-R BT.2100-2, chroma_loc_info_present_flag should
be equal to 1, and chroma_sample_loc_type_top_field and
chroma_sample_loc_type_bottom_field should both be equal to 2")
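
As a reference for the siting values discussed here, a small sketch of the chroma_sample_loc_type mapping as I understand it from H.265 Annex E. Offsets are given as explicit horizontal and vertical fields, in luma sample units, relative to the top-left luma sample of each 2x2 block. The struct and function names are hypothetical.

```c
/* Chroma sample offset relative to the top-left luma sample of its 2x2
 * block, in luma sample units. */
typedef struct ChromaSiting {
    double h; /* horizontal offset */
    double v; /* vertical offset */
} ChromaSiting;

/* Indexed by chroma_sample_loc_type (0..5). */
static const ChromaSiting chroma_loc_type_siting[6] = {
    { 0.0, 0.5 }, /* 0: left, the nominal 4:2:0 position of Figure 6-1 */
    { 0.5, 0.5 }, /* 1: centre */
    { 0.0, 0.0 }, /* 2: top-left, the value suggested for BT.2020 / BT.2100 */
    { 0.5, 0.0 }, /* 3: top */
    { 0.0, 1.0 }, /* 4: bottom-left */
    { 0.5, 1.0 }, /* 5: bottom */
};

/* Look up the siting for a loc type; returns 0 if the type is out of range. */
static int chroma_siting(int loc_type, ChromaSiting *out)
{
    if (loc_type < 0 || loc_type > 5)
        return 0;
    *out = chroma_loc_type_siting[loc_type];
    return 1;
}
```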
>> Printing out dst_h_chr_pos, dst_v_chr_pos
>> in the setup of your example yields -513, 128 which I'm guessing means
>> (unset, 0.5) - am I looking at the correct vars?
>>
>> >this needs some simple filter to get from a few RGB samples to the RGB sample co-located
>> >with ths UV sample before RGB->UV
>>
>
>> I can get to simple bilinear without adding so much complexity that I
>> lose the speed I need - would that be OK?
>
>Not sure simple bilinear is 100% clearly defined
>I think it could mean 3 things
>
>1 2 1
> C
>1 2 1
>
>or
>
> 1
> C
> 1
>
> or
>
>1 2 1
>
>3 6 3
> C
>3 6 3
>
>1 2 1
>
>I think the 6 and 12 tap cases would produce ok results teh 2 tap not
>Also maybe there are more finetuned filters for this specific case, i dont
>know / didnt look.
>Testing these probably would not be a bad idea before implementation
>
>I think users in 2023 expect the default to be better than what the
>existing code was doing by default
>so feel free to replace the existing "identical" code too
I was thinking of 2-tap (in both X & Y) which is equivalent to
SWS_FAST_BILINEAR in ffmpeg. In the case I'm looking at I need the speed
more than I need the quality and I'm quite happy to gate them behind a
test for SWS_FAST_BILINEAR.
As an aside, with SWS_FAST_BILINEAR (and probably the other methods) in
ffmpeg you need flags=out_v_chr_pos=0:out_h_chr_pos=128 to land the YUV
chroma sample on the top-left RGB sample - that confused me for a while
whilst I was trying to work out what ffmpeg actually does!
Regards
JC
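
A minimal sketch of the 2-tap (in both X & Y) idea, i.e. a plain average over each 2x2 block before the RGB->UV step. It assumes packed 24-bit input (3 bytes per pixel) and that the block lies fully inside the image; the names Rgb, avg2x2 and avg_rgb_2x2 are hypothetical and this is not the code from the patch set.

```c
#include <stdint.h>

typedef struct Rgb {
    int c[3]; /* the three channels in source byte order */
} Rgb;

/*
 * Average one channel over the 2x2 block whose top-left pixel is (x, y).
 * src points at packed 24-bit pixels, stride is in bytes.
 * The caller must ensure x + 1 < width and y + 1 < height.
 */
static int avg2x2(const uint8_t *src, int stride, int x, int y, int ch)
{
    const uint8_t *r0 = src + y * stride + 3 * x + ch;
    const uint8_t *r1 = r0 + stride;
    return (r0[0] + r0[3] + r1[0] + r1[3] + 2) >> 2;
}

/* Average all three channels of the 2x2 block belonging to chroma sample
 * (cx, cy); the result is what would be fed to the RGB->UV arithmetic. */
static Rgb avg_rgb_2x2(const uint8_t *src, int stride, int cx, int cy)
{
    Rgb out;
    for (int ch = 0; ch < 3; ch++)
        out.c[ch] = avg2x2(src, stride, 2 * cx, 2 * cy, ch);
    return out;
}
```

Selecting this path only when the caller asked for SWS_FAST_BILINEAR, as proposed above, would leave the default free for a higher-quality filter.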
>[...]
>
>thx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [FFmpeg-devel] [PATCH v1 3/6] swscale: Add explicit rgb24->yv12 conversion
2023-08-22 14:24 ` John Cox
@ 2023-08-22 18:03 ` Michael Niedermayer
0 siblings, 0 replies; 14+ messages in thread
From: Michael Niedermayer @ 2023-08-22 18:03 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Tue, Aug 22, 2023 at 03:24:17PM +0100, John Cox wrote:
> On Mon, 21 Aug 2023 21:15:37 +0200, you wrote:
[...]
> >> I can get to simple bilinear without adding so much complexity that I
> >> lose the speed I need - would that be OK?
> >
> >Not sure simple bilinear is 100% clearly defined
> >I think it could mean 3 things
> >
> >1 2 1
> > C
> >1 2 1
> >
> >or
> >
> > 1
> > C
> > 1
> >
> > or
> >
> >1 2 1
> >
> >3 6 3
> > C
> >3 6 3
> >
> >1 2 1
> >
> >I think the 6 and 12 tap cases would produce ok results teh 2 tap not
> >Also maybe there are more finetuned filters for this specific case, i dont
> >know / didnt look.
> >Testing these probably would not be a bad idea before implementation
> >
> >I think users in 2023 expect the default to be better than what the
> >existing code was doing by default
> >so feel free to replace the existing "identical" code too
>
> I was thinking of 2-tap (in both X & Y) which is equivalent to
> SWS_FAST_BILINEAR in ffmpeg. In the case I'm looking at I need the speed
> more than I need the quality and I'm quite happy to gate them behind a
> test for SWS_FAST_BILINEAR.
OK, but maybe you still want to fix/add the higher quality C version too?
It would be almost the same code, just a different mix of source samples.
I am asking as you are already working on this.
thx
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Dictatorship: All citizens are under surveillance, all their steps and
actions recorded, for the politicians to enforce control.
Democracy: All politicians are under surveillance, all their steps and
actions recorded, for the citizens to enforce control.
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2023-08-22 18:03 UTC | newest]
Thread overview: 14+ messages
2023-08-20 15:10 [FFmpeg-devel] [PATCH v1 0/6] swscale: Add dedicated RGB->YUV unscaled functions & aarch64 asm John Cox
2023-08-20 15:10 ` [FFmpeg-devel] [PATCH v1 1/6] fate-filter-fps: Set swscale bitexact for tests that do conversions John Cox
2023-08-20 15:10 ` [FFmpeg-devel] [PATCH v1 2/6] swscale: Rename BGR24->YUV conversion functions as bgr John Cox
2023-08-20 15:10 ` [FFmpeg-devel] [PATCH v1 3/6] swscale: Add explicit rgb24->yv12 conversion John Cox
2023-08-20 17:16 ` Michael Niedermayer
2023-08-20 17:45 ` Michael Niedermayer
2023-08-20 18:28 ` John Cox
2023-08-21 19:15 ` Michael Niedermayer
2023-08-22 14:24 ` John Cox
2023-08-22 18:03 ` Michael Niedermayer
2023-08-20 18:09 ` John Cox
2023-08-20 15:10 ` [FFmpeg-devel] [PATCH v1 4/6] swscale: RGB24->YUV allow odd widths & improve C rounding John Cox
2023-08-20 15:10 ` [FFmpeg-devel] [PATCH v1 5/6] swscale: Add unscaled XRGB->YUV420P functions John Cox
2023-08-20 15:10 ` [FFmpeg-devel] [PATCH v1 6/6] swscale: Add aarch64 functions for RGB24->YUV420P John Cox