* [FFmpeg-devel] [PATCH v3 2/4] avutil/timer: Add clock_gettime as a fallback of AV_READ_TIME
[not found] <20240607134452.94467-1-quinkblack@foxmail.com>
@ 2024-06-07 13:44 ` Zhao Zhili
2024-06-07 14:41 ` Rémi Denis-Courmont
2024-06-07 13:44 ` [FFmpeg-devel] [PATCH v3 3/4] tests/checkasm: Fix build error when enable linux perf on Android Zhao Zhili
2024-06-07 13:44 ` [FFmpeg-devel] [PATCH v3 4/4] swscale/aarch64: Add rgb24 to yuv implementation Zhao Zhili
2 siblings, 1 reply; 7+ messages in thread
From: Zhao Zhili @ 2024-06-07 13:44 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Zhao Zhili
From: Zhao Zhili <zhilizhao@tencent.com>
---
v3: add ff_read_time() rather than use av_gettime_relative() to get
nanosecond precision.
libavutil/timer.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/libavutil/timer.h b/libavutil/timer.h
index 2cd299eca3..3e5d5ef23f 100644
--- a/libavutil/timer.h
+++ b/libavutil/timer.h
@@ -46,6 +46,8 @@
#include "macos_kperf.h"
#elif HAVE_MACH_ABSOLUTE_TIME
#include <mach/mach_time.h>
+#elif HAVE_CLOCK_GETTIME
+#include <time.h>
#endif
#include "common.h"
@@ -70,6 +72,14 @@
# define AV_READ_TIME gethrtime
# elif HAVE_MACH_ABSOLUTE_TIME
# define AV_READ_TIME mach_absolute_time
+# elif HAVE_CLOCK_GETTIME && defined(CLOCK_MONOTONIC)
+ static inline int64_t ff_read_time(void)
+ {
+ struct timespec ts;
+ clock_gettime(CLOCK_MONOTONIC, &ts);
+ return (int64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
+ }
+# define AV_READ_TIME ff_read_time
# endif
#endif
--
2.42.0
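As an illustration of what the fallback provides, here is a minimal sketch of timing a region with a CLOCK_MONOTONIC reader that uses the same arithmetic as the patch's ff_read_time(). The names read_time_ns, time_call_ns and busy_work are hypothetical and are not part of FFmpeg; the sketch also assumes a POSIX system where clock_gettime() is available.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* ask strict-mode libcs to declare clock_gettime() */
#endif
#include <stdint.h>
#include <time.h>

/* Same arithmetic as the patch: whole seconds scaled to ns, plus the ns part. */
static int64_t read_time_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Hypothetical helper: elapsed nanoseconds for one call of fn(). */
static int64_t time_call_ns(void (*fn)(void))
{
    int64_t start = read_time_ns();
    fn();
    return read_time_ns() - start;
}

/* Hypothetical workload, only here to give the timer something to measure. */
static void busy_work(void)
{
    volatile unsigned sink = 0;
    for (unsigned i = 0; i < 100000; i++)
        sink += i;
}
```

Because CLOCK_MONOTONIC does not go backwards, time_call_ns(busy_work) should never be negative; the value is in nanoseconds rather than CPU cycles.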
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
* [FFmpeg-devel] [PATCH v3 3/4] tests/checkasm: Fix build error when enable linux perf on Android
[not found] <20240607134452.94467-1-quinkblack@foxmail.com>
2024-06-07 13:44 ` [FFmpeg-devel] [PATCH v3 2/4] avutil/timer: Add clock_gettime as a fallback of AV_READ_TIME Zhao Zhili
@ 2024-06-07 13:44 ` Zhao Zhili
2024-06-07 13:44 ` [FFmpeg-devel] [PATCH v3 4/4] swscale/aarch64: Add rgb24 to yuv implementation Zhao Zhili
2 siblings, 0 replies; 7+ messages in thread
From: Zhao Zhili @ 2024-06-07 13:44 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Zhao Zhili
From: Zhao Zhili <zhilizhao@tencent.com>
B0 is defined by a system header; see f0f596dbc6b for reference.
---
v3: add f0f596dbc6b as ref.
tests/checkasm/llviddsp.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/tests/checkasm/llviddsp.c b/tests/checkasm/llviddsp.c
index b75c0ea099..9f8de65df4 100644
--- a/tests/checkasm/llviddsp.c
+++ b/tests/checkasm/llviddsp.c
@@ -71,7 +71,7 @@ static void check_add_bytes(LLVidDSPContext *c, int width)
}
static void check_add_median_pred(LLVidDSPContext *c, int width) {
- int A0, A1, B0, B1;
+ int a0, a1, b0, b1;
uint8_t *dst0 = av_mallocz(width);
uint8_t *dst1 = av_mallocz(width);
uint8_t *src0 = av_calloc(width, sizeof(*src0));
@@ -85,18 +85,18 @@ static void check_add_median_pred(LLVidDSPContext *c, int width) {
init_buffer(src0, src1, uint8_t, width);
init_buffer(diff0, diff1, uint8_t, width);
- A0 = rnd() & 0xFF;
- B0 = rnd() & 0xFF;
- A1 = A0;
- B1 = B0;
+ a0 = rnd() & 0xFF;
+ b0 = rnd() & 0xFF;
+ a1 = a0;
+ b1 = b0;
if (check_func(c->add_median_pred, "add_median_pred")) {
- call_ref(dst0, src0, diff0, width, &A0, &B0);
- call_new(dst1, src1, diff1, width, &A1, &B1);
- if (memcmp(dst0, dst1, width) || (A0 != A1) || (B0 != B1))
+ call_ref(dst0, src0, diff0, width, &a0, &b0);
+ call_new(dst1, src1, diff1, width, &a1, &b1);
+ if (memcmp(dst0, dst1, width) || (a0 != a1) || (b0 != b1))
fail();
- bench_new(dst1, src1, diff1, width, &A1, &B1);
+ bench_new(dst1, src1, diff1, width, &a1, &b1);
}
av_free(src0);
--
2.42.0
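For context on the rename above: POSIX <termios.h> provides B0 as a baud-rate constant, so once a system header makes it visible as a macro, a local variable named B0 no longer compiles. The thread does not say which header chain pulls it in on Android with linux perf enabled, so the sketch below simply assumes <termios.h> is available; both helper names are hypothetical.

```c
#include <termios.h> /* POSIX: provides the B0 baud-rate constant */

/*
 * With B0 visible as a macro, "int A0, A1, B0, B1;" would expand B0 to a
 * number and fail to compile, so the locals use lowercase names instead.
 */
static int states_match_after_copy(int seed_a, int seed_b)
{
    int a0 = seed_a & 0xFF, b0 = seed_b & 0xFF;
    int a1 = a0, b1 = b0;
    return a0 == a1 && b0 == b1;
}

/* Returns 1 if this platform exposes B0 as a preprocessor macro, else 0. */
static int b0_is_macro(void)
{
#ifdef B0
    return 1;
#else
    return 0;
#endif
}
```

Lowercase locals avoid the whole class of collisions, since system headers conventionally reserve such all-caps names for macros.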
* [FFmpeg-devel] [PATCH v3 4/4] swscale/aarch64: Add rgb24 to yuv implementation
[not found] <20240607134452.94467-1-quinkblack@foxmail.com>
2024-06-07 13:44 ` [FFmpeg-devel] [PATCH v3 2/4] avutil/timer: Add clock_gettime as a fallback of AV_READ_TIME Zhao Zhili
2024-06-07 13:44 ` [FFmpeg-devel] [PATCH v3 3/4] tests/checkasm: Fix build error when enable linux perf on Android Zhao Zhili
@ 2024-06-07 13:44 ` Zhao Zhili
2024-06-10 11:59 ` Martin Storsjö
2 siblings, 1 reply; 7+ messages in thread
From: Zhao Zhili @ 2024-06-07 13:44 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Zhao Zhili
From: Zhao Zhili <zhilizhao@tencent.com>
Test on Apple M1:
rgb24_to_uv_8_c: 0.0
rgb24_to_uv_8_neon: 0.2
rgb24_to_uv_128_c: 1.0
rgb24_to_uv_128_neon: 0.5
rgb24_to_uv_1080_c: 7.0
rgb24_to_uv_1080_neon: 5.7
rgb24_to_uv_1920_c: 12.5
rgb24_to_uv_1920_neon: 9.5
rgb24_to_uv_half_8_c: 0.2
rgb24_to_uv_half_8_neon: 0.2
rgb24_to_uv_half_128_c: 1.0
rgb24_to_uv_half_128_neon: 0.5
rgb24_to_uv_half_1080_c: 6.2
rgb24_to_uv_half_1080_neon: 3.0
rgb24_to_uv_half_1920_c: 11.2
rgb24_to_uv_half_1920_neon: 5.2
rgb24_to_y_8_c: 0.2
rgb24_to_y_8_neon: 0.0
rgb24_to_y_128_c: 0.5
rgb24_to_y_128_neon: 0.5
rgb24_to_y_1080_c: 4.7
rgb24_to_y_1080_neon: 3.2
rgb24_to_y_1920_c: 8.0
rgb24_to_y_1920_neon: 5.7
On Pixel 6:
rgb24_to_uv_8_c: 30.7
rgb24_to_uv_8_neon: 56.9
rgb24_to_uv_128_c: 213.9
rgb24_to_uv_128_neon: 173.2
rgb24_to_uv_1080_c: 1649.9
rgb24_to_uv_1080_neon: 1424.4
rgb24_to_uv_1920_c: 2907.9
rgb24_to_uv_1920_neon: 2480.7
rgb24_to_uv_half_8_c: 36.2
rgb24_to_uv_half_8_neon: 33.4
rgb24_to_uv_half_128_c: 167.9
rgb24_to_uv_half_128_neon: 99.4
rgb24_to_uv_half_1080_c: 1293.9
rgb24_to_uv_half_1080_neon: 778.7
rgb24_to_uv_half_1920_c: 2292.7
rgb24_to_uv_half_1920_neon: 1328.7
rgb24_to_y_8_c: 19.7
rgb24_to_y_8_neon: 27.7
rgb24_to_y_128_c: 129.9
rgb24_to_y_128_neon: 96.7
rgb24_to_y_1080_c: 995.4
rgb24_to_y_1080_neon: 767.7
rgb24_to_y_1920_c: 1747.4
rgb24_to_y_1920_neon: 1337.2
Note that both tests were built with clang, which enables
auto-vectorization by default at -O3.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
---
v3: Fix comments.
libswscale/aarch64/Makefile | 1 +
libswscale/aarch64/input.S | 202 +++++++++++++++++++++++++++++++++++
libswscale/aarch64/swscale.c | 25 +++++
3 files changed, 228 insertions(+)
create mode 100644 libswscale/aarch64/input.S
diff --git a/libswscale/aarch64/Makefile b/libswscale/aarch64/Makefile
index da1d909561..adfd90a1b6 100644
--- a/libswscale/aarch64/Makefile
+++ b/libswscale/aarch64/Makefile
@@ -3,6 +3,7 @@ OBJS += aarch64/rgb2rgb.o \
aarch64/swscale_unscaled.o \
NEON-OBJS += aarch64/hscale.o \
+ aarch64/input.o \
aarch64/output.o \
aarch64/rgb2rgb_neon.o \
aarch64/yuv2rgb_neon.o \
diff --git a/libswscale/aarch64/input.S b/libswscale/aarch64/input.S
new file mode 100644
index 0000000000..33afa34111
--- /dev/null
+++ b/libswscale/aarch64/input.S
@@ -0,0 +1,202 @@
+/*
+ * Copyright (c) 2024 Zhao Zhili <quinkblack@foxmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+.macro rgb24_to_yuv_load_rgb, src
+ ld3 { v16.16b, v17.16b, v18.16b }, [\src]
+ uxtl v19.8h, v16.8b // v19: r
+ uxtl v20.8h, v17.8b // v20: g
+ uxtl v21.8h, v18.8b // v21: b
+ uxtl2 v22.8h, v16.16b // v22: r
+ uxtl2 v23.8h, v17.16b // v23: g
+ uxtl2 v24.8h, v18.16b // v24: b
+.endm
+
+.macro rgb24_to_yuv_product, r, g, b, dst1, dst2, dst, coef0, coef1, coef2, right_shift
+ mov \dst1\().16b, v6.16b // dst1 = const_offset
+ mov \dst2\().16b, v6.16b // dst2 = const_offset
+ smlal \dst1\().4s, \coef0\().4h, \r\().4h // dst1 += rx * r
+ smlal \dst1\().4s, \coef1\().4h, \g\().4h // dst1 += gx * g
+ smlal \dst1\().4s, \coef2\().4h, \b\().4h // dst1 += bx * b
+ smlal2 \dst2\().4s, \coef0\().8h, \r\().8h // dst2 += rx * r
+ smlal2 \dst2\().4s, \coef1\().8h, \g\().8h // dst2 += gx * g
+ smlal2 \dst2\().4s, \coef2\().8h, \b\().8h // dst2 += bx * b
+ sqshrn \dst\().4h, \dst1\().4s, \right_shift // dst_lower_half = dst1 >> right_shift
+ sqshrn2 \dst\().8h, \dst2\().4s, \right_shift // dst_higher_half = dst2 >> right_shift
+.endm
+
+function ff_rgb24ToY_neon, export=1
+ cmp w4, #0 // check width > 0
+ ldp w10, w11, [x5] // w10: ry, w11: gy
+ ldr w12, [x5, #8] // w12: by
+ b.le 3f
+
+ mov w9, #256 // w9 = 1 << (RGB2YUV_SHIFT - 7)
+ movk w9, #8, lsl #16 // w9 += 32 << (RGB2YUV_SHIFT - 1)
+ dup v6.4s, w9 // w9: const_offset
+
+ cmp w4, #16
+ dup v0.8h, w10
+ dup v1.8h, w11
+ dup v2.8h, w12
+ b.lt 2f
+1:
+ rgb24_to_yuv_load_rgb x1
+ rgb24_to_yuv_product v19, v20, v21, v25, v26, v16, v0, v1, v2, #9
+ rgb24_to_yuv_product v22, v23, v24, v27, v28, v17, v0, v1, v2, #9
+ sub w4, w4, #16 // width -= 16
+ add x1, x1, #48 // src += 48
+ cmp w4, #16 // width >= 16 ?
+ stp q16, q17, [x0], #32 // store to dst
+ b.ge 1b
+ cbz w4, 3f
+2:
+ ldrb w13, [x1] // w13: r
+ ldrb w14, [x1, #1] // w14: g
+ ldrb w15, [x1, #2] // w15: b
+
+ smaddl x13, w13, w10, x9 // x13 = ry * r + const_offset
+ smaddl x13, w14, w11, x13 // x13 += gy * g
+ smaddl x13, w15, w12, x13 // x13 += by * b
+ asr w13, w13, #9 // x13 >>= 9
+ sub w4, w4, #1 // width--
+ add x1, x1, #3 // src += 3
+ strh w13, [x0], #2 // store to dst
+ cbnz w4, 2b
+3:
+ ret
+endfunc
+
+.macro rgb24_load_uv_coeff half
+ ldp w10, w11, [x6, #12] // w10: ru, w11: gu
+ ldp w12, w13, [x6, #20] // w12: bu, w13: rv
+ ldp w14, w15, [x6, #28] // w14: gv, w15: bv
+ .if \half
+ mov w9, #512
+ movk w9, #128, lsl #16 // w9: const_offset
+ .else
+ mov w9, #256
+ movk w9, #64, lsl #16 // w9: const_offset
+ .endif
+ dup v0.8h, w10
+ dup v1.8h, w11
+ dup v2.8h, w12
+ dup v3.8h, w13
+ dup v4.8h, w14
+ dup v5.8h, w15
+ dup v6.4s, w9
+.endm
+
+function ff_rgb24ToUV_half_neon, export=1
+ cmp w5, #0 // check width > 0
+ b.le 3f
+
+ cmp w5, #8
+ rgb24_load_uv_coeff half=1
+ b.lt 2f
+1:
+ ld3 { v16.16b, v17.16b, v18.16b }, [x3]
+ uaddlp v19.8h, v16.16b // v19: r
+ uaddlp v20.8h, v17.16b // v20: g
+ uaddlp v21.8h, v18.16b // v21: b
+
+ rgb24_to_yuv_product v19, v20, v21, v22, v23, v16, v0, v1, v2, #10
+ rgb24_to_yuv_product v19, v20, v21, v24, v25, v17, v3, v4, v5, #10
+ sub w5, w5, #8 // width -= 8
+ add x3, x3, #48 // src += 48
+ cmp w5, #8 // width >= 8 ?
+ str q16, [x0], #16 // store dst_u
+ str q17, [x1], #16 // store dst_v
+ b.ge 1b
+ cbz w5, 3f
+2:
+ ldrb w2, [x3] // w2: r1
+ ldrb w4, [x3, #3] // w4: r2
+ add w2, w2, w4 // w2 = r1 + r2
+
+ ldrb w4, [x3, #1] // w4: g1
+ ldrb w7, [x3, #4] // w7: g2
+ add w4, w4, w7 // w4 = g1 + g2
+
+ ldrb w7, [x3, #2] // w7: b1
+ ldrb w8, [x3, #5] // w8: b2
+ add w7, w7, w8 // w7 = b1 + b2
+
+ smaddl x8, w2, w10, x9 // dst_u = ru * r + const_offset
+ smaddl x8, w4, w11, x8 // dst_u += gu * g
+ smaddl x8, w7, w12, x8 // dst_u += bu * b
+ asr x8, x8, #10 // dst_u >>= 10
+ strh w8, [x0], #2 // store dst_u
+
+ smaddl x8, w2, w13, x9 // dst_v = rv * r + const_offset
+ smaddl x8, w4, w14, x8 // dst_v += gv * g
+ smaddl x8, w7, w15, x8 // dst_v += bv * b
+ asr x8, x8, #10 // dst_v >>= 10
+ sub w5, w5, #1
+ add x3, x3, #6 // src += 6
+ strh w8, [x1], #2 // store dst_v
+ cbnz w5, 2b
+3:
+ ret
+endfunc
+
+function ff_rgb24ToUV_neon, export=1
+ cmp w5, #0 // check width > 0
+ b.le 3f
+
+ cmp w5, #16
+ rgb24_load_uv_coeff half=0
+ b.lt 2f
+1:
+ rgb24_to_yuv_load_rgb x3
+ rgb24_to_yuv_product v19, v20, v21, v25, v26, v16, v0, v1, v2, #9
+ rgb24_to_yuv_product v22, v23, v24, v27, v28, v17, v0, v1, v2, #9
+ rgb24_to_yuv_product v19, v20, v21, v25, v26, v18, v3, v4, v5, #9
+ rgb24_to_yuv_product v22, v23, v24, v27, v28, v19, v3, v4, v5, #9
+ sub w5, w5, #16
+ add x3, x3, #48 // src += 48
+ cmp w5, #16
+ stp q16, q17, [x0], #32 // store to dst_u
+ stp q18, q19, [x1], #32 // store to dst_v
+ b.ge 1b
+ cbz w5, 3f
+2:
+ ldrb w16, [x3] // w16: r
+ ldrb w17, [x3, #1] // w17: g
+ ldrb w4, [x3, #2] // w4: b
+
+ smaddl x8, w16, w10, x9 // x8 = ru * r + const_offset
+ smaddl x8, w17, w11, x8 // x8 += gu * g
+ smaddl x8, w4, w12, x8 // x8 += bu * b
+ asr w8, w8, #9 // x8 >>= 9
+ strh w8, [x0], #2 // store to dst_u
+
+ smaddl x8, w16, w13, x9 // x8 = rv * r + const_offset
+ smaddl x8, w17, w14, x8 // x8 += gv * g
+ smaddl x8, w4, w15, x8 // x8 += bv * b
+ asr w8, w8, #9 // x8 >>= 9
+ sub w5, w5, #1 // width--
+ add x3, x3, #3 // src += 3
+ strh w8, [x1], #2 // store to dst_v
+ cbnz w5, 2b
+3:
+ ret
+endfunc
diff --git a/libswscale/aarch64/swscale.c b/libswscale/aarch64/swscale.c
index bbd9719a44..4c4ea39dc1 100644
--- a/libswscale/aarch64/swscale.c
+++ b/libswscale/aarch64/swscale.c
@@ -201,6 +201,20 @@ void ff_yuv2plane1_8_neon(
default: break; \
}
+void ff_rgb24ToY_neon(uint8_t *_dst, const uint8_t *src, const uint8_t *unused1,
+ const uint8_t *unused2, int width,
+ uint32_t *rgb2yuv, void *opq);
+
+void ff_rgb24ToUV_neon(uint8_t *_dstU, uint8_t *_dstV, const uint8_t *unused0,
+ const uint8_t *src1,
+ const uint8_t *src2, int width, uint32_t *rgb2yuv,
+ void *opq);
+
+void ff_rgb24ToUV_half_neon(uint8_t *_dstU, uint8_t *_dstV, const uint8_t *unused0,
+ const uint8_t *src1,
+ const uint8_t *src2, int width, uint32_t *rgb2yuv,
+ void *opq);
+
av_cold void ff_sws_init_swscale_aarch64(SwsContext *c)
{
int cpu_flags = av_get_cpu_flags();
@@ -212,5 +226,16 @@ av_cold void ff_sws_init_swscale_aarch64(SwsContext *c)
if (c->dstBpc == 8) {
c->yuv2planeX = ff_yuv2planeX_8_neon;
}
+ switch (c->srcFormat) {
+ case AV_PIX_FMT_RGB24:
+ c->lumToYV12 = ff_rgb24ToY_neon;
+ if (c->chrSrcHSubSample)
+ c->chrToYV12 = ff_rgb24ToUV_half_neon;
+ else
+ c->chrToYV12 = ff_rgb24ToUV_neon;
+ break;
+ default:
+ break;
+ }
}
}
--
2.42.0
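To make the arithmetic in input.S easier to follow, here is a scalar C sketch of what the tail loops (label 2:) of ff_rgb24ToY_neon and ff_rgb24ToUV_half_neon compute. The offsets, shifts and coefficient indices are read off the assembly above; the function names are hypothetical, and the sketch does not model the saturating narrow (sqshrn) used in the vector loops.

```c
#include <stdint.h>

/* Y for `width` packed RGB24 pixels; coeffs[0..2] = ry, gy, by. */
static void rgb24_to_y_sketch(int16_t *dst, const uint8_t *src, int width,
                              const int32_t *coeffs)
{
    const int64_t offset = 256 + (8 << 16); /* constant the asm builds in w9 */

    for (int i = 0; i < width; i++) {
        const uint8_t *p = src + 3 * i;
        int64_t acc = offset + (int64_t) coeffs[0] * p[0]
                             + (int64_t) coeffs[1] * p[1]
                             + (int64_t) coeffs[2] * p[2];
        dst[i] = (int16_t) (acc >> 9);
    }
}

/*
 * U and V with 2:1 horizontal subsampling: each output sample uses the
 * sum of two neighbouring pixels, hence the doubled offset and shift of 10.
 * coeffs[3..5] = ru, gu, bu and coeffs[6..8] = rv, gv, bv.
 */
static void rgb24_to_uv_half_sketch(int16_t *dst_u, int16_t *dst_v,
                                    const uint8_t *src, int width,
                                    const int32_t *coeffs)
{
    const int64_t offset = 512 + (128 << 16);

    for (int i = 0; i < width; i++) {
        const uint8_t *p = src + 6 * i;
        int r = p[0] + p[3], g = p[1] + p[4], b = p[2] + p[5];
        int64_t u = offset + (int64_t) coeffs[3] * r
                           + (int64_t) coeffs[4] * g
                           + (int64_t) coeffs[5] * b;
        int64_t v = offset + (int64_t) coeffs[6] * r
                           + (int64_t) coeffs[7] * g
                           + (int64_t) coeffs[8] * b;
        dst_u[i] = (int16_t) (u >> 10);
        dst_v[i] = (int16_t) (v >> 10);
    }
}
```

With all coefficients set to zero, the sketch yields 1024 for Y and 8192 for U and V, which is just the constant offset shifted down.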
* Re: [FFmpeg-devel] [PATCH v3 2/4] avutil/timer: Add clock_gettime as a fallback of AV_READ_TIME
2024-06-07 13:44 ` [FFmpeg-devel] [PATCH v3 2/4] avutil/timer: Add clock_gettime as a fallback of AV_READ_TIME Zhao Zhili
@ 2024-06-07 14:41 ` Rémi Denis-Courmont
0 siblings, 0 replies; 7+ messages in thread
From: Rémi Denis-Courmont @ 2024-06-07 14:41 UTC (permalink / raw)
To: ffmpeg-devel
On Friday, 7 June 2024, 16:44:50 EEST, Zhao Zhili wrote:
> From: Zhao Zhili <zhilizhao@tencent.com>
>
> ---
> v3: add ff_read_time() rather than use av_gettime_relative() to get
> nanosecond precision.
>
> libavutil/timer.h | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/libavutil/timer.h b/libavutil/timer.h
> index 2cd299eca3..3e5d5ef23f 100644
> --- a/libavutil/timer.h
> +++ b/libavutil/timer.h
> @@ -46,6 +46,8 @@
> #include "macos_kperf.h"
> #elif HAVE_MACH_ABSOLUTE_TIME
> #include <mach/mach_time.h>
> +#elif HAVE_CLOCK_GETTIME
> +#include <time.h>
> #endif
>
> #include "common.h"
> @@ -70,6 +72,14 @@
> # define AV_READ_TIME gethrtime
> # elif HAVE_MACH_ABSOLUTE_TIME
> # define AV_READ_TIME mach_absolute_time
> +# elif HAVE_CLOCK_GETTIME && defined(CLOCK_MONOTONIC)
> + static inline int64_t ff_read_time(void)
> + {
> + struct timespec ts;
> + clock_gettime(CLOCK_MONOTONIC, &ts);
> + return (int64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
Wouldn't INT64_C() be more idiomatic here?
> + }
> +# define AV_READ_TIME ff_read_time
> # endif
> #endif
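One possible reading of the INT64_C() suggestion, as a sketch only: the thread does not show how the patch was finally changed, and timespec_to_ns is a hypothetical helper name used here so the conversion can be checked on its own.

```c
#include <stdint.h>
#include <time.h>

/*
 * INT64_C(1000000000) gives the constant type int_least64_t, so the
 * multiplication is carried out in at least 64 bits without casting tv_sec.
 */
static int64_t timespec_to_ns(const struct timespec *ts)
{
    return ts->tv_sec * INT64_C(1000000000) + ts->tv_nsec;
}
```

The result is the same as with the (int64_t) cast in the patch; the difference is only in where the widening is spelled out.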
--
レミ・デニ-クールモン
http://www.remlab.net/
* Re: [FFmpeg-devel] [PATCH v3 4/4] swscale/aarch64: Add rgb24 to yuv implementation
2024-06-07 13:44 ` [FFmpeg-devel] [PATCH v3 4/4] swscale/aarch64: Add rgb24 to yuv implementation Zhao Zhili
@ 2024-06-10 11:59 ` Martin Storsjö
2024-06-10 16:26 ` Zhao Zhili
0 siblings, 1 reply; 7+ messages in thread
From: Martin Storsjö @ 2024-06-10 11:59 UTC (permalink / raw)
To: FFmpeg development discussions and patches; +Cc: Zhao Zhili
On Fri, 7 Jun 2024, Zhao Zhili wrote:
> From: Zhao Zhili <zhilizhao@tencent.com>
>
> Test on Apple M1:
>
> rgb24_to_uv_8_c: 0.0
> rgb24_to_uv_8_neon: 0.2
> rgb24_to_uv_128_c: 1.0
> rgb24_to_uv_128_neon: 0.5
> rgb24_to_uv_1080_c: 7.0
> rgb24_to_uv_1080_neon: 5.7
> rgb24_to_uv_1920_c: 12.5
> rgb24_to_uv_1920_neon: 9.5
> rgb24_to_uv_half_8_c: 0.2
> rgb24_to_uv_half_8_neon: 0.2
> rgb24_to_uv_half_128_c: 1.0
> rgb24_to_uv_half_128_neon: 0.5
> rgb24_to_uv_half_1080_c: 6.2
> rgb24_to_uv_half_1080_neon: 3.0
> rgb24_to_uv_half_1920_c: 11.2
> rgb24_to_uv_half_1920_neon: 5.2
> rgb24_to_y_8_c: 0.2
> rgb24_to_y_8_neon: 0.0
> rgb24_to_y_128_c: 0.5
> rgb24_to_y_128_neon: 0.5
> rgb24_to_y_1080_c: 4.7
> rgb24_to_y_1080_neon: 3.2
> rgb24_to_y_1920_c: 8.0
> rgb24_to_y_1920_neon: 5.7
>
> On Pixel 6:
>
> rgb24_to_uv_8_c: 30.7
> rgb24_to_uv_8_neon: 56.9
> rgb24_to_uv_128_c: 213.9
> rgb24_to_uv_128_neon: 173.2
> rgb24_to_uv_1080_c: 1649.9
> rgb24_to_uv_1080_neon: 1424.4
> rgb24_to_uv_1920_c: 2907.9
> rgb24_to_uv_1920_neon: 2480.7
> rgb24_to_uv_half_8_c: 36.2
> rgb24_to_uv_half_8_neon: 33.4
> rgb24_to_uv_half_128_c: 167.9
> rgb24_to_uv_half_128_neon: 99.4
> rgb24_to_uv_half_1080_c: 1293.9
> rgb24_to_uv_half_1080_neon: 778.7
> rgb24_to_uv_half_1920_c: 2292.7
> rgb24_to_uv_half_1920_neon: 1328.7
> rgb24_to_y_8_c: 19.7
> rgb24_to_y_8_neon: 27.7
> rgb24_to_y_128_c: 129.9
> rgb24_to_y_128_neon: 96.7
> rgb24_to_y_1080_c: 995.4
> rgb24_to_y_1080_neon: 767.7
> rgb24_to_y_1920_c: 1747.4
> rgb24_to_y_1920_neon: 1337.2
>
> Note both tests use clang as compiler, which has vectorization
> enabled by default with -O3.
>
> Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
> ---
> v3: Fix comments.
>
> libswscale/aarch64/Makefile | 1 +
> libswscale/aarch64/input.S | 202 +++++++++++++++++++++++++++++++++++
> libswscale/aarch64/swscale.c | 25 +++++
> 3 files changed, 228 insertions(+)
> create mode 100644 libswscale/aarch64/input.S
No further comments from me, on this patchset. (Rémi had a comment on 2/4
though - I don't have a strong opinion on that matter either way.)
// Martin
* Re: [FFmpeg-devel] [PATCH v3 4/4] swscale/aarch64: Add rgb24 to yuv implementation
2024-06-10 11:59 ` Martin Storsjö
@ 2024-06-10 16:26 ` Zhao Zhili
2024-06-10 16:44 ` Rémi Denis-Courmont
0 siblings, 1 reply; 7+ messages in thread
From: Zhao Zhili @ 2024-06-10 16:26 UTC (permalink / raw)
To: FFmpeg development discussions and patches; +Cc: Michael Niedermayer
> On Jun 10, 2024, at 19:59, Martin Storsjö <martin@martin.st> wrote:
>
> On Fri, 7 Jun 2024, Zhao Zhili wrote:
>
>> From: Zhao Zhili <zhilizhao@tencent.com>
>>
>
> No further comments from me, on this patchset. (Rémi had a comment on 2/4 though - I don't have a strong opinion on that matter either way.)
Thanks. I have modified patch 2/4 locally.
However, git pull and git push get no response from git@source.ffmpeg.org:ffmpeg.
Git pull from https://git.ffmpeg.org works fine, and there are no commits after 94f2274a. Is there
anything wrong with the server?
>
> // Martin
* Re: [FFmpeg-devel] [PATCH v3 4/4] swscale/aarch64: Add rgb24 to yuv implementation
2024-06-10 16:26 ` Zhao Zhili
@ 2024-06-10 16:44 ` Rémi Denis-Courmont
0 siblings, 0 replies; 7+ messages in thread
From: Rémi Denis-Courmont @ 2024-06-10 16:44 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Monday, 10 June 2024, 19:26:17 EEST, Zhao Zhili wrote:
> However, git pull and git push get no response from
> git@source.ffmpeg.org:ffmpeg. Git pull from https://git.ffmpeg.org
> works fine, and there are no commits after 94f2274a.
> Is there anything wrong with the server?
Either too many poorly programmed bots downloading VLC 3.0.21 from master
instead of mirrors, or that release gave somebody inspiration for their next
DDoS victim. I guess.
--
レミ・デニ-クールモン
http://www.remlab.net/