[FFmpeg-devel] [PATCH v3 2/4] avutil/timer: Add clock_gettime as a fallback of AV_READ

Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

* [FFmpeg-devel] [PATCH v3 2/4] avutil/timer: Add clock_gettime as a fallback of AV_READ_TIME
       [not found] <20240607134452.94467-1-quinkblack@foxmail.com>
@ 2024-06-07 13:44 ` Zhao Zhili
  2024-06-07 14:41   ` Rémi Denis-Courmont
  2024-06-07 13:44 ` [FFmpeg-devel] [PATCH v3 3/4] tests/checkasm: Fix build error when enable linux perf on Android Zhao Zhili
  2024-06-07 13:44 ` [FFmpeg-devel] [PATCH v3 4/4] swscale/aarch64: Add rgb24 to yuv implementation Zhao Zhili
  2 siblings, 1 reply; 7+ messages in thread
From: Zhao Zhili @ 2024-06-07 13:44 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Zhao Zhili

From: Zhao Zhili <zhilizhao@tencent.com>

---
v3: add ff_read_time() rather than use av_gettime_relative() to get
nanosecond precision.

 libavutil/timer.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/libavutil/timer.h b/libavutil/timer.h
index 2cd299eca3..3e5d5ef23f 100644
--- a/libavutil/timer.h
+++ b/libavutil/timer.h
@@ -46,6 +46,8 @@
 #include "macos_kperf.h"
 #elif HAVE_MACH_ABSOLUTE_TIME
 #include <mach/mach_time.h>
+#elif HAVE_CLOCK_GETTIME
+#include <time.h>
 #endif
 
 #include "common.h"
@@ -70,6 +72,14 @@
 #       define AV_READ_TIME gethrtime
 #   elif HAVE_MACH_ABSOLUTE_TIME
 #       define AV_READ_TIME mach_absolute_time
+#   elif HAVE_CLOCK_GETTIME && defined(CLOCK_MONOTONIC)
+        static inline int64_t ff_read_time(void)
+        {
+            struct timespec ts;
+            clock_gettime(CLOCK_MONOTONIC, &ts);
+            return (int64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
+        }
+#       define AV_READ_TIME ff_read_time
 #   endif
 #endif
 
-- 
2.42.0

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".


* [FFmpeg-devel] [PATCH v3 3/4] tests/checkasm: Fix build error when enable linux perf on Android
       [not found] <20240607134452.94467-1-quinkblack@foxmail.com>
  2024-06-07 13:44 ` [FFmpeg-devel] [PATCH v3 2/4] avutil/timer: Add clock_gettime as a fallback of AV_READ_TIME Zhao Zhili
@ 2024-06-07 13:44 ` Zhao Zhili
  2024-06-07 13:44 ` [FFmpeg-devel] [PATCH v3 4/4] swscale/aarch64: Add rgb24 to yuv implementation Zhao Zhili
  2 siblings, 0 replies; 7+ messages in thread
From: Zhao Zhili @ 2024-06-07 13:44 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Zhao Zhili

From: Zhao Zhili <zhilizhao@tencent.com>

B0 is defined by a system header; see f0f596dbc6b for reference.
---
v3: add f0f596dbc6b as ref.

 tests/checkasm/llviddsp.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/tests/checkasm/llviddsp.c b/tests/checkasm/llviddsp.c
index b75c0ea099..9f8de65df4 100644
--- a/tests/checkasm/llviddsp.c
+++ b/tests/checkasm/llviddsp.c
@@ -71,7 +71,7 @@ static void check_add_bytes(LLVidDSPContext *c, int width)
 }
 
 static void check_add_median_pred(LLVidDSPContext *c, int width) {
-    int A0, A1, B0, B1;
+    int a0, a1, b0, b1;
     uint8_t *dst0 = av_mallocz(width);
     uint8_t *dst1 = av_mallocz(width);
     uint8_t *src0  = av_calloc(width, sizeof(*src0));
@@ -85,18 +85,18 @@ static void check_add_median_pred(LLVidDSPContext *c, int width) {
     init_buffer(src0, src1, uint8_t, width);
     init_buffer(diff0, diff1, uint8_t, width);
 
-    A0 = rnd() & 0xFF;
-    B0 = rnd() & 0xFF;
-    A1 = A0;
-    B1 = B0;
+    a0 = rnd() & 0xFF;
+    b0 = rnd() & 0xFF;
+    a1 = a0;
+    b1 = b0;
 
 
     if (check_func(c->add_median_pred, "add_median_pred")) {
-        call_ref(dst0, src0, diff0, width, &A0, &B0);
-        call_new(dst1, src1, diff1, width, &A1, &B1);
-        if (memcmp(dst0, dst1, width) || (A0 != A1) || (B0 != B1))
+        call_ref(dst0, src0, diff0, width, &a0, &b0);
+        call_new(dst1, src1, diff1, width, &a1, &b1);
+        if (memcmp(dst0, dst1, width) || (a0 != a1) || (b0 != b1))
             fail();
-        bench_new(dst1, src1, diff1, width, &A1, &B1);
+        bench_new(dst1, src1, diff1, width, &a1, &b1);
     }
 
     av_free(src0);
-- 
2.42.0
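
The name clash behind this rename can be reproduced in isolation. The sketch below assumes the conflicting macro is the baud-rate constant B0 that POSIX <termios.h> provides; the patch only says "system header", so that header choice is an assumption, and both helper names are hypothetical.

```c
#include <termios.h>   /* POSIX: provides the baud-rate constant B0 */

/* Returns 1 when B0 is visible as a preprocessor macro after including
 * <termios.h>, 0 otherwise. Hypothetical helper for illustration only. */
static int b0_is_macro(void)
{
#ifdef B0
    return 1;
#else
    return 0;
#endif
}

/* Lower-case locals cannot collide with the macro. With B0 defined as a
 * macro, a declaration such as `int A0, B0;` would be rewritten by the
 * preprocessor into an invalid declaration and fail to compile. */
static int sum_of_locals(void)
{
    int a0 = 1, b0 = 2;
    return a0 + b0;
}
```

Renaming the locals, as the patch does, sidesteps the problem without having to #undef a macro that other code may rely on.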


* [FFmpeg-devel] [PATCH v3 4/4] swscale/aarch64: Add rgb24 to yuv implementation
       [not found] <20240607134452.94467-1-quinkblack@foxmail.com>
  2024-06-07 13:44 ` [FFmpeg-devel] [PATCH v3 2/4] avutil/timer: Add clock_gettime as a fallback of AV_READ_TIME Zhao Zhili
  2024-06-07 13:44 ` [FFmpeg-devel] [PATCH v3 3/4] tests/checkasm: Fix build error when enable linux perf on Android Zhao Zhili
@ 2024-06-07 13:44 ` Zhao Zhili
  2024-06-10 11:59   ` Martin Storsjö
  2 siblings, 1 reply; 7+ messages in thread
From: Zhao Zhili @ 2024-06-07 13:44 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Zhao Zhili

From: Zhao Zhili <zhilizhao@tencent.com>

Test on Apple M1:

rgb24_to_uv_8_c: 0.0
rgb24_to_uv_8_neon: 0.2
rgb24_to_uv_128_c: 1.0
rgb24_to_uv_128_neon: 0.5
rgb24_to_uv_1080_c: 7.0
rgb24_to_uv_1080_neon: 5.7
rgb24_to_uv_1920_c: 12.5
rgb24_to_uv_1920_neon: 9.5
rgb24_to_uv_half_8_c: 0.2
rgb24_to_uv_half_8_neon: 0.2
rgb24_to_uv_half_128_c: 1.0
rgb24_to_uv_half_128_neon: 0.5
rgb24_to_uv_half_1080_c: 6.2
rgb24_to_uv_half_1080_neon: 3.0
rgb24_to_uv_half_1920_c: 11.2
rgb24_to_uv_half_1920_neon: 5.2
rgb24_to_y_8_c: 0.2
rgb24_to_y_8_neon: 0.0
rgb24_to_y_128_c: 0.5
rgb24_to_y_128_neon: 0.5
rgb24_to_y_1080_c: 4.7
rgb24_to_y_1080_neon: 3.2
rgb24_to_y_1920_c: 8.0
rgb24_to_y_1920_neon: 5.7

On Pixel 6:

rgb24_to_uv_8_c: 30.7
rgb24_to_uv_8_neon: 56.9
rgb24_to_uv_128_c: 213.9
rgb24_to_uv_128_neon: 173.2
rgb24_to_uv_1080_c: 1649.9
rgb24_to_uv_1080_neon: 1424.4
rgb24_to_uv_1920_c: 2907.9
rgb24_to_uv_1920_neon: 2480.7
rgb24_to_uv_half_8_c: 36.2
rgb24_to_uv_half_8_neon: 33.4
rgb24_to_uv_half_128_c: 167.9
rgb24_to_uv_half_128_neon: 99.4
rgb24_to_uv_half_1080_c: 1293.9
rgb24_to_uv_half_1080_neon: 778.7
rgb24_to_uv_half_1920_c: 2292.7
rgb24_to_uv_half_1920_neon: 1328.7
rgb24_to_y_8_c: 19.7
rgb24_to_y_8_neon: 27.7
rgb24_to_y_128_c: 129.9
rgb24_to_y_128_neon: 96.7
rgb24_to_y_1080_c: 995.4
rgb24_to_y_1080_neon: 767.7
rgb24_to_y_1920_c: 1747.4
rgb24_to_y_1920_neon: 1337.2

Note both tests use clang as compiler, which has vectorization
enabled by default with -O3.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
---
v3: Fix comments.

 libswscale/aarch64/Makefile  |   1 +
 libswscale/aarch64/input.S   | 202 +++++++++++++++++++++++++++++++++++
 libswscale/aarch64/swscale.c |  25 +++++
 3 files changed, 228 insertions(+)
 create mode 100644 libswscale/aarch64/input.S

diff --git a/libswscale/aarch64/Makefile b/libswscale/aarch64/Makefile
index da1d909561..adfd90a1b6 100644
--- a/libswscale/aarch64/Makefile
+++ b/libswscale/aarch64/Makefile
@@ -3,6 +3,7 @@ OBJS        += aarch64/rgb2rgb.o                \
                aarch64/swscale_unscaled.o       \
 
 NEON-OBJS   += aarch64/hscale.o                 \
+               aarch64/input.o                  \
                aarch64/output.o                 \
                aarch64/rgb2rgb_neon.o           \
                aarch64/yuv2rgb_neon.o           \
diff --git a/libswscale/aarch64/input.S b/libswscale/aarch64/input.S
new file mode 100644
index 0000000000..33afa34111
--- /dev/null
+++ b/libswscale/aarch64/input.S
@@ -0,0 +1,202 @@
+/*
+ * Copyright (c) 2024 Zhao Zhili <quinkblack@foxmail.com>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+.macro rgb24_to_yuv_load_rgb, src
+        ld3             { v16.16b, v17.16b, v18.16b }, [\src]
+        uxtl            v19.8h, v16.8b             // v19: r
+        uxtl            v20.8h, v17.8b             // v20: g
+        uxtl            v21.8h, v18.8b             // v21: b
+        uxtl2           v22.8h, v16.16b            // v22: r
+        uxtl2           v23.8h, v17.16b            // v23: g
+        uxtl2           v24.8h, v18.16b            // v24: b
+.endm
+
+.macro rgb24_to_yuv_product, r, g, b, dst1, dst2, dst, coef0, coef1, coef2, right_shift
+        mov             \dst1\().16b, v6.16b                    // dst1 = const_offset
+        mov             \dst2\().16b, v6.16b                    // dst2 = const_offset
+        smlal           \dst1\().4s, \coef0\().4h, \r\().4h     // dst1 += rx * r
+        smlal           \dst1\().4s, \coef1\().4h, \g\().4h     // dst1 += gx * g
+        smlal           \dst1\().4s, \coef2\().4h, \b\().4h     // dst1 += bx * b
+        smlal2          \dst2\().4s, \coef0\().8h, \r\().8h     // dst2 += rx * r
+        smlal2          \dst2\().4s, \coef1\().8h, \g\().8h     // dst2 += gx * g
+        smlal2          \dst2\().4s, \coef2\().8h, \b\().8h     // dst2 += bx * b
+        sqshrn          \dst\().4h, \dst1\().4s, \right_shift   // dst_lower_half = dst1 >> right_shift
+        sqshrn2         \dst\().8h, \dst2\().4s, \right_shift   // dst_higher_half = dst2 >> right_shift
+.endm
+
+function ff_rgb24ToY_neon, export=1
+        cmp             w4, #0                  // check width > 0
+        ldp             w10, w11, [x5]          // w10: ry, w11: gy
+        ldr             w12, [x5, #8]           // w12: by
+        b.le            3f
+
+        mov             w9, #256                // w9 = 1 << (RGB2YUV_SHIFT - 7)
+        movk            w9, #8, lsl #16         // w9 += 32 << (RGB2YUV_SHIFT - 1)
+        dup             v6.4s, w9               // v6: const_offset
+
+        cmp             w4, #16
+        dup             v0.8h, w10
+        dup             v1.8h, w11
+        dup             v2.8h, w12
+        b.lt            2f
+1:
+        rgb24_to_yuv_load_rgb x1
+        rgb24_to_yuv_product v19, v20, v21, v25, v26, v16, v0, v1, v2, #9
+        rgb24_to_yuv_product v22, v23, v24, v27, v28, v17, v0, v1, v2, #9
+        sub             w4, w4, #16             // width -= 16
+        add             x1, x1, #48             // src += 48
+        cmp             w4, #16                 // width >= 16 ?
+        stp             q16, q17, [x0], #32     // store to dst
+        b.ge            1b
+        cbz             x4, 3f
+2:
+        ldrb            w13, [x1]               // w13: r
+        ldrb            w14, [x1, #1]           // w14: g
+        ldrb            w15, [x1, #2]           // w15: b
+
+        smaddl          x13, w13, w10, x9       // x13 = ry * r + const_offset
+        smaddl          x13, w14, w11, x13      // x13 += gy * g
+        smaddl          x13, w15, w12, x13      // x13 += by * b
+        asr             w13, w13, #9            // w13 >>= 9
+        sub             w4, w4, #1              // width--
+        add             x1, x1, #3              // src += 3
+        strh            w13, [x0], #2           // store to dst
+        cbnz            w4, 2b
+3:
+        ret
+endfunc
+
+.macro rgb24_load_uv_coeff half
+        ldp             w10, w11, [x6, #12]     // w10: ru, w11: gu
+        ldp             w12, w13, [x6, #20]     // w12: bu, w13: rv
+        ldp             w14, w15, [x6, #28]     // w14: gv, w15: bv
+    .if \half
+        mov             w9, #512
+        movk            w9, #128, lsl #16       // w9: const_offset
+    .else
+        mov             w9, #256
+        movk            w9, #64, lsl #16        // w9: const_offset
+    .endif
+        dup             v0.8h, w10
+        dup             v1.8h, w11
+        dup             v2.8h, w12
+        dup             v3.8h, w13
+        dup             v4.8h, w14
+        dup             v5.8h, w15
+        dup             v6.4s, w9
+.endm
+
+function ff_rgb24ToUV_half_neon, export=1
+        cmp             w5, #0          // check width > 0
+        b.le            3f
+
+        cmp             w5, #8
+        rgb24_load_uv_coeff half=1
+        b.lt            2f
+1:
+        ld3             { v16.16b, v17.16b, v18.16b }, [x3]
+        uaddlp          v19.8h, v16.16b         // v19: r
+        uaddlp          v20.8h, v17.16b         // v20: g
+        uaddlp          v21.8h, v18.16b         // v21: b
+
+        rgb24_to_yuv_product v19, v20, v21, v22, v23, v16, v0, v1, v2, #10
+        rgb24_to_yuv_product v19, v20, v21, v24, v25, v17, v3, v4, v5, #10
+        sub             w5, w5, #8              // width -= 8
+        add             x3, x3, #48             // src += 48
+        cmp             w5, #8                  // width >= 8 ?
+        str             q16, [x0], #16          // store dst_u
+        str             q17, [x1], #16          // store dst_v
+        b.ge            1b
+        cbz             w5, 3f
+2:
+        ldrb            w2, [x3]                // w2: r1
+        ldrb            w4, [x3, #3]            // w4: r2
+        add             w2, w2, w4              // w2 = r1 + r2
+
+        ldrb            w4, [x3, #1]            // w4: g1
+        ldrb            w7, [x3, #4]            // w7: g2
+        add             w4, w4, w7              // w4 = g1 + g2
+
+        ldrb            w7, [x3, #2]            // w7: b1
+        ldrb            w8, [x3, #5]            // w8: b2
+        add             w7, w7, w8              // w7 = b1 + b2
+
+        smaddl          x8, w2, w10, x9         // dst_u = ru * r + const_offset
+        smaddl          x8, w4, w11, x8         // dst_u += gu * g
+        smaddl          x8, w7, w12, x8         // dst_u += bu * b
+        asr             x8, x8, #10             // dst_u >>= 10
+        strh            w8, [x0], #2            // store dst_u
+
+        smaddl          x8, w2, w13, x9         // dst_v = rv * r + const_offset
+        smaddl          x8, w4, w14, x8         // dst_v += gv * g
+        smaddl          x8, w7, w15, x8         // dst_v += bv * b
+        asr             x8, x8, #10             // dst_v >>= 10
+        sub             w5, w5, #1
+        add             x3, x3, #6              // src += 6
+        strh            w8, [x1], #2            // store dst_v
+        cbnz            w5, 2b
+3:
+        ret
+endfunc
+
+function ff_rgb24ToUV_neon, export=1
+        cmp             w5, #0                  // check width > 0
+        b.le            3f
+
+        cmp             w5, #16
+        rgb24_load_uv_coeff half=0
+        b.lt            2f
+1:
+        rgb24_to_yuv_load_rgb x3
+        rgb24_to_yuv_product v19, v20, v21, v25, v26, v16, v0, v1, v2, #9
+        rgb24_to_yuv_product v22, v23, v24, v27, v28, v17, v0, v1, v2, #9
+        rgb24_to_yuv_product v19, v20, v21, v25, v26, v18, v3, v4, v5, #9
+        rgb24_to_yuv_product v22, v23, v24, v27, v28, v19, v3, v4, v5, #9
+        sub             w5, w5, #16
+        add             x3, x3, #48             // src += 48
+        cmp             w5, #16
+        stp             q16, q17, [x0], #32     // store to dst_u
+        stp             q18, q19, [x1], #32     // store to dst_v
+        b.ge            1b
+        cbz             w5, 3f
+2:
+        ldrb            w16, [x3]               // w16: r
+        ldrb            w17, [x3, #1]           // w17: g
+        ldrb            w4, [x3, #2]            // w4: b
+
+        smaddl          x8, w16, w10, x9        // x8 = ru * r + const_offset
+        smaddl          x8, w17, w11, x8        // x8 += gu * g
+        smaddl          x8, w4, w12, x8         // x8 += bu * b
+        asr             w8, w8, #9              // w8 >>= 9
+        strh            w8, [x0], #2            // store to dst_u
+
+        smaddl          x8, w16, w13, x9        // x8 = rv * r + const_offset
+        smaddl          x8, w17, w14, x8        // x8 += gv * g
+        smaddl          x8, w4, w15, x8         // x8 += bv * b
+        asr             w8, w8, #9              // w8 >>= 9
+        sub             w5, w5, #1              // width--
+        add             x3, x3, #3              // src += 3
+        strh            w8, [x1], #2            // store to dst_v
+        cbnz            w5, 2b
+3:
+        ret
+endfunc
diff --git a/libswscale/aarch64/swscale.c b/libswscale/aarch64/swscale.c
index bbd9719a44..4c4ea39dc1 100644
--- a/libswscale/aarch64/swscale.c
+++ b/libswscale/aarch64/swscale.c
@@ -201,6 +201,20 @@ void ff_yuv2plane1_8_neon(
     default: break;                                                     \
     }
 
+void ff_rgb24ToY_neon(uint8_t *_dst, const uint8_t *src, const uint8_t *unused1,
+                      const uint8_t *unused2, int width,
+                      uint32_t *rgb2yuv, void *opq);
+
+void ff_rgb24ToUV_neon(uint8_t *_dstU, uint8_t *_dstV, const uint8_t *unused0,
+                       const uint8_t *src1,
+                       const uint8_t *src2, int width, uint32_t *rgb2yuv,
+                       void *opq);
+
+void ff_rgb24ToUV_half_neon(uint8_t *_dstU, uint8_t *_dstV, const uint8_t *unused0,
+                       const uint8_t *src1,
+                       const uint8_t *src2, int width, uint32_t *rgb2yuv,
+                       void *opq);
+
 av_cold void ff_sws_init_swscale_aarch64(SwsContext *c)
 {
     int cpu_flags = av_get_cpu_flags();
@@ -212,5 +226,16 @@ av_cold void ff_sws_init_swscale_aarch64(SwsContext *c)
         if (c->dstBpc == 8) {
             c->yuv2planeX = ff_yuv2planeX_8_neon;
         }
+        switch (c->srcFormat) {
+        case AV_PIX_FMT_RGB24:
+            c->lumToYV12 = ff_rgb24ToY_neon;
+            if (c->chrSrcHSubSample)
+                c->chrToYV12 = ff_rgb24ToUV_half_neon;
+            else
+                c->chrToYV12 = ff_rgb24ToUV_neon;
+            break;
+        default:
+            break;
+        }
     }
 }
-- 
2.42.0
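
For readers who want to follow the arithmetic without reading the assembly, here is a scalar sketch in C that mirrors the tail loops of ff_rgb24ToY_neon and ff_rgb24ToUV_half_neon. It assumes RGB2YUV_SHIFT is 15 (implied by the constants in the assembly comments) and that the coefficient table is laid out as ry, gy, by, ru, gu, bu, rv, gv, bv, matching the load offsets used above. The function names are hypothetical and this is not the C reference implementation from libswscale. Like the scalar tail loops, it does not saturate.

```c
#include <stdint.h>

#define RGB2YUV_SHIFT 15   /* assumed; matches the offsets built in the assembly */

/* Luma: dst = (ry*r + gy*g + by*b + offset) >> 9, one output per pixel. */
static void rgb24_to_y_ref(int16_t *dst, const uint8_t *src, int width,
                           const int32_t *rgb2yuv)
{
    const int32_t ry = rgb2yuv[0], gy = rgb2yuv[1], by = rgb2yuv[2];
    /* 0x80100: same value as `mov w9, #256; movk w9, #8, lsl #16` */
    const int32_t offset = (32 << (RGB2YUV_SHIFT - 1)) + (1 << (RGB2YUV_SHIFT - 7));

    for (int i = 0; i < width; i++) {
        int r = src[3 * i], g = src[3 * i + 1], b = src[3 * i + 2];
        dst[i] = (int16_t)((ry * r + gy * g + by * b + offset) >> (RGB2YUV_SHIFT - 6));
    }
}

/* Half-width chroma: each output uses the sum of two horizontally adjacent
 * pixels, so the shift is one bit larger (10) and the offset is doubled. */
static void rgb24_to_uv_half_ref(int16_t *dst_u, int16_t *dst_v,
                                 const uint8_t *src, int width,
                                 const int32_t *rgb2yuv)
{
    const int32_t ru = rgb2yuv[3], gu = rgb2yuv[4], bu = rgb2yuv[5];
    const int32_t rv = rgb2yuv[6], gv = rgb2yuv[7], bv = rgb2yuv[8];
    /* 0x800200: same value as `mov w9, #512; movk w9, #128, lsl #16` */
    const int32_t offset = (256 << RGB2YUV_SHIFT) + (1 << (RGB2YUV_SHIFT - 6));

    for (int i = 0; i < width; i++) {
        int r = src[6 * i]     + src[6 * i + 3];   /* r1 + r2 */
        int g = src[6 * i + 1] + src[6 * i + 4];   /* g1 + g2 */
        int b = src[6 * i + 2] + src[6 * i + 5];   /* b1 + b2 */
        dst_u[i] = (int16_t)((ru * r + gu * g + bu * b + offset) >> (RGB2YUV_SHIFT - 5));
        dst_v[i] = (int16_t)((rv * r + gv * g + bv * b + offset) >> (RGB2YUV_SHIFT - 5));
    }
}
```

With all coefficients set to zero, the luma routine yields 1024 (16 << 6) and the chroma routine yields 8192 (128 << 6), which is a quick sanity check of the rounding offsets.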


* Re: [FFmpeg-devel] [PATCH v3 2/4] avutil/timer: Add clock_gettime as a fallback of AV_READ_TIME
  2024-06-07 13:44 ` [FFmpeg-devel] [PATCH v3 2/4] avutil/timer: Add clock_gettime as a fallback of AV_READ_TIME Zhao Zhili
@ 2024-06-07 14:41   ` Rémi Denis-Courmont
  0 siblings, 0 replies; 7+ messages in thread
From: Rémi Denis-Courmont @ 2024-06-07 14:41 UTC (permalink / raw)
  To: ffmpeg-devel

On Friday, 7 June 2024, 16:44:50 EEST, Zhao Zhili wrote:
> From: Zhao Zhili <zhilizhao@tencent.com>
> 
> ---
> v3: add ff_read_time() rather than use av_gettime_relative() to get
> nanosecond precision.
> 
>  libavutil/timer.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/libavutil/timer.h b/libavutil/timer.h
> index 2cd299eca3..3e5d5ef23f 100644
> --- a/libavutil/timer.h
> +++ b/libavutil/timer.h
> @@ -46,6 +46,8 @@
>  #include "macos_kperf.h"
>  #elif HAVE_MACH_ABSOLUTE_TIME
>  #include <mach/mach_time.h>
> +#elif HAVE_CLOCK_GETTIME
> +#include <time.h>
>  #endif
> 
>  #include "common.h"
> @@ -70,6 +72,14 @@
>  #       define AV_READ_TIME gethrtime
>  #   elif HAVE_MACH_ABSOLUTE_TIME
>  #       define AV_READ_TIME mach_absolute_time
> +#   elif HAVE_CLOCK_GETTIME && defined(CLOCK_MONOTONIC)
> +        static inline int64_t ff_read_time(void)
> +        {
> +            struct timespec ts;
> +            clock_gettime(CLOCK_MONOTONIC, &ts);
> +            return (int64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;

Wouldn't INT64_C() be more idiomatic here?

> +        }
> +#       define AV_READ_TIME ff_read_time
>  #   endif
>  #endif
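
A minimal sketch of what the INT64_C() suggestion might look like, for illustration only. The names timespec_to_ns and read_time_ns are hypothetical stand-ins, and this is not the revised patch. INT64_C() comes from <stdint.h> and produces a constant of type int_least64_t, so the multiplication is carried out in 64 bits without an explicit cast on tv_sec.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() under strict -std modes */
#include <stdint.h>
#include <time.h>

/* Convert a timespec to nanoseconds; the INT64_C() constant forces a
 * 64-bit multiplication even where time_t is only 32 bits wide. */
static inline int64_t timespec_to_ns(const struct timespec *ts)
{
    return ts->tv_sec * INT64_C(1000000000) + ts->tv_nsec;
}

/* Hypothetical stand-in for ff_read_time(): monotonic time in ns. */
static inline int64_t read_time_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return timespec_to_ns(&ts);
}
```

Both spellings give the same result; the difference is whether the widening is expressed on the constant or on the operand.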


-- 
レミ・デニ-クールモン
http://www.remlab.net/




* Re: [FFmpeg-devel] [PATCH v3 4/4] swscale/aarch64: Add rgb24 to yuv implementation
  2024-06-07 13:44 ` [FFmpeg-devel] [PATCH v3 4/4] swscale/aarch64: Add rgb24 to yuv implementation Zhao Zhili
@ 2024-06-10 11:59   ` Martin Storsjö
  2024-06-10 16:26     ` Zhao Zhili
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Storsjö @ 2024-06-10 11:59 UTC (permalink / raw)
  To: FFmpeg development discussions and patches; +Cc: Zhao Zhili

On Fri, 7 Jun 2024, Zhao Zhili wrote:

> From: Zhao Zhili <zhilizhao@tencent.com>
>
> Test on Apple M1:
>
> rgb24_to_uv_8_c: 0.0
> rgb24_to_uv_8_neon: 0.2
> rgb24_to_uv_128_c: 1.0
> rgb24_to_uv_128_neon: 0.5
> rgb24_to_uv_1080_c: 7.0
> rgb24_to_uv_1080_neon: 5.7
> rgb24_to_uv_1920_c: 12.5
> rgb24_to_uv_1920_neon: 9.5
> rgb24_to_uv_half_8_c: 0.2
> rgb24_to_uv_half_8_neon: 0.2
> rgb24_to_uv_half_128_c: 1.0
> rgb24_to_uv_half_128_neon: 0.5
> rgb24_to_uv_half_1080_c: 6.2
> rgb24_to_uv_half_1080_neon: 3.0
> rgb24_to_uv_half_1920_c: 11.2
> rgb24_to_uv_half_1920_neon: 5.2
> rgb24_to_y_8_c: 0.2
> rgb24_to_y_8_neon: 0.0
> rgb24_to_y_128_c: 0.5
> rgb24_to_y_128_neon: 0.5
> rgb24_to_y_1080_c: 4.7
> rgb24_to_y_1080_neon: 3.2
> rgb24_to_y_1920_c: 8.0
> rgb24_to_y_1920_neon: 5.7
>
> On Pixel 6:
>
> rgb24_to_uv_8_c: 30.7
> rgb24_to_uv_8_neon: 56.9
> rgb24_to_uv_128_c: 213.9
> rgb24_to_uv_128_neon: 173.2
> rgb24_to_uv_1080_c: 1649.9
> rgb24_to_uv_1080_neon: 1424.4
> rgb24_to_uv_1920_c: 2907.9
> rgb24_to_uv_1920_neon: 2480.7
> rgb24_to_uv_half_8_c: 36.2
> rgb24_to_uv_half_8_neon: 33.4
> rgb24_to_uv_half_128_c: 167.9
> rgb24_to_uv_half_128_neon: 99.4
> rgb24_to_uv_half_1080_c: 1293.9
> rgb24_to_uv_half_1080_neon: 778.7
> rgb24_to_uv_half_1920_c: 2292.7
> rgb24_to_uv_half_1920_neon: 1328.7
> rgb24_to_y_8_c: 19.7
> rgb24_to_y_8_neon: 27.7
> rgb24_to_y_128_c: 129.9
> rgb24_to_y_128_neon: 96.7
> rgb24_to_y_1080_c: 995.4
> rgb24_to_y_1080_neon: 767.7
> rgb24_to_y_1920_c: 1747.4
> rgb24_to_y_1920_neon: 1337.2
>
> Note both tests use clang as compiler, which has vectorization
> enabled by default with -O3.
>
> Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
> ---
> v3: Fix comments.
>
> libswscale/aarch64/Makefile  |   1 +
> libswscale/aarch64/input.S   | 202 +++++++++++++++++++++++++++++++++++
> libswscale/aarch64/swscale.c |  25 +++++
> 3 files changed, 228 insertions(+)
> create mode 100644 libswscale/aarch64/input.S

No further comments from me on this patchset. (Rémi had a comment on 2/4
though - I don't have a strong opinion on that matter either way.)

// Martin

* Re: [FFmpeg-devel] [PATCH v3 4/4] swscale/aarch64: Add rgb24 to yuv implementation
  2024-06-10 11:59   ` Martin Storsjö
@ 2024-06-10 16:26     ` Zhao Zhili
  2024-06-10 16:44       ` Rémi Denis-Courmont
  0 siblings, 1 reply; 7+ messages in thread
From: Zhao Zhili @ 2024-06-10 16:26 UTC (permalink / raw)
  To: FFmpeg development discussions and patches; +Cc: Michael Niedermayer



> On Jun 10, 2024, at 19:59, Martin Storsjö <martin@martin.st> wrote:
> 
> On Fri, 7 Jun 2024, Zhao Zhili wrote:
> 
>> From: Zhao Zhili <zhilizhao@tencent.com>
>> 
> 
> No further comments from me on this patchset. (Rémi had a comment on 2/4 though - I don't have a strong opinion on that matter either way.)

Thanks. I have modified patch 2/4 locally.

However, git pull and git push get no response from git@source.ffmpeg.org:ffmpeg.
Git pull from https://git.ffmpeg.org works fine, and there is no commit after 94f2274a. Is there
anything wrong with the server?

> 
> // Martin


* Re: [FFmpeg-devel] [PATCH v3 4/4] swscale/aarch64: Add rgb24 to yuv implementation
  2024-06-10 16:26     ` Zhao Zhili
@ 2024-06-10 16:44       ` Rémi Denis-Courmont
  0 siblings, 0 replies; 7+ messages in thread
From: Rémi Denis-Courmont @ 2024-06-10 16:44 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

On Monday, 10 June 2024, 19:26:17 EEST, Zhao Zhili wrote:
> However, git pull and git push get no response from
> git@source.ffmpeg.org:ffmpeg. Git pull from https://git.ffmpeg.org works
> fine, and there is no commit after 94f2274a.
> Is there anything wrong with the server?

Either too many poorly programmed bots downloading VLC 3.0.21 from master 
instead of mirrors, or that release gave somebody inspiration for their next 
DDoS victim. I guess.

-- 
レミ・デニ-クールモン
http://www.remlab.net/


