From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 06 Feb 2026 22:06:11 +0000
To: ffmpeg-devel@ffmpeg.org
Message-ID: <20260206220606.7865-1-dev@christle.is>
MIME-Version: 1.0
Reply-To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH] swscale/aarch64: add NEON rgb24tobgr24 byte-swap
List-Id: FFmpeg development discussions and patches
From: David Christle via ffmpeg-devel
Cc: David Christle
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Add a NEON rgb24tobgr24 using ld3/st3 to swap R and B channels in
packed 24bpp RGB buffers. Handles all input sizes with a 16-pixel
NEON fast path, 8-pixel NEON cleanup, and scalar tail.

checkasm --bench on Apple M3 Max (1920*3 = 5760 bytes):
  rgb24tobgr24_c:     872.3 ( 1.00x)
  rgb24tobgr24_neon:   62.4 (13.98x)

Signed-off-by: David Christle
---
 libswscale/aarch64/rgb2rgb.c      |  3 +++
 libswscale/aarch64/rgb2rgb_neon.S | 43 +++++++++++++++++++++++++++++++
 tests/checkasm/sw_rgb.c           | 31 ++++++++++++++++++++++
 3 files changed, 77 insertions(+)

diff --git a/libswscale/aarch64/rgb2rgb.c b/libswscale/aarch64/rgb2rgb.c
index f474228298..5873439db5 100644
--- a/libswscale/aarch64/rgb2rgb.c
+++ b/libswscale/aarch64/rgb2rgb.c
@@ -51,6 +51,8 @@ static void rgb24toyv12(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
     }
 }
 
+void ff_rgb24tobgr24_neon(const uint8_t *src, uint8_t *dst, int src_size);
+
 void ff_interleave_bytes_neon(const uint8_t *src1, const uint8_t *src2,
                               uint8_t *dest, int width, int height,
                               int src1Stride, int src2Stride, int dstStride);
@@ -85,6 +87,7 @@ av_cold void rgb2rgb_init_aarch64(void)
 
     if (have_neon(cpu_flags)) {
         ff_rgb24toyv12 = rgb24toyv12;
+        rgb24tobgr24 = ff_rgb24tobgr24_neon;
         interleaveBytes = ff_interleave_bytes_neon;
         deinterleaveBytes = ff_deinterleave_bytes_neon;
         shuffle_bytes_0321 = ff_shuffle_bytes_0321_neon;
diff --git a/libswscale/aarch64/rgb2rgb_neon.S b/libswscale/aarch64/rgb2rgb_neon.S
index f6d625f11f..25e6c73f42 100644
--- a/libswscale/aarch64/rgb2rgb_neon.S
+++ b/libswscale/aarch64/rgb2rgb_neon.S
@@ -241,6 +241,49 @@ function ff_rgb24toyv12_neon, export=1
         ret
 endfunc
 
+// void ff_rgb24tobgr24_neon(const uint8_t *src, uint8_t *dst, int src_size);
+function ff_rgb24tobgr24_neon, export=1
+        // x0 = src, x1 = dst, w2 = src_size (bytes)
+
+        // Fast path: 48 bytes (16 pixels) per iteration
+        subs            w2, w2, #48
+        b.lt            2f
+1:
+        ld3             {v0.16b, v1.16b, v2.16b}, [x0], #48
+        mov             v3.16b, v0.16b
+        mov             v0.16b, v2.16b
+        mov             v2.16b, v3.16b
+        st3             {v0.16b, v1.16b, v2.16b}, [x1], #48
+        subs            w2, w2, #48
+        b.ge            1b
+2:
+        add             w2, w2, #48
+        // Medium path: 24 bytes (8 pixels)
+        cmp             w2, #24
+        b.lt            3f
+        ld3             {v0.8b, v1.8b, v2.8b}, [x0], #24
+        mov             v3.8b, v0.8b
+        mov             v0.8b, v2.8b
+        mov             v2.8b, v3.8b
+        sub             w2, w2, #24
+        st3             {v0.8b, v1.8b, v2.8b}, [x1], #24
+3:
+        // Scalar tail: 3 bytes (1 pixel) at a time
+        cmp             w2, #3
+        b.lt            4f
+5:
+        ldrb            w3, [x0], #1
+        ldrb            w4, [x0], #1
+        ldrb            w5, [x0], #1
+        subs            w2, w2, #3
+        strb            w5, [x1], #1
+        strb            w4, [x1], #1
+        strb            w3, [x1], #1
+        b.gt            5b
+4:
+        ret
+endfunc
+
 // void ff_interleave_bytes_neon(const uint8_t *src1, const uint8_t *src2,
 //                               uint8_t *dest, int width, int height,
 //                               int src1Stride, int src2Stride, int dstStride);
diff --git a/tests/checkasm/sw_rgb.c b/tests/checkasm/sw_rgb.c
index 6edfc93b0b..baeb34b465 100644
--- a/tests/checkasm/sw_rgb.c
+++ b/tests/checkasm/sw_rgb.c
@@ -834,6 +834,37 @@ void checkasm_check_sw_rgb(void)
     check_shuffle_bytes(shuffle_bytes_2130, "shuffle_bytes_2130");
     report("shuffle_bytes_2130");
 
+    {
+        /* rgb24tobgr24 operates on 3-byte pixels, so test widths must be
+         * multiples of 3 to avoid reading past the source buffer. */
+        static const int rgb24_width[] = {3, 12, 24, 36, 48, 126, 1920 * 3};
+        int i;
+#define RGB24_BENCH_WIDTH (1920 * 3)
+        LOCAL_ALIGNED_32(uint8_t, src0, [RGB24_BENCH_WIDTH]);
+        LOCAL_ALIGNED_32(uint8_t, src1, [RGB24_BENCH_WIDTH]);
+        LOCAL_ALIGNED_32(uint8_t, dst0, [RGB24_BENCH_WIDTH]);
+        LOCAL_ALIGNED_32(uint8_t, dst1, [RGB24_BENCH_WIDTH]);
+
+        declare_func(void, const uint8_t *src, uint8_t *dst, int src_size);
+
+        memset(dst0, 0, RGB24_BENCH_WIDTH);
+        memset(dst1, 0, RGB24_BENCH_WIDTH);
+        randomize_buffers(src0, RGB24_BENCH_WIDTH);
+        memcpy(src1, src0, RGB24_BENCH_WIDTH);
+
+        if (check_func(rgb24tobgr24, "rgb24tobgr24")) {
+            for (i = 0; i < FF_ARRAY_ELEMS(rgb24_width); i++) {
+                call_ref(src0, dst0, rgb24_width[i]);
+                call_new(src1, dst1, rgb24_width[i]);
+                if (memcmp(dst0, dst1, rgb24_width[i]))
+                    fail();
+            }
+            bench_new(src0, dst0, RGB24_BENCH_WIDTH);
+        }
+#undef RGB24_BENCH_WIDTH
+    }
+    report("rgb24tobgr24");
+
     check_uyvy_to_422p();
     report("uyvytoyuv422");
 
-- 
2.52.0

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org
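
For readers who want to follow the control flow without reading AArch64 assembly, the three-tier structure described in the commit message (48-byte fast path, one 24-byte cleanup step, 3-byte scalar tail) can be modelled in portable C roughly as below. This is only an illustrative sketch, not code from the patch: the names swap_rb_pixels, swap_rb_scalar and swap_rb_tiered are hypothetical, the inner per-pixel loop stands in for what ld3/st3 do on 16 or 8 pixels at once, and src_size is assumed to be a multiple of 3 (any trailing 1 or 2 bytes are simply left untouched here).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: swap R and B for `pixels` packed 3-byte pixels.
 * All three channels are read before any is written, so src == dst works. */
static void swap_rb_pixels(const uint8_t *src, uint8_t *dst, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++) {
        uint8_t c0 = src[3 * i + 0];
        uint8_t c1 = src[3 * i + 1];
        uint8_t c2 = src[3 * i + 2];
        dst[3 * i + 0] = c2;
        dst[3 * i + 1] = c1;
        dst[3 * i + 2] = c0;
    }
}

/* Plain reference: one pixel at a time over the whole buffer. */
void swap_rb_scalar(const uint8_t *src, uint8_t *dst, int src_size)
{
    swap_rb_pixels(src, dst, (size_t)(src_size / 3));
}

/* Model of the tiered layout; assumes src_size is a multiple of 3. */
void swap_rb_tiered(const uint8_t *src, uint8_t *dst, int src_size)
{
    /* Fast path: 48 bytes (16 pixels) per iteration. */
    while (src_size >= 48) {
        swap_rb_pixels(src, dst, 16);
        src += 48;
        dst += 48;
        src_size -= 48;
    }
    /* Medium path: at most one 24-byte (8-pixel) step. */
    if (src_size >= 24) {
        swap_rb_pixels(src, dst, 8);
        src += 24;
        dst += 24;
        src_size -= 24;
    }
    /* Scalar tail: 3 bytes (1 pixel) at a time. */
    while (src_size >= 3) {
        swap_rb_pixels(src, dst, 1);
        src += 3;
        dst += 3;
        src_size -= 3;
    }
}
```

A size such as 126 bytes exercises all three tiers in this model: two 48-byte iterations, one 24-byte step, and two scalar pixels, which is presumably why that width appears in the checkasm list.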