Subject: [FFmpeg-devel] [PR] avcodec/bswapdsp: improve performance by removing manual unrolling (PR #21427)
From: Zhao Zhili via ffmpeg-devel
Date: Sat, 10 Jan 2026 14:34:51 -0000
To: ffmpeg-devel@ffmpeg.org

PR #21427 opened by Zhao Zhili (quink)
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21427
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21427.patch

Manual loop unrolling increases code size. That can sometimes improve
performance, but more often it degrades it. Keep the C version simple,
and add assembly optimizations where they are needed.
Benchmark times (lower is better); the factor in parentheses is the speedup
relative to bswap_buf_c in the same column:

                 x86-clang      x86-gcc-arch-native  x86-msvc       m1-clang      rpi5-clang      rpi5-gcc-14
--------------------------------------------------------------------------------------------------------------
bswap_buf_c      57.3 (1.00x)   19.4 (1.00x)         55.4 (1.00x)   0.5 (1.00x)   143.5 (1.00x)   59.8 (1.00x)
bswap_buf_this*  49.0 (1.17x)   12.5 (1.56x)         17.7 (3.13x)   0.3 (2.04x)   57.9 (2.48x)    73.5 (0.81x)
bswap_buf_sse2   28.4 (2.02x)   24.3 (0.80x)         25.5 (2.18x)   -             -               -
bswap_buf_ssse3  24.6 (2.32x)   16.0 (1.22x)         19.0 (2.92x)   -             -               -
bswap_buf_avx2   21.2 (2.70x)   11.1 (1.74x)         11.2 (4.95x)   -             -               -

bswap_buf_c:    C implementation before this patch
bswap_buf_this: C implementation after this patch

Signed-off-by: Zhao Zhili

From e41f8e1ac5984bc9ba7abc21fcf49fc55b666e93 Mon Sep 17 00:00:00 2001
From: Zhao Zhili
Date: Sat, 10 Jan 2026 22:07:16 +0800
Subject: [PATCH] avcodec/bswapdsp: improve performance by removing manual unrolling
---
 libavcodec/bswapdsp.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/libavcodec/bswapdsp.c b/libavcodec/bswapdsp.c
index f375ab79ac..266aeca44a 100644
--- a/libavcodec/bswapdsp.c
+++ b/libavcodec/bswapdsp.c
@@ -24,19 +24,7 @@
 
 static void bswap_buf(uint32_t *dst, const uint32_t *src, int w)
 {
-    int i;
-
-    for (i = 0; i + 8 <= w; i += 8) {
-        dst[i + 0] = av_bswap32(src[i + 0]);
-        dst[i + 1] = av_bswap32(src[i + 1]);
-        dst[i + 2] = av_bswap32(src[i + 2]);
-        dst[i + 3] = av_bswap32(src[i + 3]);
-        dst[i + 4] = av_bswap32(src[i + 4]);
-        dst[i + 5] = av_bswap32(src[i + 5]);
-        dst[i + 6] = av_bswap32(src[i + 6]);
-        dst[i + 7] = av_bswap32(src[i + 7]);
-    }
-    for (; i < w; i++)
+    for (int i = 0; i < w; i++)
         dst[i + 0] = av_bswap32(src[i + 0]);
 }
 
-- 
2.49.1