From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
To: ffmpeg-devel@ffmpeg.org
Date: Fri, 27 Feb 2026 13:03:51 -0000
Message-ID: <177219743245.25.4048680044388236040@29965ddac10e>
Reply-To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PR] avcodec/x86/bswapdsp: Minor improvements (PR #22307)
From: mkver via ffmpeg-devel
Cc: mkver
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

PR #22307 opened by mkver
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/22307
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/22307.patch

>>From 451d53eb3db21189d9ca66a3a3b6684eb8e34efb Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt
Date: Fri, 27 Feb 2026 13:19:47 +0100
Subject: [PATCH 1/3] avcodec/x86/bswapdsp: Avoid register copies

No change in benchmarks here.
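Patch 1/3 keeps the element count w in r2d unchanged and does all shifting and decrementing on the copy in r3d, so `.left8_%1` and `.left4_%1` can test bits of r2d directly instead of first copying r3d back into r2d. Below is a minimal C model of that control flow. It assumes the SSSE3 layout (8 elements per main-loop iteration, i.e. two xmm registers of four uint32_t each) and assumes that the code after `.left` finishes the last 2 and 1 elements by testing bits of w; all names are hypothetical and only mirror the registers.

```c
#include <stdint.h>

/* Portable byte swap of one 32-bit element. */
static uint32_t bswap32_scalar(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Swap n consecutive elements and advance both pointers past them. */
static void swap_block(uint32_t **dst, const uint32_t **src, int n)
{
    for (int i = 0; i < n; i++)
        (*dst)[i] = bswap32_scalar((*src)[i]);
    *dst += n;
    *src += n;
}

/* Hypothetical model of BSWAP_LOOPS + .left after patch 1 (SSSE3 case).
 * r2d holds w and is never written; only r3d is shifted and decremented. */
void bswap32_buf_model(uint32_t *dst, const uint32_t *src, int w)
{
    const int r2d = w;     /* element count, read-only after the patch */
    int r3d = r2d >> 3;    /* mov r3d, r2d ; sar r3d, 3 */

    while (r3d > 0) {      /* .loop8_%1: ... dec r3d ; jnz .loop8_%1 */
        swap_block(&dst, &src, 8);
        r3d--;
    }
    if (r2d & 4)           /* .left4_%1: test r2d, 4 (no mov r2d, r3d) */
        swap_block(&dst, &src, 4);
    if (r2d & 2)           /* .left: test r2d, 2 */
        swap_block(&dst, &src, 2);
    if (r2d & 1)           /* assumed final single-element step */
        swap_block(&dst, &src, 1);
}
```

Because 8 * (w >> 3) + (w & 4) + (w & 2) + (w & 1) equals w for any non-negative w, testing bits of the untouched count covers exactly the elements the main loop left over.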
Signed-off-by: Andreas Rheinhardt
---
 libavcodec/x86/bswapdsp.asm | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/libavcodec/x86/bswapdsp.asm b/libavcodec/x86/bswapdsp.asm
index 31c6c48a21..12fd494ffe 100644
--- a/libavcodec/x86/bswapdsp.asm
+++ b/libavcodec/x86/bswapdsp.asm
@@ -33,10 +33,10 @@ SECTION .text
 ; %1 = aligned/unaligned
 %macro BSWAP_LOOPS 1
     mov r3d, r2d
-    sar r2d, 3
+    sar r3d, 3
     jz .left4_%1
 %if cpuflag(avx2)
-    sar r2d, 1
+    sar r3d, 1
     jz .left8_%1
 %endif
 .loop8_%1:
@@ -65,12 +65,11 @@ SECTION .text
 %endif
     add r0, mmsize*2
     add r1, mmsize*2
-    dec r2d
+    dec r3d
     jnz .loop8_%1
 %if cpuflag(avx2)
 .left8_%1:
-    mov r2d, r3d
-    test r3d, 8
+    test r2d, 8
     jz .left4_%1
     mov%1 m0, [r1]
     pshufb m0, m2
@@ -79,8 +78,7 @@ SECTION .text
     add r0, mmsize
 %endif
 .left4_%1:
-    mov r2d, r3d
-    test r3d, 4
+    test r2d, 4
     jz .left
     mov%1 xm0, [r1]
 %if cpuflag(ssse3)
-- 
2.52.0

>>From 3db6adc772ebfadf0537390740883ab6feed2841 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt
Date: Fri, 27 Feb 2026 13:24:04 +0100
Subject: [PATCH 2/3] avcodec/x86/bswapdsp: combine shifting, avoid check for AVX2

This avoids a check and a shift if >=8 elements are processed;
it adds a check if < 8 elements are processed (which should be rare).

No change in benchmarks here.
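The shift/check trade-off described in this message can be checked with a small C cost model of the AVX2 case, where one main-loop iteration covers 16 elements (two ymm registers of eight uint32_t each). The model is hypothetical, not FFmpeg code: it counts only the shifts and conditional jumps executed before reaching `.left4_%1` and ignores the main loop's own `jnz`.

```c
/* Hypothetical cost model of the AVX2 prologue/tail of BSWAP_LOOPS:
 * shifts and conditional jumps executed on the way to .left4_%1,
 * excluding the main loop's jnz. */
typedef struct {
    int shifts;
    int jumps;
} Cost;

/* Before patch 2: sar 3 / jz .left4, then sar 1 / jz .left8, then the
 * test r2d, 8 / jz .left4 at .left8 (which is skipped when w < 8). */
Cost avx2_cost_before(int w)
{
    Cost c = { 1, 1 };          /* sar r3d, 3 ; jz .left4_%1 */
    if ((w >> 3) == 0)
        return c;               /* w < 8: jumps straight to .left4_%1 */
    c.shifts += 1;              /* sar r3d, 1 */
    c.jumps  += 1;              /* jz .left8_%1 */
    c.jumps  += 1;              /* .left8_%1: test r2d, 8 ; jz .left4_%1 */
    return c;
}

/* After patch 2: one sar 4 / jz .left8, then the test at .left8 for all w. */
Cost avx2_cost_after(int w)
{
    (void)w;                    /* the path length no longer depends on w */
    Cost c = { 1, 1 };          /* sar r3d, 4 ; jz .left8_%1 */
    c.jumps += 1;               /* .left8_%1: test r2d, 8 ; jz .left4_%1 */
    return c;
}
```

Under this model, w < 8 goes from one jump to two, while every w >= 8 drops from two shifts and three jumps to one shift and two jumps.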
Signed-off-by: Andreas Rheinhardt
---
 libavcodec/x86/bswapdsp.asm | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/libavcodec/x86/bswapdsp.asm b/libavcodec/x86/bswapdsp.asm
index 12fd494ffe..f89ca76cf1 100644
--- a/libavcodec/x86/bswapdsp.asm
+++ b/libavcodec/x86/bswapdsp.asm
@@ -33,11 +33,12 @@ SECTION .text
 ; %1 = aligned/unaligned
 %macro BSWAP_LOOPS 1
     mov r3d, r2d
+%if cpuflag(avx2)
+    sar r3d, 4
+    jz .left8_%1
+%else
     sar r3d, 3
     jz .left4_%1
-%if cpuflag(avx2)
-    sar r3d, 1
-    jz .left8_%1
 %endif
 .loop8_%1:
     mov%1 m0, [r1 + 0]
-- 
2.52.0

>>From 311a587c7f2b90f54a04bb19505736cf9f304a48 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt
Date: Fri, 27 Feb 2026 13:54:21 +0100
Subject: [PATCH 3/3] avcodec/x86/bswapdsp: Avoid aligned vs unaligned codepaths for AVX2

For modern CPUs (like those supporting AVX2), loads and stores using the
unaligned versions of instructions are as fast as the aligned ones if the
address is aligned, so remove the aligned AVX2 version (and the alignment
check) and just use the unaligned one.
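The idea behind patch 3/3 (one code path whose loads and stores tolerate any alignment, instead of an alignment check plus two copies of the loop) has a portable scalar analogue in C: access the buffer through `memcpy`, which is valid at any address and which compilers typically lower to ordinary unaligned moves. This is only an illustrative sketch under that assumption. The function name is hypothetical, it takes `void` pointers so that misaligned buffers can be passed without undefined behavior, and it says nothing about AVX2 performance.

```c
#include <stdint.h>
#include <string.h>

/* Portable byte swap of one 32-bit element. */
static uint32_t bswap32_scalar(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Hypothetical single-path bswap: no alignment check, no aligned variant.
 * Every access goes through memcpy, so dst and src may have any alignment. */
void bswap32_buf_any_align(void *dst, const void *src, int w)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (int i = 0; i < w; i++) {
        uint32_t v;
        memcpy(&v, s + 4 * (size_t)i, sizeof(v));  /* unaligned-safe load */
        v = bswap32_scalar(v);
        memcpy(d + 4 * (size_t)i, &v, sizeof(v));  /* unaligned-safe store */
    }
}
```

Calling it with `dst` and `src` offset by 1, 2 or 3 bytes from a 4-byte boundary gives the same per-element byte reversal as with aligned pointers, which is the property the AVX2 path now relies on instead of branching to `.start_align`.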
Signed-off-by: Andreas Rheinhardt
---
 libavcodec/x86/bswapdsp.asm | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/libavcodec/x86/bswapdsp.asm b/libavcodec/x86/bswapdsp.asm
index f89ca76cf1..2b80d8a75e 100644
--- a/libavcodec/x86/bswapdsp.asm
+++ b/libavcodec/x86/bswapdsp.asm
@@ -100,10 +100,15 @@ SECTION .text
 
 ; void ff_bswap_buf(uint32_t *dst, const uint32_t *src, int w);
 %macro BSWAP32_BUF 0
-%if cpuflag(ssse3)||cpuflag(avx2)
+%if cpuflag(avx2)
+cglobal bswap32_buf, 3,4,3
+    vbroadcasti128 m2, [pb_bswap32]
+    BSWAP_LOOPS u
+%else
+%if cpuflag(ssse3)
 cglobal bswap32_buf, 3,4,3
     mov r3, r1
-    VBROADCASTI128 m2, [pb_bswap32]
+    mova m2, [pb_bswap32]
 %else
 cglobal bswap32_buf, 3,4,5
     mov r3, r1
@@ -115,6 +120,7 @@ cglobal bswap32_buf, 3,4,5
     jmp .left
 .start_align:
     BSWAP_LOOPS a
+%endif
 .left:
 %if cpuflag(ssse3)
     test r2d, 2
-- 
2.52.0

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org