From: mkver via ffmpeg-devel
To: ffmpeg-devel@ffmpeg.org
Cc: mkver
Reply-To: FFmpeg development discussions and patches
Date: Wed, 05 Nov 2025 13:10:49 -0000
Subject: [FFmpeg-devel] [PATCH] avcodec/x86/h264_chromamc: Use xmm regs in chroma_mc4 SSSE3 functions (PR #20842)
Message-ID: <176234825039.25.17628330336355125775@2cb04c0e5124>

PR #20842 opened by mkver
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20842
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20842.patch

Doubling the register size makes it possible to avoid two pmaddubsw.
The new code is also ABI compliant (the old version lacked an emms),
and the average versions no longer rely on padding (the old versions
used pavgb with a memory operand that reads eight bytes, although only
four are needed).
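For readers who want to check the arithmetic without reading the asm,
here is a minimal scalar sketch in C of what the chroma_mc4 "put" path
computes and of the single-multiply weight packing described by the
comments in the patch. It is not code from FFmpeg: pack_weights,
filter_rows, put_mc4_ref, put_mc4_packed and count_mismatches are
hypothetical names, and the sketch assumes the H.264 rounding constant
of 32 (the RV40 variants load a different bias via rv40_get_bias).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack the four bilinear weights into one 32-bit word
 * with a single multiply, following the comments in the patch:
 *   xy << 24 | (8-x)y << 16 | x(8-y) << 8 | (8-x)(8-y)
 * For x, y in 0..7 every weight is at most 64, so no byte carries into
 * its neighbour. */
static uint32_t pack_weights(unsigned x, unsigned y)
{
    assert(x < 8 && y < 8);
    uint32_t wx = (x << 8)  | (8 - x);  /* x << 8  | (8-x) */
    uint32_t wy = (y << 16) | (8 - y);  /* y << 16 | (8-y) */
    return wx * wy;
}

/* Bilinear filter over a 4-pixel-wide block: weights A, B apply to the
 * current row, C, D to the row below; +32 and >> 6 round and normalise. */
static void filter_rows(uint8_t *dst, const uint8_t *src, ptrdiff_t stride,
                        int h, unsigned A, unsigned B, unsigned C, unsigned D)
{
    for (int i = 0; i < h; i++) {
        for (int j = 0; j < 4; j++)
            dst[j] = (uint8_t)((A * src[j]          + B * src[j + 1] +
                                C * src[j + stride] + D * src[j + stride + 1] +
                                32) >> 6);
        dst += stride;
        src += stride;
    }
}

/* Reference: weights computed directly from the fractional offsets. */
static void put_mc4_ref(uint8_t *dst, const uint8_t *src, ptrdiff_t stride,
                        int h, unsigned x, unsigned y)
{
    filter_rows(dst, src, stride, h,
                (8 - x) * (8 - y), x * (8 - y), (8 - x) * y, x * y);
}

/* Same filter, but with the weights read back out of the packed word. */
static void put_mc4_packed(uint8_t *dst, const uint8_t *src, ptrdiff_t stride,
                           int h, unsigned x, unsigned y)
{
    const uint32_t w = pack_weights(x, y);
    filter_rows(dst, src, stride, h,
                w & 0xff, (w >> 8) & 0xff, (w >> 16) & 0xff, w >> 24);
}

/* Compare both variants for every (x, y) on an arbitrary test pattern and
 * return the number of differing output pixels. */
static int count_mismatches(void)
{
    enum { STRIDE = 8, H = 4 };
    uint8_t src[(H + 1) * STRIDE], ref[H * STRIDE], out[H * STRIDE];
    int bad = 0;

    for (size_t i = 0; i < sizeof(src); i++)
        src[i] = (uint8_t)(i * 37 + 11);

    for (unsigned y = 0; y < 8; y++)
        for (unsigned x = 0; x < 8; x++) {
            put_mc4_ref(ref, src, STRIDE, H, x, y);
            put_mc4_packed(out, src, STRIDE, H, x, y);
            for (int r = 0; r < H; r++)
                for (int c = 0; c < 4; c++)
                    bad += ref[r * STRIDE + c] != out[r * STRIDE + c];
        }
    return bad;
}
```

In the SSSE3 code the low half of this word (the two current-row
weights) and the high half (the two next-row weights) are broadcast
into m6 and m7 and fed to pmaddubsw, which multiplies unsigned pixel
bytes by signed weight bytes. Since every weight is at most 64, the
signed interpretation does not change the result.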

Old benchmarks (the latter four refer to RV40):
avg_h264_chroma_mc4_8_c:         145.7 ( 1.00x)
avg_h264_chroma_mc4_8_ssse3:      32.3 ( 4.51x)
put_h264_chroma_mc4_8_c:         136.1 ( 1.00x)
put_h264_chroma_mc4_8_ssse3:      29.0 ( 4.70x)
avg_chroma_mc4_c:                162.1 ( 1.00x)
avg_chroma_mc4_ssse3:             31.1 ( 5.22x)
put_chroma_mc4_c:                137.5 ( 1.00x)
put_chroma_mc4_ssse3:             28.6 ( 4.81x)

New benchmarks:
avg_h264_chroma_mc4_8_c:         146.7 ( 1.00x)
avg_h264_chroma_mc4_8_ssse3:      26.5 ( 5.53x)
put_h264_chroma_mc4_8_c:         136.8 ( 1.00x)
put_h264_chroma_mc4_8_ssse3:      22.5 ( 6.09x)
avg_chroma_mc4_c:                165.5 ( 1.00x)
avg_chroma_mc4_ssse3:             27.2 ( 6.08x)
put_chroma_mc4_c:                138.1 ( 1.00x)
put_chroma_mc4_ssse3:             23.2 ( 5.96x)


From 16296019a93e612ba4d07495e9bc85c49dbc1aaf Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt
Date: Wed, 5 Nov 2025 12:46:50 +0100
Subject: [PATCH] avcodec/x86/h264_chromamc: Use xmm regs in chroma_mc4 SSSE3 functions

Doubling the register size makes it possible to avoid two pmaddubsw.
The new code is also ABI compliant (the old version lacked an emms),
and the average versions no longer rely on padding (the old versions
used pavgb with a memory operand that reads eight bytes, although only
four are needed).

Old benchmarks (the latter four refer to RV40):
avg_h264_chroma_mc4_8_c:         145.7 ( 1.00x)
avg_h264_chroma_mc4_8_ssse3:      32.3 ( 4.51x)
put_h264_chroma_mc4_8_c:         136.1 ( 1.00x)
put_h264_chroma_mc4_8_ssse3:      29.0 ( 4.70x)
avg_chroma_mc4_c:                162.1 ( 1.00x)
avg_chroma_mc4_ssse3:             31.1 ( 5.22x)
put_chroma_mc4_c:                137.5 ( 1.00x)
put_chroma_mc4_ssse3:             28.6 ( 4.81x)

New benchmarks:
avg_h264_chroma_mc4_8_c:         146.7 ( 1.00x)
avg_h264_chroma_mc4_8_ssse3:      26.5 ( 5.53x)
put_h264_chroma_mc4_8_c:         136.8 ( 1.00x)
put_h264_chroma_mc4_8_ssse3:      22.5 ( 6.09x)
avg_chroma_mc4_c:                165.5 ( 1.00x)
avg_chroma_mc4_ssse3:             27.2 ( 6.08x)
put_chroma_mc4_c:                138.1 ( 1.00x)
put_chroma_mc4_ssse3:             23.2 ( 5.96x)

Signed-off-by: Andreas Rheinhardt
---
 libavcodec/x86/h264_chromamc.asm | 89 +++++++++++++++++---------------
 1 file changed, 46 insertions(+), 43 deletions(-)

diff --git a/libavcodec/x86/h264_chromamc.asm b/libavcodec/x86/h264_chromamc.asm
index 6a65d5cabd..7c896db179 100644
--- a/libavcodec/x86/h264_chromamc.asm
+++ b/libavcodec/x86/h264_chromamc.asm
@@ -276,51 +276,57 @@ cglobal %1_%2_chroma_mc8%3, 6, 7+UNIX64, 8
 %endmacro
 
 %macro chroma_mc4_ssse3_func 2
-cglobal %1_%2_chroma_mc4, 6, 7+UNIX64, 0
-    movq         m5, [pw_32]
+cglobal %1_%2_chroma_mc4, 6, 7+UNIX64, 8
+    mova         m5, [pw_32]
 ..@%1_%2_chroma_mc4_after_init_ %+ cpuname:
-    mov          r6, r4
+    mov          r6d, r4d
     shl          r4d, 8
-    sub          r4d, r6d
-    mov          r6, 8
-    add          r4d, 8 ; x*288+8
-    sub          r6d, r5d
-    imul         r6d, r4d ; (8-y)*(x*255+8) = (8-y)*x<<8 | (8-y)*(8-x)
-    imul         r4d, r5d ; y *(x*255+8) = y *x<<8 | y *(8-x)
+    movd         m0, [r1]
+    sub          r6d, 8
+    sub          r4d, r6d ; x << 8 | (8-x)
+    mov          r6d, r5d
+    shl          r5d, 16
+    movd         m1, [r1+1]
+    sub          r6d, 8
+    sub          r5d, r6d ; y << 16 | (8-y)
+    imul         r4d, r5d ; xy << 24 | (8-x)y << 16 | x(8-y) << 8 | (8-x)(8-y)
+    add          r1, r2
 
-    movd         m7, r6d
-    movd         m6, r4d
-    movd         m0, [r1 ]
-    pshufw       m7, m7, 0
-    punpcklbw    m0, [r1+1]
-    pshufw       m6, m6, 0
+    movd         m6, r4d ; ABCD
+    punpcklwd    m6, m6 ; ABABCDCD
+    pshufd       m7, m6, 0x55 ; CDCDCDCDCDCDCDCD
+    punpcklbw    m0, m1
+    pshufd       m6, m6, 0x0 ; ABABABABABABABAB
 
 .next2rows:
-    movd         m1, [r1+r2*1 ]
-    movd         m3, [r1+r2*2 ]
-    punpcklbw    m1, [r1+r2*1+1]
-    punpcklbw    m3, [r1+r2*2+1]
-    lea          r1, [r1+r2*2]
-    movq         m2, m1
-    movq         m4, m3
-    pmaddubsw    m0, m7
-    pmaddubsw    m1, m6
-    pmaddubsw    m2, m7
-    pmaddubsw    m3, m6
+    movd         m1, [r1]
+    movd         m2, [r1+1]
+    movd         m3, [r1+r2]
+    movd         m4, [r1+r2+1]
+    punpcklbw    m1, m2
+    punpcklqdq   m0, m1
+    pmaddubsw    m0, m6
+    punpcklbw    m3, m4
+    punpcklqdq   m1, m3
+    pmaddubsw    m1, m7
+%ifidn %1, avg
+    movd         m2, [r0]
+    movd         m4, [r0+r2]
+%endif
     paddw        m0, m5
-    paddw        m2, m5
-    paddw        m1, m0
-    paddw        m3, m2
-    psrlw        m1, 6
-    movq         m0, m4
-    psrlw        m3, 6
-    packuswb     m1, m1
-    packuswb     m3, m3
-    CHROMAMC_AVG m1, [r0 ]
-    CHROMAMC_AVG m3, [r0+r2]
-    movd         [r0 ], m1
-    movd         [r0+r2], m3
+    lea          r1, [r1+r2*2]
+    paddw        m0, m1
+    psrlw        m0, 6
+    packuswb     m0, m0
+    pshufd       m1, m0, 0x1
+%ifidn %1, avg
+    pavgb        m0, m2
+    pavgb        m1, m4
+%endif
     sub          r3d, 2
+    movd         [r0], m0
+    movd         [r0+r2], m1
+    mova         m0, m3
     lea          r0, [r0+r2*2]
     jg .next2rows
     RET
@@ -379,26 +385,23 @@ cglobal %1_%2_chroma_mc4, 6, 7+UNIX64, 0
 
 %macro rv40_chroma_mc4_func 1 ; put vs avg
 %if CONFIG_RV40_DECODER
-    cglobal rv40_%1_chroma_mc4, 6, 7+UNIX64, 0
+    cglobal rv40_%1_chroma_mc4, 6, 7+UNIX64, 8
         rv40_get_bias m5
         jmp ..@%1_h264_chroma_mc4_after_init_ %+ cpuname
 %endif
 %endmacro
 
-%define CHROMAMC_AVG NOTHING
 INIT_XMM ssse3
+%define CHROMAMC_AVG NOTHING
 chroma_mc8_ssse3_func put, h264, _rnd
 chroma_mc8_ssse3_func put, vc1, _nornd
 rv40_chroma_mc8_func put
-INIT_MMX ssse3
 chroma_mc4_ssse3_func put, h264
 rv40_chroma_mc4_func put
 
 %define CHROMAMC_AVG DIRECT_AVG
-INIT_XMM ssse3
 chroma_mc8_ssse3_func avg, h264, _rnd
 chroma_mc8_ssse3_func avg, vc1, _nornd
 rv40_chroma_mc8_func avg
-INIT_MMX ssse3
 chroma_mc4_ssse3_func avg, h264
 rv40_chroma_mc4_func avg
-- 
2.49.1

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org