From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 30 Oct 2025 11:02:06 -0000
Message-ID: <176182212706.81.6416319141458007086@7d278768979e>
Message-ID-Hash: I5VVUQG3ZEBIOTPHBGBPOFEIPX4BUBRX
X-Message-ID-Hash: I5VVUQG3ZEBIOTPHBGBPOFEIPX4BUBRX
X-MailFrom: code@ffmpeg.org
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; header-match-ffmpeg-devel.ffmpeg.org-0; header-match-ffmpeg-devel.ffmpeg.org-1; header-match-ffmpeg-devel.ffmpeg.org-2; header-match-ffmpeg-devel.ffmpeg.org-3; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
X-Mailman-Version: 3.3.10
Precedence: list
Reply-To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH] avcodec/x86/hpeldsp: Don't use saturated addition when unnecessary (PR #20791)
List-Id: FFmpeg development discussions and patches
Archived-At:
List-Archive:
List-Help:
List-Owner:
List-Post:
List-Subscribe:
List-Unsubscribe:
From: mkver via ffmpeg-devel
Cc: mkver
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

PR #20791 opened by mkver
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20791
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20791.patch

>>From 50f2e0e7ba41e4aedf36244d63c42a1381fc0336 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt
Date: Thu, 30 Oct 2025 10:27:00 +0100
Subject: [PATCH 1/3] avcodec/x86/hpeldsp: Actually use constants in registers

Forgotten in 36f92206bb90d6f0268749bd6fe6aa57974442db.

Signed-off-by: Andreas Rheinhardt
---
 libavcodec/x86/hpeldsp.asm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/x86/hpeldsp.asm b/libavcodec/x86/hpeldsp.asm
index 2587e3c315..0974286b0d 100644
--- a/libavcodec/x86/hpeldsp.asm
+++ b/libavcodec/x86/hpeldsp.asm
@@ -428,7 +428,7 @@ cglobal %1%3_pixels8_xy2, 4,5,5
     psrlw m2, 2
 %else
     paddusw m2, m0
-    pmulhrsw m2, [pw_8192]
+    pmulhrsw m2, m3
 %endif
 %ifidn %1, avg
     movh m1, [r0+r4]
@@ -450,7 +450,7 @@ cglobal %1%3_pixels8_xy2, 4,5,5
     psrlw m0, 2
 %else
     paddusw m0, m2
-    pmulhrsw m0, [pw_8192]
+    pmulhrsw m0, m3
 %endif
 %ifidn %1, avg
     movh m1, [r0+r4]
-- 
2.49.1

>>From a84ea10f93fbb66530eaa5ebb6f0275203d18356 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt
Date: Thu, 30 Oct 2025 10:44:41 +0100
Subject: [PATCH 2/3] avcodec/x86/hpeldsp: Don't use saturated addition when unnecessary

The numbers here are small (sums of values unpacked from bytes).

Signed-off-by: Andreas Rheinhardt
---
 libavcodec/x86/hpeldsp.asm | 48 +++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/libavcodec/x86/hpeldsp.asm b/libavcodec/x86/hpeldsp.asm
index 0974286b0d..c92c70f5ad 100644
--- a/libavcodec/x86/hpeldsp.asm
+++ b/libavcodec/x86/hpeldsp.asm
@@ -423,11 +423,11 @@ cglobal %1%3_pixels8_xy2, 4,5,5
     punpcklbw m0, m1
     pmaddubsw m0, m4
 %ifidn %3, _no_rnd
-    paddusw m2, m3
-    paddusw m2, m0
+    paddw m2, m3
+    paddw m2, m0
     psrlw m2, 2
 %else
-    paddusw m2, m0
+    paddw m2, m0
     pmulhrsw m2, m3
 %endif
 %ifidn %1, avg
@@ -445,11 +445,11 @@ cglobal %1%3_pixels8_xy2, 4,5,5
     punpcklbw m2, m1
     pmaddubsw m2, m4
 %ifidn %3, _no_rnd
-    paddusw m0, m3
-    paddusw m0, m2
+    paddw m0, m3
+    paddw m0, m2
     psrlw m0, 2
 %else
-    paddusw m0, m2
+    paddw m0, m2
     pmulhrsw m0, m3
 %endif
 %ifidn %1, avg
@@ -485,8 +485,8 @@ cglobal %1%3_pixels16_xy2, 4,5,8
     punpcklbw m4, m7
     punpckhbw m1, m7
     punpckhbw m5, m7
-    paddusw m4, m0
-    paddusw m5, m1
+    paddw m4, m0
+    paddw m5, m1
     xor r4, r4
     add r1, r2
 .loop:
@@ -498,12 +498,12 @@ cglobal %1%3_pixels16_xy2, 4,5,8
     punpcklbw m2, m7
     punpckhbw m1, m7
     punpckhbw m3, m7
-    paddusw m0, m2
-    paddusw m1, m3
-    paddusw m4, m6
-    paddusw m5, m6
-    paddusw m4, m0
-    paddusw m5, m1
+    paddw m0, m2
+    paddw m1, m3
+    paddw m4, m6
+    paddw m5, m6
+    paddw m4, m0
+    paddw m5, m1
     psrlw m4, 2
     psrlw m5, 2
 %ifidn %1, avg
@@ -524,12 +524,12 @@ cglobal %1%3_pixels16_xy2, 4,5,8
     punpcklbw m4, m7
     punpckhbw m3, m7
     punpckhbw m5, m7
-    paddusw m4, m2
-    paddusw m5, m3
-    paddusw m0, m6
-    paddusw m1, m6
-    paddusw m0, m4
-    paddusw m1, m5
+    paddw m4, m2
+    paddw m5, m3
+    paddw m0, m6
+    paddw m1, m6
+    paddw m0, m4
+    paddw m1, m5
     psrlw m0, 2
     psrlw m1, 2
 %ifidn %1, avg
@@ -567,8 +567,8 @@ cglobal %1_pixels16_xy2, 4,5,%2
     movu m3, [r1+r4+1]
     pmaddubsw m2, m5
     pmaddubsw m3, m5
-    paddusw m0, m2
-    paddusw m1, m3
+    paddw m0, m2
+    paddw m1, m3
     pmulhrsw m0, [pw_8192]
     pmulhrsw m1, [pw_8192]
 %ifidn %1, avg
@@ -587,8 +587,8 @@ cglobal %1_pixels16_xy2, 4,5,%2
     movu m1, [r1+r4+1]
     pmaddubsw m0, m5
     pmaddubsw m1, m5
-    paddusw m2, m0
-    paddusw m3, m1
+    paddw m2, m0
+    paddw m3, m1
     pmulhrsw m2, [pw_8192]
     pmulhrsw m3, [pw_8192]
 %ifidn %1, avg
-- 
2.49.1

>>From 88f4641db2d488308f04b70cba9f285d30da6eb5 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt
Date: Thu, 30 Oct 2025 11:07:43 +0100
Subject: [PATCH 3/3] avcodec/x86/hpeldsp: Don't use PAVGB macro

It was only needed for MMX and there are no MMX functions here any more.

Signed-off-by: Andreas Rheinhardt
---
 libavcodec/x86/hpeldsp.asm | 84 +++++++++++++++++++-------------------
 1 file changed, 42 insertions(+), 42 deletions(-)

diff --git a/libavcodec/x86/hpeldsp.asm b/libavcodec/x86/hpeldsp.asm
index c92c70f5ad..cbdf0e460d 100644
--- a/libavcodec/x86/hpeldsp.asm
+++ b/libavcodec/x86/hpeldsp.asm
@@ -54,8 +54,8 @@ cglobal put_pixels8_x2, 4,5
     pavgb m0, m2
     pavgb m1, m3
 %else
-    PAVGB m0, [r1]
-    PAVGB m1, [r1+r2]
+    pavgb m0, [r1]
+    pavgb m1, [r1+r2]
 %endif
     mova [r0], m0
     mova [r0+r2], m1
@@ -69,8 +69,8 @@ cglobal put_pixels8_x2, 4,5
     pavgb m0, m2
     pavgb m1, m3
 %else
-    PAVGB m0, [r1]
-    PAVGB m1, [r1+r2]
+    pavgb m0, [r1]
+    pavgb m1, [r1+r2]
 %endif
     add r1, r4
     mova [r0], m0
@@ -103,8 +103,8 @@ cglobal put_no_rnd_pixels8_x2, 4,5
     add r1, r4
     psubusb m0, m6
     psubusb m2, m6
-    PAVGB m0, m1
-    PAVGB m2, m3
+    pavgb m0, m1
+    pavgb m2, m3
     mova [r0], m0
     mova [r0+r2], m2
     mova m0, [r1]
@@ -115,8 +115,8 @@ cglobal put_no_rnd_pixels8_x2, 4,5
     add r1, r4
     psubusb m0, m6
     psubusb m2, m6
-    PAVGB m0, m1
-    PAVGB m2, m3
+    pavgb m0, m1
+    pavgb m2, m3
     mova [r0], m0
     mova [r0+r2], m2
     add r0, r4
@@ -143,8 +143,8 @@ cglobal %1_no_rnd_pixels8_x2_exact, 4,5
     pxor m2, m4
     pxor m1, m4
     pxor m3, m4
-    PAVGB m0, m1
-    PAVGB m2, m3
+    pavgb m0, m1
+    pavgb m2, m3
     pxor m0, m4
     pxor m2, m4
 %ifidn %1, avg
@@ -161,8 +161,8 @@ cglobal %1_no_rnd_pixels8_x2_exact, 4,5
     pxor m1, m4
     pxor m2, m4
     pxor m3, m4
-    PAVGB m0, m1
-    PAVGB m2, m3
+    pavgb m0, m1
+    pavgb m2, m3
     pxor m0, m4
     pxor m2, m4
 %ifidn %1, avg
@@ -198,16 +198,16 @@ cglobal put_pixels8_y2, 4,5
     movu m1, [r1+r2]
     movu m2, [r1+r4]
     add r1, r4
-    PAVGB m0, m1
-    PAVGB m1, m2
+    pavgb m0, m1
+    pavgb m1, m2
     mova [r0+r2], m0
     mova [r0+r4], m1
     movu m1, [r1+r2]
     movu m0, [r1+r4]
     add r0, r4
     add r1, r4
-    PAVGB m2, m1
-    PAVGB m1, m0
+    pavgb m2, m1
+    pavgb m1, m0
     mova [r0+r2], m2
     mova [r0+r4], m1
     add r0, r4
@@ -235,8 +235,8 @@ cglobal put_no_rnd_pixels8_y2, 4,5
     mova m2, [r1+r4]
     add r1, r4
     psubusb m1, m6
-    PAVGB m0, m1
-    PAVGB m1, m2
+    pavgb m0, m1
+    pavgb m1, m2
     mova [r0+r2], m0
     mova [r0+r4], m1
     mova m1, [r1+r2]
@@ -244,8 +244,8 @@ cglobal put_no_rnd_pixels8_y2, 4,5
     add r0, r4
     add r1, r4
     psubusb m1, m6
-    PAVGB m2, m1
-    PAVGB m1, m0
+    pavgb m2, m1
+    pavgb m1, m0
     mova [r0+r2], m2
     mova [r0+r4], m1
     add r0, r4
@@ -271,8 +271,8 @@ cglobal %1_no_rnd_pixels8_y2_exact, 4,5
     movu m2, [r1+r2]
     pxor m1, m3
     pxor m2, m3
-    PAVGB m0, m1
-    PAVGB m1, m2
+    pavgb m0, m1
+    pavgb m1, m2
     pxor m0, m3
     pxor m1, m3
 %ifidn %1, avg
@@ -285,8 +285,8 @@ cglobal %1_no_rnd_pixels8_y2_exact, 4,5
     movu m0, [r1+r4]
     pxor m1, m3
     pxor m0, m3
-    PAVGB m2, m1
-    PAVGB m1, m0
+    pavgb m2, m1
+    pavgb m1, m0
     pxor m2, m3
     pxor m1, m3
 %ifidn %1, avg
@@ -325,11 +325,11 @@ cglobal avg_pixels8_x2, 4,5
     pavgb m0, m1
     pavgb m2, m3
 %else
-    PAVGB m0, [r1+1], m3, m5
-    PAVGB m2, [r1+r2+1], m4, m5
+    pavgb m0, [r1+1]
+    pavgb m2, [r1+r2+1]
 %endif
-    PAVGB m0, [r0], m3, m5
-    PAVGB m2, [r0+r2], m4, m5
+    pavgb m0, [r0]
+    pavgb m2, [r0+r2]
     add r1, r4
     mova [r0], m0
     mova [r0+r2], m2
@@ -341,13 +341,13 @@ cglobal avg_pixels8_x2, 4,5
     pavgb m0, m1
     pavgb m2, m3
 %else
-    PAVGB m0, [r1+1], m3, m5
-    PAVGB m2, [r1+r2+1], m4, m5
+    pavgb m0, [r1+1]
+    pavgb m2, [r1+r2+1]
 %endif
     add r0, r4
     add r1, r4
-    PAVGB m0, [r0], m3, m5
-    PAVGB m2, [r0+r2], m4, m5
+    pavgb m0, [r0]
+    pavgb m2, [r0+r2]
     mova [r0], m0
     mova [r0+r2], m2
     add r0, r4
@@ -377,20 +377,20 @@ cglobal avg_pixels8_y2, 4,5
     movu m1, [r1+r2]
     movu m2, [r1+r4]
     add r1, r4
-    PAVGB m0, m1
-    PAVGB m1, m2
-    PAVGB m0, [r0+r2]
-    PAVGB m1, [r0+r4]
+    pavgb m0, m1
+    pavgb m1, m2
+    pavgb m0, [r0+r2]
+    pavgb m1, [r0+r4]
     mova [r0+r2], m0
     mova [r0+r4], m1
     movu m1, [r1+r2]
     movu m0, [r1+r4]
-    PAVGB m2, m1
-    PAVGB m1, m0
+    pavgb m2, m1
+    pavgb m1, m0
     add r0, r4
     add r1, r4
-    PAVGB m2, [r0+r2]
-    PAVGB m1, [r0+r4]
+    pavgb m2, [r0+r2]
+    pavgb m1, [r0+r4]
     mova [r0+r2], m2
     mova [r0+r4], m1
     add r0, r4
@@ -509,7 +509,7 @@ cglobal %1%3_pixels16_xy2, 4,5,8
 %ifidn %1, avg
     mova m3, [r0+r4]
     packuswb m4, m5
-    PAVGB m4, m3
+    pavgb m4, m3
 %else
     packuswb m4, m5
 %endif
@@ -535,7 +535,7 @@ cglobal %1%3_pixels16_xy2, 4,5,8
 %ifidn %1, avg
     mova m3, [r0+r4]
     packuswb m0, m1
-    PAVGB m0, m3
+    pavgb m0, m3
 %else
     packuswb m0, m1
 %endif
-- 
2.49.1

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org
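
Note on patch 2/3: the claim that saturation is unnecessary can be sanity-checked with a scalar model of one 16-bit lane. The sketch below is not part of the series. The helper names are made up for illustration, and it assumes that each lane holds a sum of at most two bytes (0..510) before the additions and that any rounding constant added in the _no_rnd path is at most 2; neither the lane contents nor the constant values are visible in the hunks above. Under those assumptions the largest value reached is 4*255 + 2 = 1022, far below 0xFFFF, so PADDW and PADDUSW agree. It also checks that PMULHRSW with 8192 matches (x + 2) >> 2 on that range.

```python
def paddw(a, b):
    """One 16-bit lane of PADDW: wrapping addition modulo 2**16."""
    return (a + b) & 0xFFFF


def paddusw(a, b):
    """One 16-bit lane of PADDUSW: unsigned addition clamped to 0xFFFF."""
    return min(a + b, 0xFFFF)


def pmulhrsw(a, b):
    """One lane of PMULHRSW: (((a * b) >> 14) + 1) >> 1.

    The final wrap to 16 bits is omitted; it cannot trigger for the
    small non-negative inputs used here.
    """
    return (((a * b) >> 14) + 1) >> 1


MAX_PAIR = 2 * 255   # assumed bound: a lane holds the sum of two bytes
MAX_ROUND = 2        # assumed bound on the rounding constant (hypothetical)


def count_mismatches():
    """Count inputs where the wrapping and saturating forms disagree."""
    bad = 0
    # Adding two pair sums: the paddusw -> paddw replacement itself.
    for s0 in range(MAX_PAIR + 1):
        for s1 in range(MAX_PAIR + 1):
            if paddw(s0, s1) != paddusw(s0, s1):
                bad += 1
    for total in range(2 * MAX_PAIR + 1):
        # Adding a small rounding constant before the shift.
        for rnd in range(MAX_ROUND + 1):
            if paddw(total, rnd) != paddusw(total, rnd):
                bad += 1
        # Rounded path: pmulhrsw by 8192 versus add 2, shift right by 2.
        if pmulhrsw(total, 8192) != (total + 2) >> 2:
            bad += 1
    return bad


if __name__ == "__main__":
    print("mismatches:", count_mismatches())
```

The two instructions do differ in general (for example 0xFFFF + 1 wraps to 0 with PADDW and clamps to 0xFFFF with PADDUSW), which is why the bound on the inputs matters.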