To: ffmpeg-devel@ffmpeg.org
Date: Tue, 14 Oct 2025 13:16:53 -0000
Subject: [FFmpeg-devel] [PATCH] avcodec/x86/fpel: Port ff_put_pixels8_mmx() to SSE2 (PR #20706)
From: mkver via ffmpeg-devel

PR #20706 opened by mkver
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20706
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20706.patch

>From dee82cd1a40c7ce05bfdc9a35ab2dcd453b60f26 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt
Date: Tue, 14 Oct 2025 15:06:13 +0200
Subject: [PATCH] avcodec/x86/fpel: Port ff_put_pixels8_mmx() to SSE2

This has the advantage of not violating the ABI, which the MMX version
did by using MMX registers without issuing emms; e.g. it allows
removing an emms_c() call from bink.c.

For put_pixels8, this commit uses general-purpose registers on Unix64
(there are not enough volatile registers to do likewise on Win64),
which reduces code size and is faster on some CPUs.

Signed-off-by: Andreas Rheinhardt
---
 libavcodec/bink.c             |  2 --
 libavcodec/x86/cavsdsp.c      | 11 +----------
 libavcodec/x86/fpel.asm       | 22 ++++++++++++++++------
 libavcodec/x86/fpel.h         |  8 ++++----
 libavcodec/x86/hpeldsp_init.c | 14 +++-----------
 libavcodec/x86/qpeldsp_init.c |  4 ++--
 libavcodec/x86/vc1dsp_init.c  |  4 ++--
 7 files changed, 28 insertions(+), 37 deletions(-)

diff --git a/libavcodec/bink.c b/libavcodec/bink.c
index ef8e974589..e5300be000 100644
--- a/libavcodec/bink.c
+++ b/libavcodec/bink.c
@@ -21,7 +21,6 @@
  */
 
 #include "libavutil/attributes.h"
-#include "libavutil/emms.h"
 #include "libavutil/imgutils.h"
 #include "libavutil/mem.h"
 #include "libavutil/mem_internal.h"
@@ -1297,7 +1296,6 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *frame,
         if (get_bits_count(&gb) >= bits_count)
             break;
     }
-    emms_c();
 
     if (c->version > 'b') {
         if ((ret = av_frame_replace(c->last, frame)) < 0)
diff --git a/libavcodec/x86/cavsdsp.c b/libavcodec/x86/cavsdsp.c
index d14b472d54..e333bbee49 100644
--- a/libavcodec/x86/cavsdsp.c
+++ b/libavcodec/x86/cavsdsp.c
@@ -46,13 +46,6 @@ static void cavs_idct8_add_sse2(uint8_t *dst, int16_t *block, ptrdiff_t stride)
 
 #endif /* HAVE_SSE2_EXTERNAL */
 
-static av_cold void cavsdsp_init_mmx(CAVSDSPContext *c)
-{
-#if HAVE_MMX_EXTERNAL
-    c->put_cavs_qpel_pixels_tab[1][0] = ff_put_pixels8x8_mmx;
-#endif /* HAVE_MMX_EXTERNAL */
-}
-
 #if HAVE_SSE2_EXTERNAL
 #define DEF_QPEL(OPNAME) \
     void ff_ ## OPNAME ## _cavs_qpel8_mc20_sse2(uint8_t *dst, const uint8_t *src, ptrdiff_t stride); \
@@ -98,9 +91,6 @@ av_cold void ff_cavsdsp_init_x86(CAVSDSPContext *c)
 {
     av_unused int cpu_flags = av_get_cpu_flags();
 
-    if (X86_MMX(cpu_flags))
-        cavsdsp_init_mmx(c);
-
 #if HAVE_MMX_EXTERNAL
     if (EXTERNAL_MMXEXT(cpu_flags)) {
         c->avg_cavs_qpel_pixels_tab[1][0] = ff_avg_pixels8x8_mmxext;
@@ -113,6 +103,7 @@ av_cold void ff_cavsdsp_init_x86(CAVSDSPContext *c)
         c->put_cavs_qpel_pixels_tab[0][ 4] = put_cavs_qpel16_mc01_sse2;
         c->put_cavs_qpel_pixels_tab[0][ 8] = put_cavs_qpel16_mc02_sse2;
         c->put_cavs_qpel_pixels_tab[0][12] = put_cavs_qpel16_mc03_sse2;
+        c->put_cavs_qpel_pixels_tab[1][ 0] = ff_put_pixels8x8_sse2;
         c->put_cavs_qpel_pixels_tab[1][ 2] = ff_put_cavs_qpel8_mc20_sse2;
         c->put_cavs_qpel_pixels_tab[1][ 4] = put_cavs_qpel8_mc01_sse2;
         c->put_cavs_qpel_pixels_tab[1][ 8] = ff_put_cavs_qpel8_mc02_sse2;
diff --git a/libavcodec/x86/fpel.asm b/libavcodec/x86/fpel.asm
index 68a05310f2..e4becca5fb 100644
--- a/libavcodec/x86/fpel.asm
+++ b/libavcodec/x86/fpel.asm
@@ -27,7 +27,7 @@ SECTION .text
 
 ; void ff_put/avg_pixels(uint8_t *block, const uint8_t *pixels,
 ;                        ptrdiff_t line_size, int h)
-%macro OP_PIXELS 2
+%macro OP_PIXELS 2-3 0
 %if %2 == mmsize/2
 %define LOAD movh
 %define SAVE movh
@@ -35,14 +35,25 @@ SECTION .text
 %define LOAD movu
 %define SAVE mova
 %endif
-cglobal %1_pixels%2x%2, 3,5,4
+cglobal %1_pixels%2x%2, 3,5+4*%3,%3 ? 4 : 0
     mov             r3d, %2
     jmp             %1_pixels%2_after_prologue
 
-cglobal %1_pixels%2, 4,5,4
+cglobal %1_pixels%2, 4,5+4*%3,%3 ? 4 : 0
 %1_pixels%2_after_prologue:
     lea              r4, [r2*3]
 .loop:
+%if %3
+; Use GPRs on UNIX64 for put8, but not on Win64 due to a lack of volatile GPRs
+    mov             r5q, [r1]
+    mov             r6q, [r1+r2]
+    mov             r7q, [r1+r2*2]
+    mov             r8q, [r1+r4]
+    mov            [r0], r5q
+    mov         [r0+r2], r6q
+    mov       [r0+r2*2], r7q
+    mov         [r0+r4], r8q
+%else
     LOAD             m0, [r1]
     LOAD             m1, [r1+r2]
     LOAD             m2, [r1+r2*2]
@@ -57,6 +68,7 @@ cglobal %1_pixels%2, 4,5,4
     SAVE        [r0+r2], m1
     SAVE      [r0+r2*2], m2
     SAVE        [r0+r4], m3
+%endif
     sub             r3d, 4
     lea              r1, [r1+r2*4]
     lea              r0, [r0+r2*4]
@@ -64,12 +76,10 @@ cglobal %1_pixels%2, 4,5,4
     RET
 %endmacro
 
-INIT_MMX mmx
-OP_PIXELS put, 8
-
 INIT_MMX mmxext
 OP_PIXELS avg, 8
 
 INIT_XMM sse2
+OP_PIXELS put, 8, UNIX64
 OP_PIXELS put, 16
 OP_PIXELS avg, 16
diff --git a/libavcodec/x86/fpel.h b/libavcodec/x86/fpel.h
index 598a7a6f63..0b0056021e 100644
--- a/libavcodec/x86/fpel.h
+++ b/libavcodec/x86/fpel.h
@@ -30,10 +30,10 @@ void ff_avg_pixels16_sse2(uint8_t *block, const uint8_t *pixels,
                           ptrdiff_t line_size, int h);
 void ff_avg_pixels16x16_sse2(uint8_t *block, const uint8_t *pixels,
                              ptrdiff_t line_size);
-void ff_put_pixels8_mmx(uint8_t *block, const uint8_t *pixels,
-                        ptrdiff_t line_size, int h);
-void ff_put_pixels8x8_mmx(uint8_t *block, const uint8_t *pixels,
-                          ptrdiff_t line_size);
+void ff_put_pixels8_sse2(uint8_t *block, const uint8_t *pixels,
+                         ptrdiff_t line_size, int h);
+void ff_put_pixels8x8_sse2(uint8_t *block, const uint8_t *pixels,
+                           ptrdiff_t line_size);
 void ff_put_pixels16_sse2(uint8_t *block, const uint8_t *pixels,
                           ptrdiff_t line_size, int h);
 void ff_put_pixels16x16_sse2(uint8_t *block, const uint8_t *pixels,
diff --git a/libavcodec/x86/hpeldsp_init.c b/libavcodec/x86/hpeldsp_init.c
index 1640ff83b6..3500ad1878 100644
--- a/libavcodec/x86/hpeldsp_init.c
+++ b/libavcodec/x86/hpeldsp_init.c
@@ -74,14 +74,6 @@ void ff_avg_pixels8_x2_mmxext(uint8_t *block, const uint8_t *pixels,
 void ff_avg_pixels8_y2_mmxext(uint8_t *block, const uint8_t *pixels,
                               ptrdiff_t line_size, int h);
 
-static void hpeldsp_init_mmx(HpelDSPContext *c, int flags)
-{
-#if HAVE_MMX_EXTERNAL
-    c->put_no_rnd_pixels_tab[1][0] =
-    c->put_pixels_tab[1][0] = ff_put_pixels8_mmx;
-#endif
-}
-
 static void hpeldsp_init_mmxext(HpelDSPContext *c, int flags)
 {
 #if HAVE_MMXEXT_EXTERNAL
@@ -115,6 +107,9 @@ static void hpeldsp_init_sse2(HpelDSPContext *c, int flags)
     c->put_no_rnd_pixels_tab[0][2] = ff_put_no_rnd_pixels16_y2_sse2;
     c->put_no_rnd_pixels_tab[0][3] = ff_put_no_rnd_pixels16_xy2_sse2;
 
+    c->put_no_rnd_pixels_tab[1][0] =
+    c->put_pixels_tab[1][0] = ff_put_pixels8_sse2;
+
     c->avg_pixels_tab[0][0] = ff_avg_pixels16_sse2;
     c->avg_pixels_tab[0][1] = ff_avg_pixels16_x2_sse2;
     c->avg_pixels_tab[0][2] = ff_avg_pixels16_y2_sse2;
@@ -143,9 +138,6 @@ av_cold void ff_hpeldsp_init_x86(HpelDSPContext *c, int flags)
 {
     int cpu_flags = av_get_cpu_flags();
 
-    if (EXTERNAL_MMX(cpu_flags))
-        hpeldsp_init_mmx(c, flags);
-
     if (EXTERNAL_MMXEXT(cpu_flags))
         hpeldsp_init_mmxext(c, flags);
 
diff --git a/libavcodec/x86/qpeldsp_init.c b/libavcodec/x86/qpeldsp_init.c
index 4bd45a7779..a1d1eb80b3 100644
--- a/libavcodec/x86/qpeldsp_init.c
+++ b/libavcodec/x86/qpeldsp_init.c
@@ -521,8 +521,6 @@ av_cold void ff_qpeldsp_init_x86(QpelDSPContext *c)
         SET_QPEL_FUNCS(avg_qpel, 1, 8, mmxext, );
 
         SET_QPEL_FUNCS(put_qpel, 0, 16, mmxext, );
-        c->put_no_rnd_qpel_pixels_tab[1][0] =
-        c->put_qpel_pixels_tab[1][0] = ff_put_pixels8x8_mmx;
         SET_QPEL_FUNCS(put_qpel, 1, 8, mmxext, );
         SET_QPEL_FUNCS(put_no_rnd_qpel, 0, 16, mmxext, );
         SET_QPEL_FUNCS(put_no_rnd_qpel, 1, 8, mmxext, );
@@ -532,6 +530,8 @@ av_cold void ff_qpeldsp_init_x86(QpelDSPContext *c)
     if (EXTERNAL_SSE2(cpu_flags)) {
         c->put_no_rnd_qpel_pixels_tab[0][0] =
         c->put_qpel_pixels_tab[0][0] = ff_put_pixels16x16_sse2;
+        c->put_no_rnd_qpel_pixels_tab[1][0] =
+        c->put_qpel_pixels_tab[1][0] = ff_put_pixels8x8_sse2;
         c->avg_qpel_pixels_tab[0][0] = ff_avg_pixels16x16_sse2;
     }
 #endif
diff --git a/libavcodec/x86/vc1dsp_init.c b/libavcodec/x86/vc1dsp_init.c
index e8163f2886..e7874d2a5a 100644
--- a/libavcodec/x86/vc1dsp_init.c
+++ b/libavcodec/x86/vc1dsp_init.c
@@ -73,7 +73,7 @@ static void vc1_h_loop_filter16_sse4(uint8_t *src, ptrdiff_t stride, int pq)
     ff_ ## OP ## pixels ## DEPTH ## INSN(dst, src, stride, DEPTH); \
 }
 
-DECLARE_FUNCTION(put_, 8, _mmx)
+DECLARE_FUNCTION(put_, 8, _sse2)
 DECLARE_FUNCTION(avg_, 8, _mmxext)
 DECLARE_FUNCTION(put_, 16, _sse2)
 DECLARE_FUNCTION(avg_, 16, _sse2)
@@ -125,7 +125,6 @@ av_cold void ff_vc1dsp_init_x86(VC1DSPContext *dsp)
 
     if (EXTERNAL_MMX(cpu_flags)) {
         dsp->put_no_rnd_vc1_chroma_pixels_tab[0] = ff_put_vc1_chroma_mc8_nornd_mmx;
-        dsp->put_vc1_mspel_pixels_tab[1][0] = put_vc1_mspel_mc00_8_mmx;
     }
     if (EXTERNAL_MMXEXT(cpu_flags)) {
         ASSIGN_LF4(mmxext);
@@ -142,6 +141,7 @@ av_cold void ff_vc1dsp_init_x86(VC1DSPContext *dsp)
         ASSIGN_LF816(sse2);
 
         dsp->put_vc1_mspel_pixels_tab[0][0] = put_vc1_mspel_mc00_16_sse2;
+        dsp->put_vc1_mspel_pixels_tab[1][0] = put_vc1_mspel_mc00_8_sse2;
         dsp->avg_vc1_mspel_pixels_tab[0][0] = avg_vc1_mspel_mc00_16_sse2;
     }
     if (EXTERNAL_SSSE3(cpu_flags)) {
-- 
2.49.1
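For reviewers of the asm, a rough scalar C model of the data movement in the new put8 path may help. It is a sketch under stated assumptions, not FFmpeg code: put_pixels8_model() and put_pixels8x8_model() are hypothetical names, and h is assumed to be a positive multiple of 4, since the asm loop subtracts 4 from r3d per iteration. Each iteration copies four 8-byte rows at offsets 0, line_size, 2*line_size and 3*line_size (the asm keeps 3*line_size in r4), then advances both pointers by 4*line_size. On Unix64 the patch does these copies through r5q-r8q, so together with r0-r4 it needs 9 GPRs, which is exactly the number of caller-saved GPRs in the SysV ABI (rax, rcx, rdx, rsi, rdi, r8-r11); Win64 only has 7 (rax, rcx, rdx, r8-r11). Elsewhere the rows go through XMM registers via movh. Neither path touches the MMX registers, which alias the x87 FPU registers, so no emms is needed before later floating-point code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of put_pixels8: copy an 8-byte-wide block of
 * h rows from pixels to block, four rows per iteration. Not FFmpeg code. */
static void put_pixels8_model(uint8_t *block, const uint8_t *pixels,
                              ptrdiff_t line_size, int h)
{
    const ptrdiff_t line_size3 = line_size * 3; /* plays the role of r4 */

    /* Assumption: rows are processed in groups of four. */
    assert(h > 0 && h % 4 == 0);

    do {
        uint64_t row0, row1, row2, row3;

        /* Four 8-byte loads (r5q..r8q on Unix64, m0..m3 elsewhere). */
        memcpy(&row0, pixels,                 8);
        memcpy(&row1, pixels + line_size,     8);
        memcpy(&row2, pixels + line_size * 2, 8);
        memcpy(&row3, pixels + line_size3,    8);

        /* Four 8-byte stores to the corresponding destination rows. */
        memcpy(block,                 &row0, 8);
        memcpy(block + line_size,     &row1, 8);
        memcpy(block + line_size * 2, &row2, 8);
        memcpy(block + line_size3,    &row3, 8);

        /* Advance both pointers by four rows and count h down by 4. */
        pixels += line_size * 4;
        block  += line_size * 4;
        h      -= 4;
    } while (h);
}

/* Hypothetical model of the 8x8 entry point: h is fixed to 8. */
static void put_pixels8x8_model(uint8_t *block, const uint8_t *pixels,
                                ptrdiff_t line_size)
{
    put_pixels8_model(block, pixels, line_size, 8);
}
```

The 8x8 wrapper mirrors how the %1_pixels%2x%2 entry point sets r3d to a constant and jumps to %1_pixels%2_after_prologue instead of duplicating the loop.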