From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 30 Oct 2025 09:00:11 -0000
Message-ID: <176181481291.81.17220233578094963796@7d278768979e>
Message-ID-Hash: IOSKJBLH4745O54URPYN6XY4EX5XFW5Z
X-Message-ID-Hash: IOSKJBLH4745O54URPYN6XY4EX5XFW5Z
X-MailFrom: code@ffmpeg.org
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop;
 banned-address; header-match-ffmpeg-devel.ffmpeg.org-0;
 header-match-ffmpeg-devel.ffmpeg.org-1;
 header-match-ffmpeg-devel.ffmpeg.org-2;
 header-match-ffmpeg-devel.ffmpeg.org-3; emergency; member-moderation;
 nonmember-moderation; administrivia; implicit-dest; max-recipients;
 max-size; news-moderation; no-subject; digests; suspicious-header
X-Mailman-Version: 3.3.10
Precedence: list
Reply-To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH] avcodec/x86/hevc/add_res: Remove AVX add_residual functions (PR #20789)
List-Id: FFmpeg development discussions and patches
From: mkver via ffmpeg-devel
Cc: mkver
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

PR #20789 opened by mkver
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20789
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20789.patch


>>From a0fa1c8e484f06cc9a9e2e3cfe53ec121fb74659 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt
Date: Thu, 30 Oct 2025 08:30:40 +0100
Subject: [PATCH 1/3] avcodec/x86/hevc/add_res: Remove AVX add_residual functions

The AVX and SSE2 functions are identical except for the VEX
encodings used since e9abef437f0a348c017d4ac8b23a122881c1dc87
and 8b8492452d53293b2ac8c842877fadf7925fc950.

Signed-off-by: Andreas Rheinhardt
---
 libavcodec/x86/hevc/add_res.asm | 7 +------
 libavcodec/x86/hevc/dsp.h       | 4 ----
 libavcodec/x86/hevc/dsp_init.c  | 4 ----
 3 files changed, 1 insertion(+), 14 deletions(-)

diff --git a/libavcodec/x86/hevc/add_res.asm b/libavcodec/x86/hevc/add_res.asm
index 3ecbd4269c..5d7115620f 100644
--- a/libavcodec/x86/hevc/add_res.asm
+++ b/libavcodec/x86/hevc/add_res.asm
@@ -117,7 +117,7 @@ cglobal hevc_add_residual_4_8, 3, 3, 6
 %endmacro
 
 
-%macro TRANSFORM_ADD_8 0
+INIT_XMM sse2
 ; void ff_hevc_add_residual_8_8_(uint8_t *dst, const int16_t *res, ptrdiff_t stride)
 cglobal hevc_add_residual_8_8, 3, 4, 8
     pxor m4, m4
@@ -154,12 +154,7 @@ cglobal hevc_add_residual_32_8, 3, 5, 7
     dec r4d
     jg .loop
     RET
-%endmacro
 
-INIT_XMM sse2
-TRANSFORM_ADD_8
-INIT_XMM avx
-TRANSFORM_ADD_8
 
 %if HAVE_AVX2_EXTERNAL
 INIT_YMM avx2
diff --git a/libavcodec/x86/hevc/dsp.h b/libavcodec/x86/hevc/dsp.h
index 03986b970a..0062699ce0 100644
--- a/libavcodec/x86/hevc/dsp.h
+++ b/libavcodec/x86/hevc/dsp.h
@@ -172,10 +172,6 @@ void ff_hevc_add_residual_8_8_sse2(uint8_t *dst, const int16_t *res, ptrdiff_t s
 void ff_hevc_add_residual_16_8_sse2(uint8_t *dst, const int16_t *res, ptrdiff_t stride);
 void ff_hevc_add_residual_32_8_sse2(uint8_t *dst, const int16_t *res, ptrdiff_t stride);
 
-void ff_hevc_add_residual_8_8_avx(uint8_t *dst, const int16_t *res, ptrdiff_t stride);
-void ff_hevc_add_residual_16_8_avx(uint8_t *dst, const int16_t *res, ptrdiff_t stride);
-void ff_hevc_add_residual_32_8_avx(uint8_t *dst, const int16_t *res, ptrdiff_t stride);
-
 void ff_hevc_add_residual_32_8_avx2(uint8_t *dst, const int16_t *res, ptrdiff_t stride);
 
 void ff_hevc_add_residual_4_10_mmxext(uint8_t *dst, const int16_t *res, ptrdiff_t stride);
diff --git a/libavcodec/x86/hevc/dsp_init.c b/libavcodec/x86/hevc/dsp_init.c
index 6966340c42..f1558b7e3e 100644
--- a/libavcodec/x86/hevc/dsp_init.c
+++ b/libavcodec/x86/hevc/dsp_init.c
@@ -877,10 +877,6 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
 
             c->idct[0] = ff_hevc_idct_4x4_8_avx;
             c->idct[1] = ff_hevc_idct_8x8_8_avx;
-
-            c->add_residual[1] = ff_hevc_add_residual_8_8_avx;
-            c->add_residual[2] = ff_hevc_add_residual_16_8_avx;
-            c->add_residual[3] = ff_hevc_add_residual_32_8_avx;
         }
         if (EXTERNAL_AVX2(cpu_flags)) {
             c->sao_band_filter[0] = ff_hevc_sao_band_filter_8_8_avx2;
-- 
2.49.1


>>From 17526beaf2ea13fd7e1484e8af0ae44baee6f8cb Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt
Date: Thu, 30 Oct 2025 08:49:38 +0100
Subject: [PATCH 2/3] avcodec/x86/hevc/add_res: Reduce number of registers used

This makes these functions use only volatile registers (even on Win64).

Signed-off-by: Andreas Rheinhardt
---
 libavcodec/x86/hevc/add_res.asm | 32 +++++++++++++++++---------------
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/libavcodec/x86/hevc/add_res.asm b/libavcodec/x86/hevc/add_res.asm
index 5d7115620f..8abfcab893 100644
--- a/libavcodec/x86/hevc/add_res.asm
+++ b/libavcodec/x86/hevc/add_res.asm
@@ -61,20 +61,16 @@ cglobal hevc_add_residual_4_8, 3, 3, 6
     movq m1, [r0+r2]
     punpcklbw m0, m4
     punpcklbw m1, m4
-    mova m2, [r1]
-    mova m3, [r1+16]
-    paddsw m0, m2
-    paddsw m1, m3
+    paddsw m0, [r1]
+    paddsw m1, [r1+16]
     packuswb m0, m1
 
     movq m2, [r0+r2*2]
     movq m3, [r0+r3]
     punpcklbw m2, m4
     punpcklbw m3, m4
-    mova m6, [r1+32]
-    mova m7, [r1+48]
-    paddsw m2, m6
-    paddsw m3, m7
+    paddsw m2, [r1+32]
+    paddsw m3, [r1+48]
     packuswb m2, m3
 
     movq [r0], m0
@@ -88,27 +84,33 @@ cglobal hevc_add_residual_4_8, 3, 3, 6
     mova m2, m1
     punpcklbw m1, m0
     punpckhbw m2, m0
+%if cpuflag(avx2)
     mova xm5, [r1+%1]
     mova xm6, [r1+%1+16]
-%if cpuflag(avx2)
     vinserti128 m5, m5, [r1+%1+32], 1
     vinserti128 m6, m6, [r1+%1+48], 1
-%endif
     paddsw m1, m5
     paddsw m2, m6
+%else
+    paddsw m1, [r1+%1]
+    paddsw m2, [r1+%1+16]
+%endif
 
     mova m3, [%3]
     mova m4, m3
     punpcklbw m3, m0
     punpckhbw m4, m0
+%if cpuflag(avx2)
     mova xm5, [r1+%1+mmsize*2]
     mova xm6, [r1+%1+mmsize*2+16]
-%if cpuflag(avx2)
     vinserti128 m5, m5, [r1+%1+96], 1
     vinserti128 m6, m6, [r1+%1+112], 1
-%endif
     paddsw m3, m5
     paddsw m4, m6
+%else
+    paddsw m3, [r1+%1+mmsize*2]
+    paddsw m4, [r1+%1+mmsize*2+16]
+%endif
     packuswb m1, m2
     packuswb m3, m4
 
@@ -119,7 +121,7 @@ cglobal hevc_add_residual_4_8, 3, 3, 6
 
 INIT_XMM sse2
 ; void ff_hevc_add_residual_8_8_(uint8_t *dst, const int16_t *res, ptrdiff_t stride)
-cglobal hevc_add_residual_8_8, 3, 4, 8
+cglobal hevc_add_residual_8_8, 3, 4, 5
     pxor m4, m4
     lea r3, [r2*3]
     ADD_RES_SSE_8_8
@@ -129,7 +131,7 @@ cglobal hevc_add_residual_8_8, 3, 4, 8
     RET
 
 ; void ff_hevc_add_residual_16_8_(uint8_t *dst, const int16_t *res, ptrdiff_t stride)
-cglobal hevc_add_residual_16_8, 3, 5, 7
+cglobal hevc_add_residual_16_8, 3, 5, 5
     pxor m0, m0
     lea r3, [r2*3]
     mov r4d, 4
@@ -143,7 +145,7 @@ cglobal hevc_add_residual_16_8, 3, 5, 7
     RET
 
 ; void ff_hevc_add_residual_32_8_(uint8_t *dst, const int16_t *res, ptrdiff_t stride)
-cglobal hevc_add_residual_32_8, 3, 5, 7
+cglobal hevc_add_residual_32_8, 3, 5, 5
     pxor m0, m0
     mov r4d, 16
 .loop:
-- 
2.49.1


>>From 894f415b278a07c9afbe349697bde80bbdab4e11 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt
Date: Thu, 30 Oct 2025 09:58:13 +0100
Subject: [PATCH 3/3] avcodec/x86/hevc/add_res: Avoid unnecessary modification

Signed-off-by: Andreas Rheinhardt
---
 libavcodec/x86/hevc/add_res.asm | 32 ++++++++++++++------------------
 1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/libavcodec/x86/hevc/add_res.asm b/libavcodec/x86/hevc/add_res.asm
index 8abfcab893..3489e04e2b 100644
--- a/libavcodec/x86/hevc/add_res.asm
+++ b/libavcodec/x86/hevc/add_res.asm
@@ -27,9 +27,9 @@ cextern pw_1023
 %define max_pixels_10 pw_1023
 
 ; the add_res macros and functions were largely inspired by h264_idct.asm from the x264 project
-%macro ADD_RES_MMX_4_8 0
-    mova m0, [r1]
-    mova m2, [r1+8]
+%macro ADD_RES_MMX_4_8 1
+    mova m0, [r1+%1]
+    mova m2, [r1+%1+8]
 
     movd m1, [r0]
     movd m3, [r0+r2]
@@ -50,27 +50,26 @@ INIT_MMX mmxext
 ; void ff_hevc_add_residual_4_8_mmxext(uint8_t *dst, const int16_t *res, ptrdiff_t stride)
 cglobal hevc_add_residual_4_8, 3, 3, 6
     pxor m4, m4
-    ADD_RES_MMX_4_8
-    add r1, 16
+    ADD_RES_MMX_4_8 0
     lea r0, [r0+r2*2]
-    ADD_RES_MMX_4_8
+    ADD_RES_MMX_4_8 16
     RET
 
-%macro ADD_RES_SSE_8_8 0
+%macro ADD_RES_SSE_8_8 1
     movq m0, [r0]
     movq m1, [r0+r2]
     punpcklbw m0, m4
     punpcklbw m1, m4
-    paddsw m0, [r1]
-    paddsw m1, [r1+16]
+    paddsw m0, [r1+%1]
+    paddsw m1, [r1+%1+16]
     packuswb m0, m1
 
     movq m2, [r0+r2*2]
     movq m3, [r0+r3]
     punpcklbw m2, m4
     punpcklbw m3, m4
-    paddsw m2, [r1+32]
-    paddsw m3, [r1+48]
+    paddsw m2, [r1+%1+32]
+    paddsw m3, [r1+%1+48]
     packuswb m2, m3
 
     movq [r0], m0
@@ -124,10 +123,9 @@ INIT_XMM sse2
 cglobal hevc_add_residual_8_8, 3, 4, 5
     pxor m4, m4
     lea r3, [r2*3]
-    ADD_RES_SSE_8_8
-    add r1, 64
+    ADD_RES_SSE_8_8 0
     lea r0, [r0+r2*4]
-    ADD_RES_SSE_8_8
+    ADD_RES_SSE_8_8 64
     RET
 
 ; void ff_hevc_add_residual_16_8_(uint8_t *dst, const int16_t *res, ptrdiff_t stride)
@@ -292,9 +290,8 @@ cglobal hevc_add_residual_4_10, 3, 3, 6
     pxor m2, m2
     mova m3, [max_pixels_10]
     ADD_RES_MMX_4_10 r0, r2, r1
-    add r1, 16
     lea r0, [r0+2*r2]
-    ADD_RES_MMX_4_10 r0, r2, r1
+    ADD_RES_MMX_4_10 r0, r2, r1+16
     RET
 
 INIT_XMM sse2
@@ -305,8 +302,7 @@ cglobal hevc_add_residual_8_10, 3, 4, 6
 
     ADD_RES_SSE_8_10 r0, r2, r3, r1
     lea r0, [r0+r2*4]
-    add r1, 64
-    ADD_RES_SSE_8_10 r0, r2, r3, r1
+    ADD_RES_SSE_8_10 r0, r2, r3, r1+64
     RET
 
 cglobal hevc_add_residual_16_10, 3, 5, 6
-- 
2.49.1

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org
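
For readers less familiar with the SIMD code touched by this series, the
8-bit add_residual operation can be modelled in scalar C. This is only a
sketch and not code from the patch or from FFmpeg: add_residual_8bit_ref,
clip_uint8 and the explicit size parameter are hypothetical names chosen
for illustration. It assumes the residual block is stored contiguously
with size coefficients per row, which is what the [r1+offset] addressing
in the diffs suggests. The clamp stands in for the paddsw/packuswb pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to 0..255; models the unsigned saturation of packuswb. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical scalar model of a size x size 8-bit add_residual:
 * dst[y][x] = clip(dst[y][x] + res[y][x]).
 * dst rows are stride bytes apart; res rows are assumed contiguous.
 */
static void add_residual_8bit_ref(uint8_t *dst, const int16_t *res,
                                  ptrdiff_t stride, int size)
{
    assert(size > 0);
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            /* Index res by offset instead of advancing the pointer. */
            dst[x] = clip_uint8(dst[x] + res[y * size + x]);
        dst += stride;
    }
}
```

Indexing res by an offset rather than advancing the pointer is the same
idea that patch 3/3 applies to the macros through their new offset
argument.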