From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 29 Oct 2025 23:36:48 -0000
Message-ID: <176178101023.81.16874102448115443083@7d278768979e>
Precedence: list
Reply-To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH] avcodec/x86/hevc/idct: Port ff_hevc_idct_4x4_dc_{8,10,12}_mmxext to SSE2 (PR #20788)
List-Id: FFmpeg development discussions and patches
From: mkver via ffmpeg-devel
Cc: mkver
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

PR #20788 opened by mkver
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20788
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20788.patch

Practically no change in benchmarks (and in codesize).

hevc_idct_4x4_dc_8_c:          7.8 ( 1.00x)
hevc_idct_4x4_dc_8_mmxext:     6.9 ( 1.14x)
hevc_idct_4x4_dc_8_sse2:       6.8 ( 1.15x)
hevc_idct_4x4_dc_10_c:         7.9 ( 1.00x)
hevc_idct_4x4_dc_10_mmxext:    6.9 ( 1.16x)
hevc_idct_4x4_dc_10_sse2:      6.8 ( 1.16x)
hevc_idct_4x4_dc_12_c:         7.8 ( 1.00x)
hevc_idct_4x4_dc_12_mmxext:    7.0 ( 1.13x)
hevc_idct_4x4_dc_12_sse2:      6.8 ( 1.15x)


>>From e81edabc4d6dfb825e6432da48a0a827b69e6ade Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt
Date: Thu, 30 Oct 2025 00:07:42 +0100
Subject: [PATCH] avcodec/x86/hevc/idct: Port ff_hevc_idct_4x4_dc_{8,10,12}_mmxext to SSE2

Practically no change in benchmarks (and in codesize).
hevc_idct_4x4_dc_8_c:          7.8 ( 1.00x)
hevc_idct_4x4_dc_8_mmxext:     6.9 ( 1.14x)
hevc_idct_4x4_dc_8_sse2:       6.8 ( 1.15x)
hevc_idct_4x4_dc_10_c:         7.9 ( 1.00x)
hevc_idct_4x4_dc_10_mmxext:    6.9 ( 1.16x)
hevc_idct_4x4_dc_10_sse2:      6.8 ( 1.16x)
hevc_idct_4x4_dc_12_c:         7.8 ( 1.00x)
hevc_idct_4x4_dc_12_mmxext:    7.0 ( 1.13x)
hevc_idct_4x4_dc_12_sse2:      6.8 ( 1.15x)

Signed-off-by: Andreas Rheinhardt
---
 libavcodec/x86/hevc/dsp_init.c | 11 ++++-------
 libavcodec/x86/hevc/idct.asm   | 19 ++++++-------------
 tests/checkasm/hevc_idct.c     |  2 +-
 3 files changed, 11 insertions(+), 21 deletions(-)

diff --git a/libavcodec/x86/hevc/dsp_init.c b/libavcodec/x86/hevc/dsp_init.c
index ba921e7299..6966340c42 100644
--- a/libavcodec/x86/hevc/dsp_init.c
+++ b/libavcodec/x86/hevc/dsp_init.c
@@ -65,7 +65,7 @@ void ff_hevc_idct_ ## W ## _dc_8_ ## opt(int16_t *coeffs); \
 void ff_hevc_idct_ ## W ## _dc_10_ ## opt(int16_t *coeffs); \
 void ff_hevc_idct_ ## W ## _dc_12_ ## opt(int16_t *coeffs)
 
-IDCT_DC_FUNCS(4x4,   mmxext);
+IDCT_DC_FUNCS(4x4,   sse2);
 IDCT_DC_FUNCS(8x8,   sse2);
 IDCT_DC_FUNCS(16x16, sse2);
 IDCT_DC_FUNCS(32x32, sse2);
@@ -816,8 +816,6 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
 
     if (bit_depth == 8) {
         if (EXTERNAL_MMXEXT(cpu_flags)) {
-            c->idct_dc[0] = ff_hevc_idct_4x4_dc_8_mmxext;
-
             c->add_residual[0] = ff_hevc_add_residual_4_8_mmxext;
         }
         if (EXTERNAL_SSE2(cpu_flags)) {
@@ -832,6 +830,7 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
             }
             SAO_BAND_INIT(8, sse2);
 
+            c->idct_dc[0] = ff_hevc_idct_4x4_dc_8_sse2;
             c->idct_dc[1] = ff_hevc_idct_8x8_dc_8_sse2;
             c->idct_dc[2] = ff_hevc_idct_16x16_dc_8_sse2;
             c->idct_dc[3] = ff_hevc_idct_32x32_dc_8_sse2;
@@ -998,7 +997,6 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
     } else if (bit_depth == 10) {
         if (EXTERNAL_MMXEXT(cpu_flags)) {
             c->add_residual[0] = ff_hevc_add_residual_4_10_mmxext;
-            c->idct_dc[0] = ff_hevc_idct_4x4_dc_10_mmxext;
         }
         if (EXTERNAL_SSE2(cpu_flags)) {
             c->hevc_v_loop_filter_chroma = ff_hevc_v_loop_filter_chroma_10_sse2;
@@ -1013,6 +1011,7 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
             SAO_BAND_INIT(10, sse2);
             SAO_EDGE_INIT(10, sse2);
 
+            c->idct_dc[0] = ff_hevc_idct_4x4_dc_10_sse2;
             c->idct_dc[1] = ff_hevc_idct_8x8_dc_10_sse2;
             c->idct_dc[2] = ff_hevc_idct_16x16_dc_10_sse2;
             c->idct_dc[3] = ff_hevc_idct_32x32_dc_10_sse2;
@@ -1218,9 +1217,6 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
         }
 #endif /* HAVE_AVX2_EXTERNAL */
     } else if (bit_depth == 12) {
-        if (EXTERNAL_MMXEXT(cpu_flags)) {
-            c->idct_dc[0] = ff_hevc_idct_4x4_dc_12_mmxext;
-        }
         if (EXTERNAL_SSE2(cpu_flags)) {
             c->hevc_v_loop_filter_chroma = ff_hevc_v_loop_filter_chroma_12_sse2;
             c->hevc_h_loop_filter_chroma = ff_hevc_h_loop_filter_chroma_12_sse2;
@@ -1231,6 +1227,7 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
             SAO_BAND_INIT(12, sse2);
             SAO_EDGE_INIT(12, sse2);
 
+            c->idct_dc[0] = ff_hevc_idct_4x4_dc_12_sse2;
             c->idct_dc[1] = ff_hevc_idct_8x8_dc_12_sse2;
             c->idct_dc[2] = ff_hevc_idct_16x16_dc_12_sse2;
             c->idct_dc[3] = ff_hevc_idct_32x32_dc_12_sse2;
diff --git a/libavcodec/x86/hevc/idct.asm b/libavcodec/x86/hevc/idct.asm
index 021e5dab14..088144171d 100644
--- a/libavcodec/x86/hevc/idct.asm
+++ b/libavcodec/x86/hevc/idct.asm
@@ -273,16 +273,11 @@ cglobal hevc_idct_%1x%1_dc_%2, 1, 2, 1, coeff, tmp
     sar             tmpd, (15 - %2)
     movd            m0, tmpd
     SPLATW          m0, xm0
-    mova [coeffq+mmsize*0], m0
-    mova [coeffq+mmsize*1], m0
-    mova [coeffq+mmsize*2], m0
-    mova [coeffq+mmsize*3], m0
-%if mmsize == 16
-    mova [coeffq+mmsize*4], m0
-    mova [coeffq+mmsize*5], m0
-    mova [coeffq+mmsize*6], m0
-    mova [coeffq+mmsize*7], m0
-%endif
+%assign %%offset 0
+%rep 2*%1*%1/mmsize
+    mova [coeffq+%%offset], m0
+    %assign %%offset %%offset+mmsize
+%endrep
     RET
 %endmacro
 
@@ -809,10 +804,8 @@ cglobal hevc_idct_32x32_%1, 1, 6, 16, 256, coeffs
 %endmacro
 
 %macro INIT_IDCT_DC 1
-INIT_MMX mmxext
-IDCT_DC_NL  4,      %1
-
 INIT_XMM sse2
+IDCT_DC_NL  4,      %1
 IDCT_DC_NL  8,      %1
 IDCT_DC    16,  4,  %1
 IDCT_DC    32, 16,  %1
diff --git a/tests/checkasm/hevc_idct.c b/tests/checkasm/hevc_idct.c
index 2bd7ae9409..139ae81727 100644
--- a/tests/checkasm/hevc_idct.c
+++ b/tests/checkasm/hevc_idct.c
@@ -69,7 +69,7 @@ static void check_idct_dc(HEVCDSPContext *h, int bit_depth)
     for (i = 2; i <= 5; i++) {
         int block_size = 1 << i;
         int size = block_size * block_size;
-        declare_func_emms(AV_CPU_FLAG_MMXEXT, void, int16_t *coeffs);
+        declare_func(void, int16_t *coeffs);
 
         randomize_buffers(coeffs0, size);
         memcpy(coeffs1, coeffs0, sizeof(*coeffs0) * size);
-- 
2.49.1

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org