From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andreas Rheinhardt
Date: Tue, 4 Jun 2024 03:42:08 +0200
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH] x86/aacencdsp: add SSE2 and AVX versions of quantize_bands
In-Reply-To: <20240604012343.1771-1-jamrial@gmail.com>
References: <20240604012343.1771-1-jamrial@gmail.com>
Content-Type: text/plain; charset="us-ascii"

James Almer:
> quant_bands_signed_sse2: 417.0
> quant_bands_signed_avx: 202.0

Missing benchmark numbers for the C code

>
> Signed-off-by: James Almer
> ---
>  libavcodec/aacenc.h             |  2 +-
>  libavcodec/x86/aacencdsp.asm    | 27 ++++++++++++++++++++++++---
>  libavcodec/x86/aacencdsp_init.c |  6 ++++++
>  tests/checkasm/aacencdsp.c      |  4 ++--
>  4 files changed, 33 insertions(+), 6 deletions(-)
>
> diff --git a/libavcodec/aacenc.h b/libavcodec/aacenc.h
> index d07960620e..ae15f91e06 100644
> --- a/libavcodec/aacenc.h
> +++ b/libavcodec/aacenc.h
> @@ -242,7 +242,7 @@ typedef struct AACEncContext {
>      enum RawDataBlockType cur_type; ///< channel group type cur_channel belongs to
>
>      AudioFrameQueue afq;
> -    DECLARE_ALIGNED(16, int, qcoefs)[96]; ///< quantized coefficients
> +    DECLARE_ALIGNED(32, int, qcoefs)[96]; ///< quantized coefficients
>      DECLARE_ALIGNED(32, float, scoefs)[1024]; ///< scaled coefficients
>
>      uint16_t quantize_band_cost_cache_generation;
> diff --git a/libavcodec/x86/aacencdsp.asm b/libavcodec/x86/aacencdsp.asm
> index 0d3ba4b89d..99be2d87f5 100644
> --- a/libavcodec/x86/aacencdsp.asm
> +++ b/libavcodec/x86/aacencdsp.asm
> @@ -53,8 +53,19 @@ cglobal abs_pow34, 3, 3, 3, out, in, size
>  ; int size, int is_signed, int maxval, const float Q34,
>  ; const float rounding)
>  ;*******************************************************************
> -INIT_XMM sse2
> +%macro AAC_QUANTIZE_BANDS 0
>  cglobal aac_quantize_bands, 5, 5, 6, out, in, scaled, size, is_signed, maxval, Q34, rounding
> +%if mmsize == 32
> +    vbroadcastss m0, Q34m
> +    vbroadcastss m1, roundingm
> +%if UNIX64 == 0
> +    cvtsi2ss xm3, dword maxvalm
> +%else
> +    cvtsi2ss xm3, maxvald
> +%endif
> +    shufps xm3, xm3, xm3, 0
> +    vinsertf128 m3, m3, xm3, 1
> +%else ; mmsize == 16
>  %if UNIX64 == 0
>      movss m0, Q34m
>      movss m1, roundingm
> @@ -65,9 +76,13 @@ cglobal aac_quantize_bands, 5, 5, 6, out, in, scaled, size, is_signed, maxval, Q
>      shufps m0, m0, 0
>      shufps m1, m1, 0
>      shufps m3, m3, 0
> +%endif
>      shl is_signedd, 31
> -    movd m4, is_signedd
> -    shufps m4, m4, 0
> +    movd xm4, is_signedd
> +    shufps xm4, xm4, xm4, 0
> +%if mmsize == 32
> +    vinsertf128 m4, m4, xm4, 1
> +%endif
>      shl sized, 2
>      add inq, sizeq
>      add outq, sizeq
> @@ -84,3 +99,9 @@ cglobal aac_quantize_bands, 5, 5, 6, out, in, scaled, size, is_signed, maxval, Q
>      add sizeq, mmsize
>      jl .loop
>      RET
> +%endmacro
> +
> +INIT_XMM sse2
> +AAC_QUANTIZE_BANDS
> +INIT_YMM avx
> +AAC_QUANTIZE_BANDS
> diff --git a/libavcodec/x86/aacencdsp_init.c b/libavcodec/x86/aacencdsp_init.c
> index e0d8dec4f8..cf17dbf91d 100644
> --- a/libavcodec/x86/aacencdsp_init.c
> +++ b/libavcodec/x86/aacencdsp_init.c
> @@ -30,6 +30,9 @@ void ff_abs_pow34_sse(float *out, const float *in, const int size);
>  void ff_aac_quantize_bands_sse2(int *out, const float *in, const float *scaled,
>                                  int size, int is_signed, int maxval, const float Q34,
>                                  const float rounding);
> +void ff_aac_quantize_bands_avx(int *out, const float *in, const float *scaled,
> +                               int size, int is_signed, int maxval, const float Q34,
> +                               const float rounding);
>
>  av_cold void ff_aacenc_dsp_init_x86(AACEncDSPContext *s)
>  {
> @@ -40,4 +43,7 @@ av_cold void ff_aacenc_dsp_init_x86(AACEncDSPContext *s)
>
>      if (EXTERNAL_SSE2(cpu_flags))
>          s->quant_bands = ff_aac_quantize_bands_sse2;

Seems like the commit message is wrong: You are not adding an SSE2 version.

> +
> +    if (EXTERNAL_AVX_FAST(cpu_flags))
> +        s->quant_bands = ff_aac_quantize_bands_avx;
>  }
> diff --git a/tests/checkasm/aacencdsp.c b/tests/checkasm/aacencdsp.c
> index 791dd30320..5308a2ac03 100644
> --- a/tests/checkasm/aacencdsp.c
> +++ b/tests/checkasm/aacencdsp.c
> @@ -81,8 +81,8 @@ static void test_quant_bands(AACEncDSPContext *s)
>      for (int sign = 0; sign <= 1; sign++) {
>          if (check_func(s->quant_bands, "quant_bands_%s",
>                         sign ? "signed" : "unsigned")) {
> -            LOCAL_ALIGNED_16(int, out, [BUF_SIZE]);
> -            LOCAL_ALIGNED_16(int, out2, [BUF_SIZE]);
> +            LOCAL_ALIGNED_32(int, out, [BUF_SIZE]);
> +            LOCAL_ALIGNED_32(int, out2, [BUF_SIZE]);
>
>              call_ref(out, in, scaled, BUF_SIZE, sign, maxval, q34, rounding);
>              call_new(out2, in, scaled, BUF_SIZE, sign, maxval, q34, rounding);
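
For reference, the operation that the SIMD versions vectorize can be written as a short scalar loop. What follows is only a sketch inferred from the prototype in the quoted patch (out, in, scaled, size, is_signed, maxval, Q34, rounding) and from the sign-mask setup (shl is_signedd, 31). It is not copied from libavcodec: the name quantize_bands_ref is hypothetical, and the scale, round, clamp, apply-sign order is an assumption about what the C code does.

```c
#include <assert.h>

/* Hypothetical scalar reference, NOT the libavcodec implementation.
 * Assumed behaviour: scale each coefficient by Q34, add the rounding
 * offset, clamp to maxval, truncate to int, then copy the sign of the
 * unscaled input when is_signed is nonzero. */
static void quantize_bands_ref(int *out, const float *in, const float *scaled,
                               int size, int is_signed, int maxval,
                               const float Q34, const float rounding)
{
    assert(out && in && scaled && size >= 0);

    for (int i = 0; i < size; i++) {
        float q = scaled[i] * Q34 + rounding;

        /* Clamp before the float-to-int conversion. */
        if (q > (float)maxval)
            q = (float)maxval;

        int v = (int)q; /* truncation toward zero */

        /* The asm builds a sign-bit mask from is_signed; the scalar
         * equivalent assumed here is a conditional negation. */
        if (is_signed && in[i] < 0.0f)
            v = -v;

        out[i] = v;
    }
}
```

Timing a loop of this shape next to the SSE2 and AVX entries would give the 417.0 and 202.0 figures quoted at the top a baseline to compare against.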