From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andreas Rheinhardt
Date: Thu, 6 Jun 2024 16:48:44 +0200
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH] swscale/x86/rgb2rgb: add SSE2 shuffle_bytes functions
In-Reply-To: <20240606141505.132-1-jamrial@gmail.com>
References: <20240605205116.3258-1-jamrial@gmail.com> <20240606141505.132-1-jamrial@gmail.com>
List-Id: FFmpeg development discussions and patches
Content-Type: text/plain; charset="us-ascii"

James Almer:
> And remove shuffle_bytes_2103_mmxext.
>
> shuffle_bytes_0321_c: 28.1
> shuffle_bytes_0321_sse2: 13.6
> shuffle_bytes_0321_ssse3: 9.6
> shuffle_bytes_0321_avx2: 7.1
> shuffle_bytes_1230_c: 52.6
> shuffle_bytes_1230_sse2: 12.1
> shuffle_bytes_1230_ssse3: 8.6
> shuffle_bytes_1230_avx2: 6.6
> shuffle_bytes_2103_c: 29.1
> shuffle_bytes_2103_mmxext: 29.3 // removed
> shuffle_bytes_2103_sse2: 12.5
> shuffle_bytes_2103_ssse3: 8.6
> shuffle_bytes_2103_avx2: 7.1
> shuffle_bytes_3012_c: 52.1
> shuffle_bytes_3012_sse2: 12.1
> shuffle_bytes_3012_ssse3: 8.6
> shuffle_bytes_3012_avx2: 7.1
> shuffle_bytes_3210_c: 50.6
> shuffle_bytes_3210_sse2: 14.6
> shuffle_bytes_3210_ssse3: 8.6
> shuffle_bytes_3210_avx2: 7.1
>
> Signed-off-by: James Almer
> ---
>  libswscale/x86/rgb2rgb.c     | 14 ++++--
>  libswscale/x86/rgb_2_rgb.asm | 83 +++++++++++++++++++++++++-----------
>  2 files changed, 69 insertions(+), 28 deletions(-)
>
> diff --git a/libswscale/x86/rgb2rgb.c b/libswscale/x86/rgb2rgb.c
> index 21ccfafe51..9f6c8efc72 100644
> --- a/libswscale/x86/rgb2rgb.c
> +++ b/libswscale/x86/rgb2rgb.c
> @@ -116,7 +116,11 @@ DECLARE_ALIGNED(8, extern const uint64_t, ff_bgr2UVOffset);
>
>  #endif /* HAVE_INLINE_ASM */
>
> -void ff_shuffle_bytes_2103_mmxext(const uint8_t *src, uint8_t *dst, int src_size);
> +void ff_shuffle_bytes_2103_sse2(const uint8_t *src, uint8_t *dst, int src_size);
> +void ff_shuffle_bytes_0321_sse2(const uint8_t *src, uint8_t *dst, int src_size);
> +void ff_shuffle_bytes_1230_sse2(const uint8_t *src, uint8_t *dst, int src_size);
> +void ff_shuffle_bytes_3012_sse2(const uint8_t *src, uint8_t *dst, int src_size);
> +void ff_shuffle_bytes_3210_sse2(const uint8_t *src, uint8_t *dst, int src_size);
>  void ff_shuffle_bytes_2103_ssse3(const uint8_t *src, uint8_t *dst, int src_size);
>  void ff_shuffle_bytes_0321_ssse3(const uint8_t *src, uint8_t *dst, int src_size);
>  void ff_shuffle_bytes_1230_ssse3(const uint8_t *src, uint8_t *dst, int src_size);
> @@ -154,10 +158,12 @@ av_cold void rgb2rgb_init_x86(void)
>      rgb2rgb_init_avx();
>  #endif /* HAVE_INLINE_ASM */
>
> -    if (EXTERNAL_MMXEXT(cpu_flags)) {
> -        shuffle_bytes_2103 = ff_shuffle_bytes_2103_mmxext;
> -    }
>      if (EXTERNAL_SSE2(cpu_flags)) {
> +        shuffle_bytes_2103 = ff_shuffle_bytes_2103_sse2;
> +        shuffle_bytes_0321 = ff_shuffle_bytes_0321_sse2;
> +        shuffle_bytes_1230 = ff_shuffle_bytes_1230_sse2;
> +        shuffle_bytes_3012 = ff_shuffle_bytes_3012_sse2;
> +        shuffle_bytes_3210 = ff_shuffle_bytes_3210_sse2;
>  #if ARCH_X86_64
>          uyvytoyuv422 = ff_uyvytoyuv422_sse2;
>  #endif
> diff --git a/libswscale/x86/rgb_2_rgb.asm b/libswscale/x86/rgb_2_rgb.asm
> index 0bf1278718..9fc1974389 100644
> --- a/libswscale/x86/rgb_2_rgb.asm
> +++ b/libswscale/x86/rgb_2_rgb.asm
> @@ -25,7 +25,6 @@
>
>  SECTION_RODATA
>
> -pb_mask_shuffle2103_mmx times 8 dw 255
>  pb_shuffle2103: db 2, 1, 0, 3, 6, 5, 4, 7, 10, 9, 8, 11, 14, 13, 12, 15
>  pb_shuffle0321: db 0, 3, 2, 1, 4, 7, 6, 5, 8, 11, 10, 9, 12, 15, 14, 13
>  pb_shuffle1230: db 1, 2, 3, 0, 5, 6, 7, 4, 9, 10, 11, 8, 13, 14, 15, 12
> @@ -50,11 +49,50 @@ SECTION .text
>  ;------------------------------------------------------------------------------
>  ; shuffle_bytes_2103_mmext (const uint8_t *src, uint8_t *dst, int src_size)
>  ;------------------------------------------------------------------------------
> -INIT_MMX mmxext
> -cglobal shuffle_bytes_2103, 3, 5, 8, src, dst, w, tmp, x
> -    mova m6, [pb_mask_shuffle2103_mmx]
> -    mova m7, m6
> -    psllq m7, 8
> +
> +%macro SHUFFLE2103_SSE2 0
> +    pshuflw m1, m0, 0xb1
> +    pshufhw m1, m1, 0xb1
> +
> +    pand m0, m3
> +    pand m1, m2
> +%endmacro
> +
> +%macro SHUFFLE0321_SSE2 0
> +    pshuflw m1, m0, 0xb1
> +    pshufhw m1, m1, 0xb1
> +
> +    pand m0, m2
> +    pand m1, m3
> +%endmacro
> +
> +%macro SHUFFLE1230_SSE2 0
> +    pslld m1, m0, 24
> +    psrld m0, 8
> +%endmacro
> +
> +%macro SHUFFLE3012_SSE2 0
> +    pslld m1, m0, 8
> +    psrld m0, 24
> +%endmacro
> +
> +%macro SHUFFLE3210_SSE2 0
> +    pshuflw m1, m0, 0xb1
> +    pshufhw m1, m1, 0xb1
> +
> +    psrlw m0, m1, 8
> +    psllw m1, 8
> +%endmacro
> +
> +; %1-4 index shuffle
> +; %5 load mask
> +%macro SHUFFLE_BYTES_SSE2 5
> +cglobal shuffle_bytes_%1%2%3%4, 3, 5, 4, src, dst, w, tmp, x
> +%if %5
> +    pcmpeqw m2, m2
> +    psllw m3, m2, 8 ; (word) { 0xff00 } x4
> +    psrlw m2, 8 ; (word) { 0x00ff } x4
> +%endif
>
>      movsxdifnidn wq, wd
>      mov xq, wq
> @@ -68,13 +106,13 @@ cglobal shuffle_bytes_2103, 3, 5, 8, src, dst, w, tmp, x
>      je .loop_simd
>
>  .loop_scalar:
> -    mov tmpb, [srcq + wq + 2]
> +    mov tmpb, [srcq + wq + %1]
>      mov [dstq+wq + 0], tmpb
> -    mov tmpb, [srcq + wq + 1]
> +    mov tmpb, [srcq + wq + %2]
>      mov [dstq+wq + 1], tmpb
> -    mov tmpb, [srcq + wq + 0]
> +    mov tmpb, [srcq + wq + %3]
>      mov [dstq+wq + 2], tmpb
> -    mov tmpb, [srcq + wq + 3]
> +    mov tmpb, [srcq + wq + %4]
>      mov [dstq+wq + 3], tmpb
>      add wq, 4
>      sub xq, 4
> @@ -86,29 +124,26 @@ jge .end
>
>  .loop_simd:
>      movu m0, [srcq+wq]
> -    movu m1, [srcq+wq+8]
> -
> -    pshufw m3, m0, 177
> -    pshufw m5, m1, 177
> -
> -    pand m0, m7
> -    pand m3, m6
>
> -    pand m1, m7
> -    pand m5, m6
> +    SHUFFLE%1%2%3%4_SSE2
>
> -    por m0, m3
> -    por m1, m5
> +    por m0, m1
>
>      movu [dstq+wq], m0
> -    movu [dstq+wq + 8], m1
>
> -    add wq, mmsize*2
> +    add wq, mmsize
>      jl .loop_simd
>
>  .end:
> -    emms
>      RET
> +%endmacro
> +
> +INIT_XMM sse2
> +SHUFFLE_BYTES_SSE2 2, 1, 0, 3, 1
> +SHUFFLE_BYTES_SSE2 0, 3, 2, 1, 1
> +SHUFFLE_BYTES_SSE2 1, 2, 3, 0, 0
> +SHUFFLE_BYTES_SSE2 3, 0, 1, 2, 0
> +SHUFFLE_BYTES_SSE2 3, 2, 1, 0, 0
>
>  ;------------------------------------------------------------------------------
>  ; shuffle_bytes_## (const uint8_t *src, uint8_t *dst, int src_size)

How old are the youngest processors with SSE2, but without SSSE3?
According to Wikipedia, nearly 15 years. Which makes me believe that the
SSE2 versions are not worth it (how many of these CPUs will use a new
FFmpeg anyway?).

- Andreas
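
For readers following the SSE2 macros in the quoted patch, here is a rough scalar model in C of what each SHUFFLE*_SSE2 macro appears to do to a single 4-byte pixel. It is only a sketch and not FFmpeg code: the helper names (load_le32, store_le32, swap_words, psrlw8, psllw8, shuffle_bytes) are made up for illustration, and it assumes that pshuflw/pshufhw with immediate 0xb1 swap the two 16-bit halves of every dword, and that the pixel is loaded as a little-endian dword as movu does on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load one 4-byte pixel as a little-endian dword, independent of host endianness. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Store a dword back as four little-endian bytes. */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Per-dword effect assumed for pshuflw + pshufhw with 0xb1: swap the 16-bit halves. */
static uint32_t swap_words(uint32_t v)
{
    return (v << 16) | (v >> 16);
}

/* Model of the per-word shifts psrlw/psllw by 8: bits must not cross 16-bit lanes. */
static uint32_t psrlw8(uint32_t v) { return (v >> 8) & 0x00ff00ffu; }
static uint32_t psllw8(uint32_t v) { return (v << 8) & 0xff00ff00u; }

/* 2103: keep bytes 1 and 3 in place, take bytes 0 and 2 from the swapped copy. */
static uint32_t shuffle_2103(uint32_t v)
{
    return (v & 0xff00ff00u) | (swap_words(v) & 0x00ff00ffu);
}

/* 0321: keep bytes 0 and 2 in place, take bytes 1 and 3 from the swapped copy. */
static uint32_t shuffle_0321(uint32_t v)
{
    return (v & 0x00ff00ffu) | (swap_words(v) & 0xff00ff00u);
}

/* 1230 and 3012: byte rotations of the dword, built from two dword shifts. */
static uint32_t shuffle_1230(uint32_t v) { return (v << 24) | (v >> 8); }
static uint32_t shuffle_3012(uint32_t v) { return (v << 8) | (v >> 24); }

/* 3210: swap the halves, then swap the two bytes inside each half. */
static uint32_t shuffle_3210(uint32_t v)
{
    uint32_t s = swap_words(v);
    return psrlw8(s) | psllw8(s);
}

typedef uint32_t (*shuffle_fn)(uint32_t);

/* Apply one of the shuffles to a buffer whose size is a multiple of 4 bytes. */
static void shuffle_bytes(const uint8_t *src, uint8_t *dst, size_t size, shuffle_fn fn)
{
    assert(size % 4 == 0);
    for (size_t i = 0; i < size; i += 4)
        store_le32(dst + i, fn(load_le32(src + i)));
}
```

Calling shuffle_bytes(src, dst, size, shuffle_2103) should give the same bytes as the scalar loop in the patch, i.e. dst[i] = src[idx[i]] within each pixel with idx = {2, 1, 0, 3}. The real functions process 16 bytes per iteration and fall back to the byte-wise loop for the remainder, which this model does not attempt to reproduce.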