From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 5 Jun 2024 23:00:57 +0200
In-Reply-To: <20240605202853.3135-2-jamrial@gmail.com>
References: <20240605202853.3135-1-jamrial@gmail.com> <20240605202853.3135-2-jamrial@gmail.com>
Subject: Re: [FFmpeg-devel] [PATCH 2/2] swscale/x86/input: add AVX2 optimized uyvytoyuv422

James Almer:
> uyvytoyuv422_c: 23991.8
> uyvytoyuv422_sse2: 2817.8
> uyvytoyuv422_avx: 2819.3

Why don't you nuke the avx version in a follow-up patch?

> uyvytoyuv422_avx2: 1972.3
>
> Signed-off-by: James Almer
> ---
>  libswscale/x86/rgb2rgb.c     |  6 ++++++
>  libswscale/x86/rgb_2_rgb.asm | 32 ++++++++++++++++++++++++--------
>  2 files changed, 30 insertions(+), 8 deletions(-)
>
> diff --git a/libswscale/x86/rgb2rgb.c b/libswscale/x86/rgb2rgb.c
> index b325e5dbd5..21ccfafe51 100644
> --- a/libswscale/x86/rgb2rgb.c
> +++ b/libswscale/x86/rgb2rgb.c
> @@ -136,6 +136,9 @@ void ff_uyvytoyuv422_sse2(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
>  void ff_uyvytoyuv422_avx(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
>                           const uint8_t *src, int width, int height,
>                           int lumStride, int chromStride, int srcStride);
> +void ff_uyvytoyuv422_avx2(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
> +                          const uint8_t *src, int width, int height,
> +                          int lumStride, int chromStride, int srcStride);
>  #endif
>
>  av_cold void rgb2rgb_init_x86(void)
> @@ -177,5 +180,8 @@ av_cold void rgb2rgb_init_x86(void)
>      if (EXTERNAL_AVX(cpu_flags)) {
>          uyvytoyuv422 = ff_uyvytoyuv422_avx;
>      }
> +    if (EXTERNAL_AVX2_FAST(cpu_flags)) {
> +        uyvytoyuv422 = ff_uyvytoyuv422_avx2;
> +    }
>  #endif
>  }
> diff --git a/libswscale/x86/rgb_2_rgb.asm b/libswscale/x86/rgb_2_rgb.asm
> index 76ca1eec03..0bf1278718 100644
> --- a/libswscale/x86/rgb_2_rgb.asm
> +++ b/libswscale/x86/rgb_2_rgb.asm
> @@ -34,13 +34,16 @@ pb_shuffle3210: db 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
>
>  SECTION .text
>
> -%macro RSHIFT_COPY 3
> +%macro RSHIFT_COPY 5
>  ; %1 dst ; %2 src ; %3 shift
> -%if cpuflag(avx)
> -    psrldq %1, %2, %3
> +%if mmsize == 32
> +    vperm2i128 %1, %2, %3, %5
> +    RSHIFT %1, %4
> +%elif cpuflag(avx)
> +    psrldq %1, %2, %4
>  %else
>      mova %1, %2
> -    RSHIFT %1, %3
> +    RSHIFT %1, %4
>  %endif
>  %endmacro
>
> @@ -233,26 +236,37 @@ cglobal uyvytoyuv422, 9, 14, 8, ydst, udst, vdst, src, w, h, lum_stride, chrom_s
>      jge .end_line
>
>  .loop_simd:
> +%if mmsize == 32
> +    movu xm2, [srcq + wtwoq ]
> +    movu xm3, [srcq + wtwoq + 16 ]
> +    movu xm4, [srcq + wtwoq + 16 * 2]
> +    movu xm5, [srcq + wtwoq + 16 * 3]
> +    vinserti128 m2, m2, [srcq + wtwoq + 16 * 4], 1
> +    vinserti128 m3, m3, [srcq + wtwoq + 16 * 5], 1
> +    vinserti128 m4, m4, [srcq + wtwoq + 16 * 6], 1
> +    vinserti128 m5, m5, [srcq + wtwoq + 16 * 7], 1
> +%else
>      movu m2, [srcq + wtwoq ]
>      movu m3, [srcq + wtwoq + mmsize ]
>      movu m4, [srcq + wtwoq + mmsize * 2]
>      movu m5, [srcq + wtwoq + mmsize * 3]
> +%endif
>
>      ; extract y part 1
> -    RSHIFT_COPY m6, m2, 1 ; UYVY UYVY -> YVYU YVY...
> +    RSHIFT_COPY m6, m2, m4, 1, 0x20 ; UYVY UYVY -> YVYU YVY...
>      pand m6, m1; YxYx YxYx...
>
> -    RSHIFT_COPY m7, m3, 1 ; UYVY UYVY -> YVYU YVY...
> +    RSHIFT_COPY m7, m3, m5, 1, 0x20 ; UYVY UYVY -> YVYU YVY...
>      pand m7, m1 ; YxYx YxYx...
>
>      packuswb m6, m7 ; YYYY YYYY...
>      movu [ydstq + wq], m6
>
>      ; extract y part 2
> -    RSHIFT_COPY m6, m4, 1 ; UYVY UYVY -> YVYU YVY...
> +    RSHIFT_COPY m6, m4, m2, 1, 0x13 ; UYVY UYVY -> YVYU YVY...
>      pand m6, m1; YxYx YxYx...
>
> -    RSHIFT_COPY m7, m5, 1 ; UYVY UYVY -> YVYU YVY...
> +    RSHIFT_COPY m7, m5, m3, 1, 0x13 ; UYVY UYVY -> YVYU YVY...
>      pand m7, m1 ; YxYx YxYx...
>
>      packuswb m6, m7 ; YYYY YYYY...
> @@ -309,4 +323,6 @@ UYVY_TO_YUV422
>
>  INIT_XMM avx
>  UYVY_TO_YUV422
> +INIT_YMM avx2
> +UYVY_TO_YUV422
>  %endif