From: Shreesh Adiga <16567adigashreesh@gmail.com>
To: ffmpeg-devel@ffmpeg.org
Date: Sat, 25 Jan 2025 19:55:46 +0530
Message-ID: <20250125142546.1244665-1-16567adigashreesh@gmail.com>
Subject: [FFmpeg-devel] [PATCH] swscale/x86/rgb2rgb: add AVX512ICL versions of shuffle_bytes

Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
---
 libswscale/x86/rgb2rgb.c     | 21 +++++++++++++++++++++
 libswscale/x86/rgb_2_rgb.asm | 28 ++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/libswscale/x86/rgb2rgb.c b/libswscale/x86/rgb2rgb.c
index 6790551a38..4cbed54b35 100644
--- a/libswscale/x86/rgb2rgb.c
+++ b/libswscale/x86/rgb2rgb.c
@@ -2364,6 +2364,16 @@ void ff_shuffle_bytes_2013_avx2(const uint8_t *src, uint8_t *dst, int src_size);
 void ff_shuffle_bytes_2130_avx2(const uint8_t *src, uint8_t *dst, int src_size);
 void ff_shuffle_bytes_1203_avx2(const uint8_t *src, uint8_t *dst, int src_size);
 
+void ff_shuffle_bytes_2103_avx512icl(const uint8_t *src, uint8_t *dst, int src_size);
+void ff_shuffle_bytes_0321_avx512icl(const uint8_t *src, uint8_t *dst, int src_size);
+void ff_shuffle_bytes_1230_avx512icl(const uint8_t *src, uint8_t *dst, int src_size);
+void ff_shuffle_bytes_3012_avx512icl(const uint8_t *src, uint8_t *dst, int src_size);
+void ff_shuffle_bytes_3210_avx512icl(const uint8_t *src, uint8_t *dst, int src_size);
+void ff_shuffle_bytes_3102_avx512icl(const uint8_t *src, uint8_t *dst, int src_size);
+void ff_shuffle_bytes_2013_avx512icl(const uint8_t *src, uint8_t *dst, int src_size);
+void ff_shuffle_bytes_2130_avx512icl(const uint8_t *src, uint8_t *dst, int src_size);
+void ff_shuffle_bytes_1203_avx512icl(const uint8_t *src, uint8_t *dst, int src_size);
+
 void ff_uyvytoyuv422_sse2(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
                           const uint8_t *src, int width, int height,
                           int lumStride, int chromStride, int srcStride);
@@ -2454,6 +2464,17 @@ av_cold void rgb2rgb_init_x86(void)
         shuffle_bytes_2130 = ff_shuffle_bytes_2130_avx2;
         shuffle_bytes_1203 = ff_shuffle_bytes_1203_avx2;
     }
+    if (EXTERNAL_AVX512ICL(cpu_flags)) {
+        shuffle_bytes_0321 = ff_shuffle_bytes_0321_avx512icl;
+        shuffle_bytes_2103 = ff_shuffle_bytes_2103_avx512icl;
+        shuffle_bytes_1230 = ff_shuffle_bytes_1230_avx512icl;
+        shuffle_bytes_3012 = ff_shuffle_bytes_3012_avx512icl;
+        shuffle_bytes_3210 = ff_shuffle_bytes_3210_avx512icl;
+        shuffle_bytes_3102 = ff_shuffle_bytes_3102_avx512icl;
+        shuffle_bytes_2013 = ff_shuffle_bytes_2013_avx512icl;
+        shuffle_bytes_2130 = ff_shuffle_bytes_2130_avx512icl;
+        shuffle_bytes_1203 = ff_shuffle_bytes_1203_avx512icl;
+    }
     if (EXTERNAL_AVX2_FAST(cpu_flags)) {
         uyvytoyuv422 = ff_uyvytoyuv422_avx2;
 #endif
diff --git a/libswscale/x86/rgb_2_rgb.asm b/libswscale/x86/rgb_2_rgb.asm
index b468beb12d..64b0988c4a 100644
--- a/libswscale/x86/rgb_2_rgb.asm
+++ b/libswscale/x86/rgb_2_rgb.asm
@@ -64,6 +64,18 @@ cglobal shuffle_bytes_%1%2%3%4, 3, 5, 2, src, dst, w, tmp, x
     add dstq, wq
     neg wq
 
+%if mmsize == 64
+    and xq, mmsize-4
+    shr xq, 2
+    mov tmpd, -1
+    shlx tmpd, tmpd, xd
+    not tmpd
+    kmovw k7, tmpw
+    vmovdqu32 m1{k7}{z}, [srcq + wq]
+    pshufb m1, m0
+    vmovdqu32 [dstq + wq]{k7}, m1
+    lea wq, [wq + 4 * xq]
+%else
 ;calc scalar loop
     and xq, mmsize-4
     je .loop_simd
@@ -80,6 +92,7 @@ cglobal shuffle_bytes_%1%2%3%4, 3, 5, 2, src, dst, w, tmp, x
     add wq, 4
     sub xq, 4
     jg .loop_scalar
+%endif
 
 ;check if src_size < mmsize
     cmp wq, 0
@@ -122,6 +135,21 @@ SHUFFLE_BYTES 1, 2, 0, 3
 %endif
 %endif
 
+%if ARCH_X86_64
+%if HAVE_AVX512ICL_EXTERNAL
+INIT_ZMM avx512icl
+SHUFFLE_BYTES 2, 1, 0, 3
+SHUFFLE_BYTES 0, 3, 2, 1
+SHUFFLE_BYTES 1, 2, 3, 0
+SHUFFLE_BYTES 3, 0, 1, 2
+SHUFFLE_BYTES 3, 2, 1, 0
+SHUFFLE_BYTES 3, 1, 0, 2
+SHUFFLE_BYTES 2, 0, 1, 3
+SHUFFLE_BYTES 2, 1, 3, 0
+SHUFFLE_BYTES 1, 2, 0, 3
+%endif
+%endif
+
 ;-----------------------------------------------------------------------------------------------
 ; uyvytoyuv422(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
 ;              const uint8_t *src, int width, int height,
-- 
2.45.3
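
Note on the mmsize == 64 branch: instead of the scalar loop, the leading
(src_size & (mmsize-4)) bytes appear to be handled by a single masked
load/pshufb/store, with one mask bit per 32-bit pixel. The C below is only a
rough scalar model of that arithmetic, not code from the patch. The names
tail_dwords, tail_mask and shuffle_ref are hypothetical and exist purely for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of 32-bit pixels covered by the masked
 * prologue. Mirrors "and xq, mmsize-4" followed by "shr xq, 2" for
 * mmsize == 64, so the result is always in the range 0..15. */
static inline unsigned tail_dwords(int src_size)
{
    return ((unsigned)src_size & (64 - 4)) >> 2;
}

/* Hypothetical helper: the 16-bit lane mask that ends up in k7.
 * Mirrors "mov tmpd, -1" / "shlx" / "not" / "kmovw": the low n bits
 * are set, one bit per dword lane of the ZMM register. */
static inline uint16_t tail_mask(unsigned n)
{
    assert(n < 16); /* guaranteed when n comes from tail_dwords() */
    return (uint16_t)~(0xFFFFFFFFu << n);
}

/* Scalar reference for the shuffle itself, following the existing scalar
 * loop in the macro: output byte j of each pixel is input byte idx[j]. */
static inline void shuffle_ref(const uint8_t *src, uint8_t *dst,
                               int src_size, const uint8_t idx[4])
{
    for (int i = 0; i + 3 < src_size; i += 4)
        for (int j = 0; j < 4; j++)
            dst[i + j] = src[i + idx[j]];
}
```

For example, with src_size = 76 the model gives tail_dwords = 3 and
tail_mask = 0x0007, so the masked step would cover the first 12 bytes and
leave exactly one full 64-byte iteration for .loop_simd.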