From mboxrd@z Thu Jan  1 00:00:00 1970
From: James Almer <jamrial@gmail.com>
To: ffmpeg-devel@ffmpeg.org
Date: Mon,  7 Apr 2025 13:26:32 -0300
Message-ID: <20250407162632.1142-1-jamrial@gmail.com>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250407000004.7306-1-jamrial@gmail.com>
References: <20250407000004.7306-1-jamrial@gmail.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH 2/2] avutil/x86/aes: remove a few branches
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Archived-At: <https://master.gitmailbox.com/ffmpegdev/20250407162632.1142-1-jamrial@gmail.com/>

The rounds value is constant and can be one of three hardcoded values, so
instead of checking it on every loop, just split the function into three
different implementations for each value.

Before:
aes_decrypt_128_aesni:    93.8 (47.58x)
aes_decrypt_192_aesni:   106.9 (49.30x)
aes_decrypt_256_aesni:   109.8 (56.50x)
aes_encrypt_128_aesni:    93.2 (47.70x)
aes_encrypt_192_aesni:   111.1 (48.36x)
aes_encrypt_256_aesni:   113.6 (56.27x)

After:
aes_decrypt_128_aesni:    71.5 (63.31x)
aes_decrypt_192_aesni:    96.8 (55.64x)
aes_decrypt_256_aesni:   106.1 (58.51x)
aes_encrypt_128_aesni:    81.3 (55.92x)
aes_encrypt_192_aesni:    91.2 (59.78x)
aes_encrypt_256_aesni:   109.0 (58.26x)

Signed-off-by: James Almer <jamrial@gmail.com>
---
 libavutil/aes.c          |  3 +--
 libavutil/x86/aes.asm    | 24 +++++++++++++-----------
 libavutil/x86/aes_init.c | 22 ++++++++++++++++++----
 3 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/libavutil/aes.c b/libavutil/aes.c
index 5f31412149..52a250bc00 100644
--- a/libavutil/aes.c
+++ b/libavutil/aes.c
@@ -234,6 +234,7 @@ int av_aes_init(AVAES *a, const uint8_t *key, int key_bits, int decrypt)
     int KC = key_bits >> 5;
     int rounds = KC + 6;
 
+    a->rounds = rounds;
     a->crypt = decrypt ? aes_decrypt : aes_encrypt;
     if (ARCH_X86)
         ff_init_aes_x86(a, decrypt);
@@ -243,8 +244,6 @@ int av_aes_init(AVAES *a, const uint8_t *key, int key_bits, int decrypt)
     if (key_bits != 128 && key_bits != 192 && key_bits != 256)
         return AVERROR(EINVAL);
 
-    a->rounds = rounds;
-
     memcpy(tk, key, KC * 4);
     memcpy(a->round_key[0].u8, key, KC * 4);
 
diff --git a/libavutil/x86/aes.asm b/libavutil/x86/aes.asm
index 7084c46055..e985a94685 100644
--- a/libavutil/x86/aes.asm
+++ b/libavutil/x86/aes.asm
@@ -26,12 +26,11 @@ SECTION .text
 ; void ff_aes_decrypt(AVAES *a, uint8_t *dst, const uint8_t *src,
 ;                     int count, uint8_t *iv, int rounds)
 ;-----------------------------------------------------------------------------
-%macro AES_CRYPT 1
-cglobal aes_%1rypt, 6,6,2
+%macro AES_CRYPT 2
+cglobal aes_%1rypt_%2, 5, 5, 2
     test r3d, r3d
     je .ret
     shl  r3d, 4
-    add  r5d, r5d
     add  r0, 0x60
     add  r2, r3
     add  r1, r3
@@ -45,16 +44,15 @@ cglobal aes_%1rypt, 6,6,2
 %ifidn %1, enc
     pxor   m0, m1
 %endif
-    pxor   m0, [r0+8*r5-0x60]
-    cmp    r5d, 24
-    je .rounds12
-    jl .rounds10
+    pxor   m0, [r0+8*2*%2-0x60]
+%if %2 > 12
     aes%1  m0, [r0+0x70]
     aes%1  m0, [r0+0x60]
-.rounds12:
+%endif
+%if %2 > 10
     aes%1  m0, [r0+0x50]
     aes%1  m0, [r0+0x40]
-.rounds10:
+%endif
     aes%1  m0, [r0+0x30]
     aes%1  m0, [r0+0x20]
     aes%1  m0, [r0+0x10]
@@ -90,6 +88,10 @@ cglobal aes_%1rypt, 6,6,2
 
 %if HAVE_AESNI_EXTERNAL
 INIT_XMM aesni
-AES_CRYPT enc
-AES_CRYPT dec
+AES_CRYPT enc, 10
+AES_CRYPT enc, 12
+AES_CRYPT enc, 14
+AES_CRYPT dec, 10
+AES_CRYPT dec, 12
+AES_CRYPT dec, 14
 %endif
diff --git a/libavutil/x86/aes_init.c b/libavutil/x86/aes_init.c
index 0ac8c20239..c3e2003c07 100644
--- a/libavutil/x86/aes_init.c
+++ b/libavutil/x86/aes_init.c
@@ -22,15 +22,29 @@
 #include "libavutil/aes_internal.h"
 #include "libavutil/x86/cpu.h"
 
-void ff_aes_decrypt_aesni(AVAES *a, uint8_t *dst, const uint8_t *src,
+void ff_aes_decrypt_10_aesni(AVAES *a, uint8_t *dst, const uint8_t *src,
                           int count, uint8_t *iv, int rounds);
-void ff_aes_encrypt_aesni(AVAES *a, uint8_t *dst, const uint8_t *src,
+void ff_aes_decrypt_12_aesni(AVAES *a, uint8_t *dst, const uint8_t *src,
                           int count, uint8_t *iv, int rounds);
+void ff_aes_decrypt_14_aesni(AVAES *a, uint8_t *dst, const uint8_t *src,
+                          int count, uint8_t *iv, int rounds);
+void ff_aes_encrypt_10_aesni(AVAES *a, uint8_t *dst, const uint8_t *src,
+                          int count, uint8_t *iv, int rounds);
+void ff_aes_encrypt_12_aesni(AVAES *a, uint8_t *dst, const uint8_t *src,
+                          int count, uint8_t *iv, int rounds);
+void ff_aes_encrypt_14_aesni(AVAES *a, uint8_t *dst, const uint8_t *src,
+                          int count, uint8_t *iv, int rounds);
 
 void ff_init_aes_x86(AVAES *a, int decrypt)
 {
     int cpu_flags = av_get_cpu_flags();
 
-    if (EXTERNAL_AESNI(cpu_flags))
-        a->crypt = decrypt ? ff_aes_decrypt_aesni : ff_aes_encrypt_aesni;
+    if (EXTERNAL_AESNI(cpu_flags)) {
+        if (a->rounds == 10)
+            a->crypt = decrypt ? ff_aes_decrypt_10_aesni : ff_aes_encrypt_10_aesni;
+        else if (a->rounds == 12)
+            a->crypt = decrypt ? ff_aes_decrypt_12_aesni : ff_aes_encrypt_12_aesni;
+        else if (a->rounds == 14)
+            a->crypt = decrypt ? ff_aes_decrypt_14_aesni : ff_aes_encrypt_14_aesni;
+    }
 }
-- 
2.49.0
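
For readers who want the idea without the assembly, the dispatch change can be
sketched in plain C: the round count is derived once at init time and used to
pick a specialised function, so the per-call code never branches on it. This is
only an illustration under stated assumptions. DemoCtx, demo_init(),
demo_crypt_generic() and the DEMO_CRYPT() macro are hypothetical stand-ins, not
FFmpeg symbols, and each stand-in function merely returns the number of rounds
it would run instead of doing any AES work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not FFmpeg's types or symbols. */
typedef struct DemoCtx DemoCtx;
typedef int (*demo_crypt_fn)(const DemoCtx *ctx);

struct DemoCtx {
    int rounds;           /* derived from the key size at init time */
    demo_crypt_fn crypt;  /* selected once, then called per buffer */
};

/* Generic fallback: has to consult ctx->rounds on every call. */
static int demo_crypt_generic(const DemoCtx *ctx)
{
    return ctx->rounds;
}

/* One specialised body per round count, stamped out by a macro in the
 * same spirit as "AES_CRYPT enc, 10" in the patch. The round count is a
 * compile-time constant inside each body. */
#define DEMO_CRYPT(n)                                  \
    static int demo_crypt_##n(const DemoCtx *ctx)      \
    {                                                  \
        (void)ctx;                                     \
        return n;                                      \
    }

DEMO_CRYPT(10)
DEMO_CRYPT(12)
DEMO_CRYPT(14)

/* Returns 0 on success, -1 for an unsupported key size. */
static int demo_init(DemoCtx *ctx, int key_bits)
{
    assert(ctx != NULL);

    /* rounds must be set before dispatch, which is why the patch moves
     * the a->rounds assignment ahead of ff_init_aes_x86(). */
    ctx->rounds = (key_bits >> 5) + 6;
    ctx->crypt  = demo_crypt_generic;

    switch (ctx->rounds) {
    case 10: ctx->crypt = demo_crypt_10; break;
    case 12: ctx->crypt = demo_crypt_12; break;
    case 14: ctx->crypt = demo_crypt_14; break;
    default: break; /* keep the generic fallback */
    }

    if (key_bits != 128 && key_bits != 192 && key_bits != 256)
        return -1;
    return 0;
}
```

A caller would run demo_init(&ctx, 128) once and then invoke ctx.crypt(&ctx)
for each buffer. An unsupported key size leaves the generic fallback in place,
much as the patch leaves a->crypt untouched when a->rounds is not 10, 12 or 14.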