From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Cc: jdek@itanimul.li
Date: Tue, 17 Oct 2023 14:45:59 +0300
Message-Id: <20231017114601.1374712-4-martin@martin.st>
In-Reply-To: <20231017114601.1374712-1-martin@martin.st>
Subject: [FFmpeg-devel] [PATCH 4/5] aarch64: Manually tweak vertical alignment/indentation in tx_float_neon.S

Favour left-aligned columns over right-aligned columns.

In principle, either style should be OK, but some of the right-aligned cases easily lead to incorrect indentation in the surrounding code (see a couple of cases fixed up in the preceding patch), and they show up in attempts at automatic indentation correction.
---
 libavutil/aarch64/tx_float_neon.S | 120 +++++++++++++++---------------
 1 file changed, 60 insertions(+), 60 deletions(-)

diff --git a/libavutil/aarch64/tx_float_neon.S b/libavutil/aarch64/tx_float_neon.S
index 9916ad4142..30ffa2a1d4 100644
--- a/libavutil/aarch64/tx_float_neon.S
+++ b/libavutil/aarch64/tx_float_neon.S
@@ -733,12 +733,12 @@ FFT16_FN ns_float, 1 add x11, x1, x21, lsl #1 add x12, x1, x22 - ldp q0, q1, [x1, #((0 + \part)*32 + \off)] - ldp q4, q5, [x1, #((2 + \part)*32 + \off)] - ldp q2, q3, [x10, #((0 + \part)*32 + \off)] - ldp q6, q7, [x10, #((2 + \part)*32 + \off)] + ldp q0, q1, [x1, #((0 + \part)*32 + \off)] + ldp q4, q5, [x1, #((2 + \part)*32 + \off)] + ldp q2, q3, [x10, #((0 + \part)*32 + \off)] + ldp q6, q7, [x10, #((2 + \part)*32 + \off)] - ldp q8, q9, [x11, #((0 + \part)*32 + \off)] + ldp q8, q9, [x11, #((0 + \part)*32 + \off)] ldp q10, q11, [x11, #((2 + \part)*32 + \off)] ldp q12, q13, [x12, #((0 + \part)*32 + \off)] ldp q14, q15, [x12, #((2 + \part)*32 + \off)] @@ -747,12 +747,12 @@ FFT16_FN ns_float, 1 v8, v9, v10, v11, v12, v13, v14, v15, \ x7, x8, x9, 0 - stp q0, q1, [x1, #((0 + \part)*32 + \off)] - stp q4, q5, [x1, #((2 + \part)*32 + \off)] - stp q2, q3, [x10, #((0 + \part)*32 + \off)] - stp q6, q7, [x10, #((2 + \part)*32 + \off)] + stp q0, q1, [x1, #((0 + \part)*32 + \off)] + stp q4, q5, [x1, #((2 + \part)*32 + \off)] + stp q2, q3, [x10, #((0 + \part)*32 + \off)] + stp q6, q7, [x10, #((2 + \part)*32 + \off)] - stp q8, q9, [x11, #((0 + \part)*32 + \off)] + stp q8, q9, [x11, #((0 + \part)*32 + \off)] stp q12, q13, [x11, #((2 + \part)*32 + \off)] stp q10, q11, [x12, #((0 + \part)*32 + \off)] stp q14, q15, [x12, #((2 + \part)*32 + \off)] @@ -775,12 +775,12 @@ FFT16_FN ns_float, 1 add x12, x15,
#((\part)*32 + \off) add x13, x16, #((\part)*32 + \off) - ldp q0, q1, [x10] - ldp q4, q5, [x10, #(2*32)] - ldp q2, q3, [x11] - ldp q6, q7, [x11, #(2*32)] + ldp q0, q1, [x10] + ldp q4, q5, [x10, #(2*32)] + ldp q2, q3, [x11] + ldp q6, q7, [x11, #(2*32)] - ldp q8, q9, [x12] + ldp q8, q9, [x12] ldp q10, q11, [x12, #(2*32)] ldp q12, q13, [x13] ldp q14, q15, [x13, #(2*32)] @@ -800,10 +800,10 @@ FFT16_FN ns_float, 1 zip1 v22.2d, v3.2d, v7.2d zip2 v23.2d, v3.2d, v7.2d - ldp q0, q1, [x10, #(1*32)] - ldp q4, q5, [x10, #(3*32)] - ldp q2, q3, [x11, #(1*32)] - ldp q6, q7, [x11, #(3*32)] + ldp q0, q1, [x10, #(1*32)] + ldp q4, q5, [x10, #(3*32)] + ldp q2, q3, [x11, #(1*32)] + ldp q6, q7, [x11, #(3*32)] st1 { v16.4s, v17.4s, v18.4s, v19.4s }, [x10], #64 st1 { v20.4s, v21.4s, v22.4s, v23.4s }, [x11], #64 @@ -817,7 +817,7 @@ FFT16_FN ns_float, 1 zip1 v26.2d, v11.2d, v15.2d zip2 v27.2d, v11.2d, v15.2d - ldp q8, q9, [x12, #(1*32)] + ldp q8, q9, [x12, #(1*32)] ldp q10, q11, [x12, #(3*32)] ldp q12, q13, [x13, #(1*32)] ldp q14, q15, [x13, #(3*32)] @@ -875,9 +875,9 @@ function ff_tx_fft32_\name\()_neon, export=1 SETUP_SR_RECOMB 32, x7, x8, x9 SETUP_LUT \no_perm - LOAD_INPUT 0, 1, 2, 3, x2, \no_perm - LOAD_INPUT 4, 5, 6, 7, x2, \no_perm - LOAD_INPUT 8, 9, 10, 11, x2, \no_perm + LOAD_INPUT 0, 1, 2, 3, x2, \no_perm + LOAD_INPUT 4, 5, 6, 7, x2, \no_perm + LOAD_INPUT 8, 9, 10, 11, x2, \no_perm LOAD_INPUT 12, 13, 14, 15, x2, \no_perm FFT8_X2 v8, v9, v10, v11, v12, v13, v14, v15 @@ -982,37 +982,37 @@ function ff_tx_fft_sr_\name\()_neon, export=1 32: SETUP_SR_RECOMB 32, x7, x8, x9 - LOAD_INPUT 0, 1, 2, 3, x2, \no_perm - LOAD_INPUT 4, 6, 5, 7, x2, \no_perm, 1 - LOAD_INPUT 8, 9, 10, 11, x2, \no_perm + LOAD_INPUT 0, 1, 2, 3, x2, \no_perm + LOAD_INPUT 4, 6, 5, 7, x2, \no_perm, 1 + LOAD_INPUT 8, 9, 10, 11, x2, \no_perm LOAD_INPUT 12, 13, 14, 15, x2, \no_perm FFT8_X2 v8, v9, v10, v11, v12, v13, v14, v15 FFT16 v0, v1, v2, v3, v4, v6, v5, v7 - SR_COMBINE v0, v1, v2, v3, v4, v6, v5, v7, \ - v8, v9, v10, 
v11, v12, v13, v14, v15, \ - x7, x8, x9, 0 + SR_COMBINE v0, v1, v2, v3, v4, v6, v5, v7, \ + v8, v9, v10, v11, v12, v13, v14, v15, \ + x7, x8, x9, 0 - stp q2, q3, [x1, #32*1] - stp q6, q7, [x1, #32*3] + stp q2, q3, [x1, #32*1] + stp q6, q7, [x1, #32*3] stp q10, q11, [x1, #32*5] stp q14, q15, [x1, #32*7] cmp w20, #32 b.gt 64f - stp q0, q1, [x1, #32*0] - stp q4, q5, [x1, #32*2] - stp q8, q9, [x1, #32*4] + stp q0, q1, [x1, #32*0] + stp q4, q5, [x1, #32*2] + stp q8, q9, [x1, #32*4] stp q12, q13, [x1, #32*6] ret 64: SETUP_SR_RECOMB 64, x7, x8, x9 - LOAD_INPUT 2, 3, 10, 11, x2, \no_perm, 1 - LOAD_INPUT 6, 14, 7, 15, x2, \no_perm, 1 + LOAD_INPUT 2, 3, 10, 11, x2, \no_perm, 1 + LOAD_INPUT 6, 14, 7, 15, x2, \no_perm, 1 FFT16 v2, v3, v10, v11, v6, v14, v7, v15 @@ -1033,38 +1033,38 @@ function ff_tx_fft_sr_\name\()_neon, export=1 // TODO: investigate doing the 2 combines like in deinterleave // TODO: experiment with spilling to gprs and converting to HALF or full - SR_COMBINE_LITE v0, v1, v8, v9, \ - v2, v3, v16, v17, \ + SR_COMBINE_LITE v0, v1, v8, v9, \ + v2, v3, v16, v17, \ v24, v25, v26, v27, \ v28, v29, v30, 0 - stp q0, q1, [x1, #32* 0] - stp q8, q9, [x1, #32* 4] - stp q2, q3, [x1, #32* 8] + stp q0, q1, [x1, #32* 0] + stp q8, q9, [x1, #32* 4] + stp q2, q3, [x1, #32* 8] stp q16, q17, [x1, #32*12] - SR_COMBINE_HALF v4, v5, v12, v13, \ - v6, v7, v20, v21, \ + SR_COMBINE_HALF v4, v5, v12, v13, \ + v6, v7, v20, v21, \ v24, v25, v26, v27, \ v28, v29, v30, v0, v1, v8, 1 - stp q4, q20, [x1, #32* 2] + stp q4, q20, [x1, #32* 2] stp q12, q21, [x1, #32* 6] - stp q6, q5, [x1, #32*10] - stp q7, q13, [x1, #32*14] + stp q6, q5, [x1, #32*10] + stp q7, q13, [x1, #32*14] - ldp q2, q3, [x1, #32*1] - ldp q6, q7, [x1, #32*3] + ldp q2, q3, [x1, #32*1] + ldp q6, q7, [x1, #32*3] ldp q12, q13, [x1, #32*5] ldp q16, q17, [x1, #32*7] - SR_COMBINE v2, v3, v12, v13, v6, v16, v7, v17, \ + SR_COMBINE v2, v3, v12, v13, v6, v16, v7, v17, \ v10, v11, v14, v15, v18, v19, v22, v23, \ - x7, x8, x9, 0, \ + x7, 
x8, x9, 0, \ v24, v25, v26, v27, v28, v29, v30, v8, v0, v1, v4, v5 - stp q2, q3, [x1, #32* 1] - stp q6, q7, [x1, #32* 3] + stp q2, q3, [x1, #32* 1] + stp q6, q7, [x1, #32* 3] stp q12, q13, [x1, #32* 5] stp q16, q17, [x1, #32* 7] @@ -1198,13 +1198,13 @@ SR_TRANSFORM_DEF 131072 mov x10, v23.d[0] mov x11, v23.d[1] - SR_COMBINE_LITE v0, v1, v8, v9, \ - v2, v3, v16, v17, \ + SR_COMBINE_LITE v0, v1, v8, v9, \ + v2, v3, v16, v17, \ v24, v25, v26, v27, \ v28, v29, v30, 0 - SR_COMBINE_HALF v4, v5, v12, v13, \ - v6, v7, v20, v21, \ + SR_COMBINE_HALF v4, v5, v12, v13, \ + v6, v7, v20, v21, \ v24, v25, v26, v27, \ v28, v29, v30, v23, v24, v26, 1 @@ -1236,7 +1236,7 @@ SR_TRANSFORM_DEF 131072 zip2 v3.2d, v17.2d, v13.2d // stp is faster by a little on A53, but this is faster on M1s (theory) - ldp q8, q9, [x1, #32*1] + ldp q8, q9, [x1, #32*1] ldp q12, q13, [x1, #32*5] st1 { v23.4s, v24.4s, v25.4s, v26.4s }, [x12], #64 // 32* 0...1 @@ -1247,12 +1247,12 @@ SR_TRANSFORM_DEF 131072 mov v23.d[0], x10 mov v23.d[1], x11 - ldp q6, q7, [x1, #32*3] + ldp q6, q7, [x1, #32*3] ldp q16, q17, [x1, #32*7] - SR_COMBINE v8, v9, v12, v13, v6, v16, v7, v17, \ + SR_COMBINE v8, v9, v12, v13, v6, v16, v7, v17, \ v10, v11, v14, v15, v18, v19, v22, v23, \ - x7, x8, x9, 0, \ + x7, x8, x9, 0, \ v24, v25, v26, v27, v28, v29, v30, v4, v0, v1, v5, v20 zip1 v0.2d, v8.2d, v6.2d
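The commit message's preference for left-aligned columns can be illustrated with a small sketch. The C code below is hypothetical and is not part of FFmpeg or of this patch: `format_insn()` lays out one instruction using assumed widths (an 8-column indent, the mnemonic padded to 16 columns, and each register operand padded to 4 columns), with the operands either left- or right-aligned. With left alignment, the first operand of every line starts in the same column; with right alignment, a short register name such as `q8` is pushed one column further right than `q10`, so the operand column varies from line to line. That variation is presumably why such lines show up in automatic indentation correction attempts. The widths are illustrative assumptions, not a documented FFmpeg style rule.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layout, chosen only for illustration: an 8-column indent,
 * the mnemonic padded to 16 columns, and each "reg," operand padded to
 * 4 columns and followed by one space. */
#define INDENT     8
#define MNEMONIC_W 16
#define OPERAND_W  4

/* Appends printf-style output to the NUL-terminated string in buf
 * (total capacity `size`); vsnprintf truncates safely if buf fills up. */
static void appendf(char *buf, size_t size, const char *fmt, ...)
{
    size_t len = strlen(buf);
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(buf + len, size - len, fmt, ap);
    va_end(ap);
}

/* Formats one instruction line into buf: indent, padded mnemonic, padded
 * register operands, then `rest` (e.g. the memory operand) as-is.
 * Operands are left-aligned ("q8, ") when left_align is nonzero and
 * right-aligned (" q8,") otherwise. */
void format_insn(char *buf, size_t size, const char *mnemonic,
                 const char *const *regs, int nregs, const char *rest,
                 int left_align)
{
    assert(size > 0);
    buf[0] = '\0';
    appendf(buf, size, "%*s%-*s", INDENT, "", MNEMONIC_W, mnemonic);
    for (int i = 0; i < nregs; i++) {
        char op[16];
        snprintf(op, sizeof(op), "%s,", regs[i]);
        appendf(buf, size, left_align ? "%-*s " : "%*s ", OPERAND_W, op);
    }
    appendf(buf, size, "%s", rest);
}
```

For example, formatting `ldp q8, q9, [x11]` and `ldp q10, q11, [x11, #(2*32)]` with `left_align` set places `q8` and `q10` both at column 24 (counting from 0) under these assumed widths; with `left_align` cleared, `q8` moves to column 25 while `q10` stays at 24.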