From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Cc: Logan Lyu, "J . Dekker"
Date: Mon, 25 Mar 2024 17:02:25 +0200
Subject: [FFmpeg-devel] [PATCH 03/21] aarch64: hevc: Merge consecutive stores in put_hevc_\type\()_h16_8_neon
Message-Id: <20240325150243.59058-4-martin@martin.st>
In-Reply-To: <20240325150243.59058-1-martin@martin.st>
References: <20240325150243.59058-1-martin@martin.st>

This gets rid of a couple instructions, but the actual performance
is almost identical on Cortex A72/A73. On Cortex A53, it is a
handful of cycles faster.
---
 libavcodec/aarch64/hevcdsp_qpel_neon.S | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S b/libavcodec/aarch64/hevcdsp_qpel_neon.S
index 815d897094..432558bb95 100644
--- a/libavcodec/aarch64/hevcdsp_qpel_neon.S
+++ b/libavcodec/aarch64/hevcdsp_qpel_neon.S
@@ -512,11 +512,10 @@ function ff_hevc_put_hevc_\type\()_h16_8_neon, export=1
 .ifc \type, qpel
         mov             dststride, #(MAX_PB_SIZE << 1)
         lsl             x13, srcstride, #1 // srcstridel
-        mov             x14, #((MAX_PB_SIZE << 2) - 16)
+        mov             x14, #(MAX_PB_SIZE << 2)
 .else
         lsl             x14, dststride, #1 // dststridel
         lsl             x13, srcstride, #1 // srcstridel
-        sub             x14, x14, #8
 .endif
         add             x10, dst, dststride // dstb
         add             x12, src, srcstride // srcb
@@ -527,10 +526,8 @@ function ff_hevc_put_hevc_\type\()_h16_8_neon, export=1
         bl              ff_hevc_put_hevc_h16_8_neon
 
 .ifc \type, qpel
-        st1             {v26.8h}, [dst], #16
-        st1             {v28.8h}, [x10], #16
-        st1             {v27.8h}, [dst], x14
-        st1             {v29.8h}, [x10], x14
+        st1             {v26.8h, v27.8h}, [dst], x14
+        st1             {v28.8h, v29.8h}, [x10], x14
 .else
 .ifc \type, qpel_bi
         ld1             {v16.8h, v17.8h}, [ x4], x16
@@ -549,10 +546,8 @@ function ff_hevc_put_hevc_\type\()_h16_8_neon, export=1
         sqrshrun        v28.8b, v28.8h, #6
         sqrshrun        v29.8b, v29.8h, #6
 .endif
-        st1             {v26.8b}, [dst], #8
-        st1             {v28.8b}, [x10], #8
-        st1             {v27.8b}, [dst], x14
-        st1             {v29.8b}, [x10], x14
+        st1             {v26.8b, v27.8b}, [dst], x14
+        st1             {v28.8b, v29.8b}, [x10], x14
 .endif
         b.gt            1b // double line
         subs            width, width, #16
-- 
2.39.3 (Apple Git-146)
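
For readers who want to convince themselves that the merged stores write the same bytes and leave the pointer in the same place, here is a rough scalar model of the qpel branch in C. It is only a sketch, not the NEON code itself: store_split, store_merged and sequences_match are hypothetical names, memcpy stands in for st1, and MAX_PB_SIZE is assumed to be 64. It relies on the fact that a two-register st1 writes its registers back to back without interleaving.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 64 is used as MAX_PB_SIZE for this illustration. */
#define MAX_PB_SIZE 64
#define ITERATIONS  4
enum { STEP = MAX_PB_SIZE << 2, BUF_SIZE = ITERATIONS * STEP };

/* Old sequence: st1 {lo}, [dst], #16 followed by st1 {hi}, [dst], x14,
 * where x14 had to be pre-adjusted to (step - 16). */
static uint8_t *store_split(uint8_t *dst, const int16_t *lo,
                            const int16_t *hi, ptrdiff_t step)
{
    memcpy(dst, lo, 16);
    dst += 16;                 /* post-increment of the first store */
    memcpy(dst, hi, 16);
    dst += step - 16;          /* post-increment of the second store */
    return dst;
}

/* New sequence: st1 {lo, hi}, [dst], x14 with x14 equal to the full step. */
static uint8_t *store_merged(uint8_t *dst, const int16_t *lo,
                             const int16_t *hi, ptrdiff_t step)
{
    memcpy(dst, lo, 16);       /* first register */
    memcpy(dst + 16, hi, 16);  /* second register, directly after the first */
    return dst + step;         /* single post-increment */
}

/* Returns 1 if both sequences write identical bytes and advance equally. */
int sequences_match(void)
{
    static uint8_t a[BUF_SIZE], b[BUF_SIZE];
    uint8_t *p = a, *q = b;
    int16_t lo[8], hi[8];

    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    for (int it = 0; it < ITERATIONS; it++) {
        for (int i = 0; i < 8; i++) {
            lo[i] = (int16_t)(100 * it + i);      /* arbitrary test data */
            hi[i] = (int16_t)(100 * it + 8 + i);
        }
        p = store_split(p, lo, hi, STEP);
        q = store_merged(q, lo, hi, STEP);
    }
    return memcmp(a, b, sizeof a) == 0 && (p - a) == (q - b);
}
```

Calling sequences_match() from a small main() should return 1. The same reasoning applies to the non-qpel branch, where each half is 8 bytes and the step is twice dststride, which is why the sub x14, x14, #8 adjustment is no longer needed there.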