From: Martin Storsjö
To: ffmpeg-devel@ffmpeg.org
Cc: Logan Lyu, "J . Dekker"
Date: Mon, 25 Mar 2024 17:02:38 +0200
Message-Id: <20240325150243.59058-17-martin@martin.st>
In-Reply-To: <20240325150243.59058-1-martin@martin.st>
References: <20240325150243.59058-1-martin@martin.st>
Subject: [FFmpeg-devel] [PATCH 16/21] aarch64: hevc: Deduplicate the hevc_put_hevc_qpel_uni_w_hv*_8_end_neon functions

The hv32 and hv64 functions were identical - both loop and process 16
pixels at a time. The hv16 function was near identical, except for
the outer loop (and using sp instead of a separate register).
Given the size of these functions, the extra cost of the outer loop
is negligible, so use the same function for hv16 as well. This
removes over 200 lines of duplicated assembly, and over 4 KB of
binary size.
---
 libavcodec/aarch64/hevcdsp_qpel_neon.S | 220 +------------------------
 1 file changed, 3 insertions(+), 217 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S b/libavcodec/aarch64/hevcdsp_qpel_neon.S
index c04e8dbea8..06832603d9 100644
--- a/libavcodec/aarch64/hevcdsp_qpel_neon.S
+++ b/libavcodec/aarch64/hevcdsp_qpel_neon.S
@@ -4381,231 +4381,17 @@ function ff_hevc_put_hevc_qpel_uni_w_hv16_8_neon_i8mm, export=1
         b               hevc_put_hevc_qpel_uni_w_hv16_8_end_neon
 endfunc
 
-function hevc_put_hevc_qpel_uni_w_hv16_8_end_neon
-        ldp             q16, q1, [sp]
-        add             sp, sp, x10
-        ldp             q17, q2, [sp]
-        add             sp, sp, x10
-        ldp             q18, q3, [sp]
-        add             sp, sp, x10
-        ldp             q19, q4, [sp]
-        add             sp, sp, x10
-        ldp             q20, q5, [sp]
-        add             sp, sp, x10
-        ldp             q21, q6, [sp]
-        add             sp, sp, x10
-        ldp             q22, q7, [sp]
-        add             sp, sp, x10
-1:
-        ldp             q23, q31, [sp]
-        add             sp, sp, x10
-        QPEL_FILTER_H   v24, v16, v17, v18, v19, v20, v21, v22, v23
-        QPEL_FILTER_H2  v25, v16, v17, v18, v19, v20, v21, v22, v23
-        QPEL_FILTER_H   v26, v1, v2, v3, v4, v5, v6, v7, v31
-        QPEL_FILTER_H2  v27, v1, v2, v3, v4, v5, v6, v7, v31
-        QPEL_UNI_W_HV_16
-        subs            w22, w22, #1
-        b.eq            2f
-
-        ldp             q16, q1, [sp]
-        add             sp, sp, x10
-        QPEL_FILTER_H   v24, v17, v18, v19, v20, v21, v22, v23, v16
-        QPEL_FILTER_H2  v25, v17, v18, v19, v20, v21, v22, v23, v16
-        QPEL_FILTER_H   v26, v2, v3, v4, v5, v6, v7, v31, v1
-        QPEL_FILTER_H2  v27, v2, v3, v4, v5, v6, v7, v31, v1
-        QPEL_UNI_W_HV_16
-        subs            w22, w22, #1
-        b.eq            2f
-
-        ldp             q17, q2, [sp]
-        add             sp, sp, x10
-        QPEL_FILTER_H   v24, v18, v19, v20, v21, v22, v23, v16, v17
-        QPEL_FILTER_H2  v25, v18, v19, v20, v21, v22, v23, v16, v17
-        QPEL_FILTER_H   v26, v3, v4, v5, v6, v7, v31, v1, v2
-        QPEL_FILTER_H2  v27, v3, v4, v5, v6, v7, v31, v1, v2
-        QPEL_UNI_W_HV_16
-        subs            w22, w22, #1
-        b.eq            2f
-
-        ldp             q18, q3, [sp]
-        add             sp, sp, x10
-        QPEL_FILTER_H   v24, v19, v20, v21, v22, v23, v16, v17, v18
-        QPEL_FILTER_H2  v25, v19, v20, v21, v22, v23, v16, v17, v18
-        QPEL_FILTER_H   v26, v4, v5, v6, v7, v31, v1, v2, v3
-        QPEL_FILTER_H2  v27, v4, v5, v6, v7, v31, v1, v2, v3
-        QPEL_UNI_W_HV_16
-        subs            w22, w22, #1
-        b.eq            2f
-
-        ldp             q19, q4, [sp]
-        add             sp, sp, x10
-        QPEL_FILTER_H   v24, v20, v21, v22, v23, v16, v17, v18, v19
-        QPEL_FILTER_H2  v25, v20, v21, v22, v23, v16, v17, v18, v19
-        QPEL_FILTER_H   v26, v5, v6, v7, v31, v1, v2, v3, v4
-        QPEL_FILTER_H2  v27, v5, v6, v7, v31, v1, v2, v3, v4
-        QPEL_UNI_W_HV_16
-        subs            w22, w22, #1
-        b.eq            2f
-
-        ldp             q20, q5, [sp]
-        add             sp, sp, x10
-        QPEL_FILTER_H   v24, v21, v22, v23, v16, v17, v18, v19, v20
-        QPEL_FILTER_H2  v25, v21, v22, v23, v16, v17, v18, v19, v20
-        QPEL_FILTER_H   v26, v6, v7, v31, v1, v2, v3, v4, v5
-        QPEL_FILTER_H2  v27, v6, v7, v31, v1, v2, v3, v4, v5
-        QPEL_UNI_W_HV_16
-        subs            w22, w22, #1
-        b.eq            2f
-
-        ldp             q21, q6, [sp]
-        add             sp, sp, x10
-        QPEL_FILTER_H   v24, v22, v23, v16, v17, v18, v19, v20, v21
-        QPEL_FILTER_H2  v25, v22, v23, v16, v17, v18, v19, v20, v21
-        QPEL_FILTER_H   v26, v7, v31, v1, v2, v3, v4, v5, v6
-        QPEL_FILTER_H2  v27, v7, v31, v1, v2, v3, v4, v5, v6
-        QPEL_UNI_W_HV_16
-        subs            w22, w22, #1
-        b.eq            2f
-
-        ldp             q22, q7, [sp]
-        add             sp, sp, x10
-        QPEL_FILTER_H   v24, v23, v16, v17, v18, v19, v20, v21, v22
-        QPEL_FILTER_H2  v25, v23, v16, v17, v18, v19, v20, v21, v22
-        QPEL_FILTER_H   v26, v31, v1, v2, v3, v4, v5, v6, v7
-        QPEL_FILTER_H2  v27, v31, v1, v2, v3, v4, v5, v6, v7
-        QPEL_UNI_W_HV_16
-        subs            w22, w22, #1
-        b.hi            1b
-
-2:
-        QPEL_UNI_W_HV_END
-        ret
-endfunc
-
-
 function ff_hevc_put_hevc_qpel_uni_w_hv32_8_neon_i8mm, export=1
         QPEL_UNI_W_HV_HEADER 32
-        b               hevc_put_hevc_qpel_uni_w_hv32_8_end_neon
-endfunc
-
-function hevc_put_hevc_qpel_uni_w_hv32_8_end_neon
-        mov             x11, sp
-        mov             w12, w22
-        mov             x13, x20
-        mov             x14, sp
-3:
-        ldp             q16, q1, [x11]
-        add             x11, x11, x10
-        ldp             q17, q2, [x11]
-        add             x11, x11, x10
-        ldp             q18, q3, [x11]
-        add             x11, x11, x10
-        ldp             q19, q4, [x11]
-        add             x11, x11, x10
-        ldp             q20, q5, [x11]
-        add             x11, x11, x10
-        ldp             q21, q6, [x11]
-        add             x11, x11, x10
-        ldp             q22, q7, [x11]
-        add             x11, x11, x10
-1:
-        ldp             q23, q31, [x11]
-        add             x11, x11, x10
-        QPEL_FILTER_H   v24, v16, v17, v18, v19, v20, v21, v22, v23
-        QPEL_FILTER_H2  v25, v16, v17, v18, v19, v20, v21, v22, v23
-        QPEL_FILTER_H   v26, v1, v2, v3, v4, v5, v6, v7, v31
-        QPEL_FILTER_H2  v27, v1, v2, v3, v4, v5, v6, v7, v31
-        QPEL_UNI_W_HV_16
-        subs            w22, w22, #1
-        b.eq            2f
-
-        ldp             q16, q1, [x11]
-        add             x11, x11, x10
-        QPEL_FILTER_H   v24, v17, v18, v19, v20, v21, v22, v23, v16
-        QPEL_FILTER_H2  v25, v17, v18, v19, v20, v21, v22, v23, v16
-        QPEL_FILTER_H   v26, v2, v3, v4, v5, v6, v7, v31, v1
-        QPEL_FILTER_H2  v27, v2, v3, v4, v5, v6, v7, v31, v1
-        QPEL_UNI_W_HV_16
-        subs            w22, w22, #1
-        b.eq            2f
-
-        ldp             q17, q2, [x11]
-        add             x11, x11, x10
-        QPEL_FILTER_H   v24, v18, v19, v20, v21, v22, v23, v16, v17
-        QPEL_FILTER_H2  v25, v18, v19, v20, v21, v22, v23, v16, v17
-        QPEL_FILTER_H   v26, v3, v4, v5, v6, v7, v31, v1, v2
-        QPEL_FILTER_H2  v27, v3, v4, v5, v6, v7, v31, v1, v2
-        QPEL_UNI_W_HV_16
-        subs            w22, w22, #1
-        b.eq            2f
-
-        ldp             q18, q3, [x11]
-        add             x11, x11, x10
-        QPEL_FILTER_H   v24, v19, v20, v21, v22, v23, v16, v17, v18
-        QPEL_FILTER_H2  v25, v19, v20, v21, v22, v23, v16, v17, v18
-        QPEL_FILTER_H   v26, v4, v5, v6, v7, v31, v1, v2, v3
-        QPEL_FILTER_H2  v27, v4, v5, v6, v7, v31, v1, v2, v3
-        QPEL_UNI_W_HV_16
-        subs            w22, w22, #1
-        b.eq            2f
-
-        ldp             q19, q4, [x11]
-        add             x11, x11, x10
-        QPEL_FILTER_H   v24, v20, v21, v22, v23, v16, v17, v18, v19
-        QPEL_FILTER_H2  v25, v20, v21, v22, v23, v16, v17, v18, v19
-        QPEL_FILTER_H   v26, v5, v6, v7, v31, v1, v2, v3, v4
-        QPEL_FILTER_H2  v27, v5, v6, v7, v31, v1, v2, v3, v4
-        QPEL_UNI_W_HV_16
-        subs            w22, w22, #1
-        b.eq            2f
-
-        ldp             q20, q5, [x11]
-        add             x11, x11, x10
-        QPEL_FILTER_H   v24, v21, v22, v23, v16, v17, v18, v19, v20
-        QPEL_FILTER_H2  v25, v21, v22, v23, v16, v17, v18, v19, v20
-        QPEL_FILTER_H   v26, v6, v7, v31, v1, v2, v3, v4, v5
-        QPEL_FILTER_H2  v27, v6, v7, v31, v1, v2, v3, v4, v5
-        QPEL_UNI_W_HV_16
-        subs            w22, w22, #1
-        b.eq            2f
-
-        ldp             q21, q6, [x11]
-        add             x11, x11, x10
-        QPEL_FILTER_H   v24, v22, v23, v16, v17, v18, v19, v20, v21
-        QPEL_FILTER_H2  v25, v22, v23, v16, v17, v18, v19, v20, v21
-        QPEL_FILTER_H   v26, v7, v31, v1, v2, v3, v4, v5, v6
-        QPEL_FILTER_H2  v27, v7, v31, v1, v2, v3, v4, v5, v6
-        QPEL_UNI_W_HV_16
-        subs            w22, w22, #1
-        b.eq            2f
-
-        ldp             q22, q7, [x11]
-        add             x11, x11, x10
-        QPEL_FILTER_H   v24, v23, v16, v17, v18, v19, v20, v21, v22
-        QPEL_FILTER_H2  v25, v23, v16, v17, v18, v19, v20, v21, v22
-        QPEL_FILTER_H   v26, v31, v1, v2, v3, v4, v5, v6, v7
-        QPEL_FILTER_H2  v27, v31, v1, v2, v3, v4, v5, v6, v7
-        QPEL_UNI_W_HV_16
-        subs            w22, w22, #1
-        b.hi            1b
-2:
-        subs            w27, w27, #16
-        add             x11, x14, #32
-        add             x20, x13, #16
-        mov             w22, w12
-        mov             x14, x11
-        mov             x13, x20
-        b.hi            3b
-        QPEL_UNI_W_HV_END
-        ret
+        b               hevc_put_hevc_qpel_uni_w_hv16_8_end_neon
 endfunc
 
 function ff_hevc_put_hevc_qpel_uni_w_hv64_8_neon_i8mm, export=1
         QPEL_UNI_W_HV_HEADER 64
-        b               hevc_put_hevc_qpel_uni_w_hv64_8_end_neon
+        b               hevc_put_hevc_qpel_uni_w_hv16_8_end_neon
 endfunc
 
-function hevc_put_hevc_qpel_uni_w_hv64_8_end_neon
+function hevc_put_hevc_qpel_uni_w_hv16_8_end_neon
         mov             x11, sp
         mov             w12, w22
         mov             x13, x20
-- 
2.39.3 (Apple Git-146)
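
For readers less familiar with the assembly, the control flow this patch converges on can be sketched in C. This is not the FFmpeg implementation: the names `filter_strip` and `filter_hv_end`, the `int32_t` output, and the omission of the weighting, rounding and clipping that the real QPEL_UNI_W_HV_16 macro performs are all simplifying assumptions. The sketch only illustrates that an outer loop stepping 16 columns at a time degenerates to a single pass when the width is 16, which is why one tail function can serve hv16, hv32 and hv64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STRIP 16 /* columns handled per inner pass, as in the patch */
#define TAPS   8 /* the qpel filter has 8 taps */

/* Hypothetical stand-in for the inner loop (labels 1:/2: in the assembly):
 * vertically filter one 16-column strip of `height` output rows from the
 * intermediate buffer. Weighting, rounding and clipping are left out. */
static void filter_strip(int32_t *dst, ptrdiff_t dst_stride,
                         const int16_t *tmp, ptrdiff_t tmp_stride,
                         const int8_t coeff[TAPS], int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < STRIP; x++) {
            int32_t sum = 0;
            for (int k = 0; k < TAPS; k++)
                sum += coeff[k] * tmp[(y + k) * tmp_stride + x];
            dst[y * dst_stride + x] = sum;
        }
    }
}

/* Hypothetical shared tail (label 3: in the assembly): one function for
 * widths 16, 32 and 64. Each pass advances both pointers by one strip. */
static void filter_hv_end(int32_t *dst, ptrdiff_t dst_stride,
                          const int16_t *tmp, ptrdiff_t tmp_stride,
                          const int8_t coeff[TAPS], int width, int height)
{
    assert(width > 0 && width % STRIP == 0);
    /* Runs exactly once when width == 16. */
    for (int x0 = 0; x0 < width; x0 += STRIP)
        filter_strip(dst + x0, dst_stride, tmp + x0, tmp_stride,
                     coeff, height);
}
```

With a width of 16 the outer loop body runs once, so the only extra work compared with a dedicated hv16 tail is the loop setup and one additional compare-and-branch.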