From: Martin Storsjö
Date: Fri, 26 May 2023 11:34:01 +0300 (EEST)
To: FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [PATCH] lavc/aarch64: new optimization for 8-bit hevc_pel_uni_w_pixels, qpel_uni_w_h, qpel_uni_w_v, qpel_uni_w_hv and qpel_h
Message-ID: <4badda67-2eb-e262-f791-9d2847dbd71@martin.st>
References: <530864e2-a55b-603e-00d4-f6876d391d9e@myais.com.cn> <3e19c1bc-3af7-c9b0-337c-7350d72d52fb@martin.st>

Hi,

Overall these patches seem mostly ok, but I've got a few minor points to
make:

- The usdot instruction requires the i8mm
  extension (part of armv8.6-a), while udot or sdot would require the
  dotprod extension (available in armv8.4-a). If you could manage with
  udot or sdot, these functions would be usable on a wider set of CPUs.

  Also, I finally got support implemented for optionally using these CPU
  extensions via runtime detection, even if the baseline of the build
  doesn't include them. See the patchset at
  https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=9009.

  To adapt your patches on top of this, see the two topmost commits at
  https://github.com/mstorsjo/ffmpeg/commits/archext.

- The indentation is inconsistent; in the first patch, you have some
  instructions written like this:

+        sqadd   v1.4s, v1.4s, v29.4s

  While you later use this style:

+        dup             v1.16b, v28.b[1]

  The latter seems to match the style we commonly use; please reformat
  your code to match that consistently.

  With some macro invocations in the first patch, you also seem to have
  too much indentation in some places. See e.g. this:

+1:      ldr             q23, [x2, x3]
+        add             x2, x2, x3, lsl #1
+        QPEL_FILTER_B     v26, v16, v17, v18, v19, v20, v21, v22, v23
+        QPEL_FILTER_B2    v27, v16, v17, v18, v19, v20, v21, v22, v23
+        QPEL_UNI_W_V_16
+        subs            w4, w4, #1
+        b.eq            2f

  (If the macro name is too long, that's ok, but here there's no need to
  have those lines unaligned.)

- In the third patch, you load multiple parameters from the stack like
  this:

+        ldp             x14, x15, [sp]          // mx, my
+        ldr             w13, [sp, #16]          // width

  I see that the mx and my parameters are intptr_t; that's good, since if
  they were 32-bit integers, the ABI for such parameters on the stack
  would differ between macOS/Darwin and Linux (Darwin packs small stack
  arguments at their natural size, while Linux gives each one an 8-byte
  slot). But as long as they're intptr_t they behave the same.

- At the same place, you're backing up a bunch of registers:

+        stp             x20, x21, [sp, #-16]!
+        stp             x22, x23, [sp, #-16]!
+        stp             x24, x25, [sp, #-16]!
+        stp             x26, x27, [sp, #-16]!
+        stp             x28, x30, [sp, #-16]!

  This is inefficient, since every store also has to update sp; instead,
  decrement sp once and store the remaining pairs at fixed offsets:

+        stp             x28, x30, [sp, #-80]!
+        stp             x20, x21, [sp, #16]
+        stp             x22, x23, [sp, #32]
+        stp             x24, x25, [sp, #48]
+        stp             x26, x27, [sp, #64]

  Also, following that, I see that you back up the stack pointer in x28.
  Why do you use x28 for that? Using x29 would be customary, as it is the
  frame pointer.

Aside from that, I think the rest of the patches is acceptable.

// Martin
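
As an illustration of the runtime detection mentioned in the first point, here is a minimal sketch of how a Linux/AArch64 build might ask the kernel whether dotprod and i8mm are available. It is not the code from the patchset linked above. The names `SKETCH_CPU_DOTPROD`, `SKETCH_CPU_I8MM`, `sketch_flags_from_hwcap` and `sketch_detect_cpu_flags` are hypothetical, and the bit positions are assumed to match the Linux arm64 hwcap definitions (`HWCAP_ASIMDDP` and `HWCAP2_I8MM`). On other platforms the sketch simply reports that neither extension is available.

```c
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#endif

/* Hypothetical feature flags, used only within this sketch. */
enum {
    SKETCH_CPU_DOTPROD = 1 << 0,   /* udot/sdot usable */
    SKETCH_CPU_I8MM    = 1 << 1    /* usdot usable */
};

/* Assumed to mirror HWCAP_ASIMDDP and HWCAP2_I8MM in the Linux arm64
 * headers; spelled out here so the sketch builds with older toolchains. */
#define SKETCH_HWCAP_ASIMDDP (1UL << 20)
#define SKETCH_HWCAP2_I8MM   (1UL << 13)
#define SKETCH_AT_HWCAP2     26

/* Pure helper: translate the two kernel hwcap words into sketch flags. */
static int sketch_flags_from_hwcap(unsigned long hwcap, unsigned long hwcap2)
{
    int flags = 0;

    if (hwcap & SKETCH_HWCAP_ASIMDDP)
        flags |= SKETCH_CPU_DOTPROD;
    if (hwcap2 & SKETCH_HWCAP2_I8MM)
        flags |= SKETCH_CPU_I8MM;
    return flags;
}

/* Query the running system; returns 0 where detection is unsupported. */
static int sketch_detect_cpu_flags(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return sketch_flags_from_hwcap(getauxval(AT_HWCAP),
                                   getauxval(SKETCH_AT_HWCAP2));
#else
    return 0;
#endif
}
```

A dispatcher could then install the usdot-based functions only when `SKETCH_CPU_I8MM` is set and fall back to a udot/sdot or plain NEON version otherwise, which is the reason the choice of instruction affects how many CPUs benefit.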