From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
        by master.gitmailbox.com (Postfix) with ESMTP id CC62D48C8C
        for ; Thu, 23 May 2024 12:28:08 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
        by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 8E49068D392;
        Thu, 23 May 2024 15:27:57 +0300 (EEST)
Received: from mail-pl1-f182.google.com (mail-pl1-f182.google.com [209.85.214.182])
        by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 7772768D355
        for ; Thu, 23 May 2024 15:27:51 +0300 (EEST)
Received: by mail-pl1-f182.google.com with SMTP id d9443c01a7336-1ee954e0aa6so19955245ad.3
        for ; Thu, 23 May 2024 05:27:51 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20230601; t=1716467269; x=1717072069; darn=ffmpeg.org;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:to:from:from:to:cc:subject:date:message-id
         :reply-to;
        bh=waA1qhxwTV58NOjeeYqzq/nxU+Ey3ICRRSOVn8w41RU=;
        b=IdFoYH0mWS/0dsGSw5lG6vZTZbr6BfdRj/D/7Y0HiwrWCV+cWdUdk+c8XNN8qNUWdG
         26CyV2SbszknWxc62gG6cp1OBOkI/w08AjTzNyCuMp8uaRFBV7QH85Fpo5F69kFX4SaR
         tUzHS9ntDxG2ONNKNgI0zOoi5jWWgaJjVsAYy+8TXryGJKDKnvPOcVPSZONlSDGC5jBP
         45dmKIR64v5c2EjR01OKY9UDIRk/YRCuwTeSdnEyBTtFDwVkXkDU5uvvGbh/VLXlUStE
         OtvQXiV0hsegUlBNkzSDyCz5WuvnfKC/y60fpuG3TXiPKRw11YYlXMvKQDOcOXCBHlWs
         bw3g==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1716467269; x=1717072069;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:to:from:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=waA1qhxwTV58NOjeeYqzq/nxU+Ey3ICRRSOVn8w41RU=;
        b=g1xVzJ+xmsry/7v7kNUMHHOdRTnaU0G51LoVIxA2biTR2Vp9gyBhOHESLSrDBxk/c1
         a61d9ARRlu76qitkeAbIrzo+g68idC699ZCHLi4VfO62rDCz1El6/7loELGJGOmgK9d3
         3lYTR8gYb8gI9p16uZtYlRTDRg1x5nKQ/iucw/31ylAjOsHaR2IIqPZMcaOgqihRtCuC
         Q9b6m/BY4CmTSQ38whOHEsI44zI+rAIR0QwPpkrVy8xMPLTj95nxd+ZbyDfQThG3IBob
         2E33GBPuIoZJ4gu395aUVRPc2hBagU/DZ9SrHC3L3+79yvUo/GUJjWJAIZkF5P+Q8P1C
         vnzw==
X-Gm-Message-State: AOJu0Yy1UPcH9zHBjU7bNZrURasdy/rsD+hFOSU+icI6QeeF1ckiHsMZ
        4LfEV67e2f5BeE+EZchAY2OTfz3DTQEqzUGEk1Dd71IRaTGb847JD3OWbA==
X-Google-Smtp-Source: AGHT+IHgka9hQ9GNnUQp+hv1e97XL3/mZenw9I1bv0No8lGPTM831Vtp0hkRLPSht4MVk/3sPaQ3bA==
X-Received: by 2002:a17:902:b908:b0:1f2:f497:2409 with SMTP id d9443c01a7336-1f31c978a36mr42404885ad.19.1716467268759;
        Thu, 23 May 2024 05:27:48 -0700 (PDT)
Received: from localhost.localdomain ([190.194.167.233])
        by smtp.gmail.com with ESMTPSA id d9443c01a7336-1f335da4b87sm16158405ad.100.2024.05.23.05.27.47
        for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Thu, 23 May 2024 05:27:48 -0700 (PDT)
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Date: Thu, 23 May 2024 09:27:13 -0300
Message-ID: <20240523122716.2158-2-jamrial@gmail.com>
X-Mailer: git-send-email 2.45.1
In-Reply-To: <20240523122716.2158-1-jamrial@gmail.com>
References: <20240523122716.2158-1-jamrial@gmail.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH 2/5] x86/vvc_sad: optimize vvc_sad_16
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
Archived-At:
List-Archive:
List-Post:

Signed-off-by: James Almer
---
 libavcodec/x86/vvc/vvc_sad.asm | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/libavcodec/x86/vvc/vvc_sad.asm b/libavcodec/x86/vvc/vvc_sad.asm
index a20818530f..829dbce489 100644
--- a/libavcodec/x86/vvc/vvc_sad.asm
+++ b/libavcodec/x86/vvc/vvc_sad.asm
@@ -96,7 +96,7 @@ cglobal vvc_sad_8, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, ro
     movd              eax, xm0
     RET

-cglobal vvc_sad_16, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, row_idx
+cglobal vvc_sad_16, 6, 8, 5, src1, src2, dx, dy, block_w, block_h, off1, off2
     movsxdifnidn    dxq, dxd
     movsxdifnidn    dyq, dyd

@@ -121,26 +121,27 @@ cglobal vvc_sad_16, 6, 9, 5, src1, src2, dx, dy, block_w, block_h, off1, off2, r
     pxor               m3, m3
     vpbroadcastd       m4, [pw_1]

-    sar          block_wd, 4
+    shl          block_wd, 1
+    add             src1q, block_wq
+    add             src2q, block_wq
+    neg          block_wq
+
+DEFINE_ARGS src1, src2, dx, dy, block_w, block_h, row_idx

     .loop_height:
-        mov         off1q, src1q
-        mov         off2q, src2q
-        mov      row_idxd, block_wd
+        mov      row_idxq, block_wq

         .loop_width:
-            movu       m0, [src1q]
-            movu       m1, [src2q]
+            movu       m0, [src1q+row_idxq]
+            movu       m1, [src2q+row_idxq]
             MIN_MAX_SAD m1, m0, m2
             pmaddwd    m1, m4
             paddd      m3, m1

-            add     src1q, 32
-            add     src2q, 32
-            dec  row_idxd
-            jg .loop_width
+            add  row_idxq, mmsize
+            jl .loop_width

-        lea         src1q, [off1q + ROWS * MAX_PB_SIZE * 2]
-        lea         src2q, [off2q + ROWS * MAX_PB_SIZE * 2]
+        add         src1q, ROWS * MAX_PB_SIZE * 2
+        add         src2q, ROWS * MAX_PB_SIZE * 2
         sub      block_hd, 2
         jg .loop_height

-- 
2.45.1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".