From: Hubert Mazur
To: ffmpeg-devel@ffmpeg.org
Cc: gjb@semihalf.com, upstream@semihalf.com, jswinney@amazon.com, Hubert Mazur, martin@martin.st, mw@semihalf.com, spop@amazon.com
Date: Thu, 8 Sep 2022 11:25:02 +0200
Message-Id: <20220908092507.63319-1-hum@semihalf.com>
Subject: [FFmpeg-devel] [PATCH 0/5] Provide optimized neon implementation

Fixed minor issues in the patches.

Regarding vsse16, I didn't change saba & umlal to sub & smlal. It doesn't affect performance, so I left it as it was.

Most of the changes concern nsse16:
- fixed the indentation (thanks for pointing it out),
- applied Martin's patch, which fixes the balance of the instructions,
- interleaved instructions, which apparently helped a little to achieve better benchmark results.

I have also updated the benchmark results for each function. The performance improvement is not huge, but it was worth the effort.
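A minimal scalar C sketch of what vsse16 and nsse16 compute may help when reviewing the asm. It is modeled on my understanding of the C reference functions in libavcodec/me_cmp.c, not copied from them: the names vsse16_ref, nsse16_ref, sq_signed and sq_abs are hypothetical, the MpegEncContext argument is dropped, and the nsse weight (which FFmpeg takes from the codec context, default 8 as far as I know) is passed as a plain parameter. The two squaring helpers show that squaring an absolute difference gives the same value as squaring the signed difference, which is presumably why the saba & umlal form and the sub & smlal form are interchangeable.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Two ways to square a pixel-range difference d; they agree because
 * |d| * |d| == d * d. */
static int sq_signed(int d)
{
    return d * d;                   /* signed difference, signed multiply */
}

static unsigned sq_abs(int d)
{
    unsigned a = (unsigned)abs(d);  /* absolute difference, unsigned multiply */
    return a * a;
}

/* Hypothetical scalar vsse16: sum of squared differences between the
 * vertical gradients of two 16-pixel-wide blocks. Reads rows 0..h-1. */
static int vsse16_ref(const uint8_t *s1, const uint8_t *s2,
                      ptrdiff_t stride, int h)
{
    int score = 0;
    for (int y = 1; y < h; y++) {
        for (int x = 0; x < 16; x++) {
            int d = (s1[x] - s1[x + stride]) - (s2[x] - s2[x + stride]);
            score += sq_signed(d);
        }
        s1 += stride;
        s2 += stride;
    }
    return score;
}

/* Hypothetical scalar nsse16: plain SSE plus a weighted penalty for the
 * difference in 2x2 second-order gradients ("noise") between the blocks. */
static int nsse16_ref(const uint8_t *s1, const uint8_t *s2,
                      ptrdiff_t stride, int h, int weight)
{
    int score1 = 0, score2 = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            score1 += sq_signed(s1[x] - s2[x]);
        if (y + 1 < h) {
            for (int x = 0; x < 15; x++)
                score2 += abs(s1[x] - s1[x + stride] - s1[x + 1] + s1[x + stride + 1])
                        - abs(s2[x] - s2[x + stride] - s2[x + 1] + s2[x + stride + 1]);
        }
        s1 += stride;
        s2 += stride;
    }
    return score1 + abs(score2) * weight;
}
```

A NEON implementation is expected to return exactly the same score as the C reference for every input, so a scalar model like this is handy for spot-checking a few hand-computed blocks (for example, a 16x4 checkerboard against an all-zero block).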
The results for nsse16 and vsse16, which changed the most, are (lower is better):
- vsse16 asm: from 64.7 to 59.2,
- nsse16 asm: from 120.0 to 116.5.

Hubert Mazur (5):
  lavc/aarch64: Add neon implementation for vsad16
  lavc/aarch64: Add neon implementation of vsse16
  lavc/aarch64: Add neon implementation for vsad_intra16
  lavc/aarch64: Add neon implementation for vsse_intra16
  lavc/aarch64: Provide neon implementation of nsse16

 libavcodec/aarch64/me_cmp_init_aarch64.c |  30 ++
 libavcodec/aarch64/me_cmp_neon.S         | 385 +++++++++++++++++++++++
 2 files changed, 415 insertions(+)

-- 
2.34.1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".