* [FFmpeg-devel] [PR] Backport CI to release/4.3 (PR #21351)
@ 2026-01-01 21:28 Timo Rothenpieler via ffmpeg-devel
0 siblings, 0 replies; only message in thread
From: Timo Rothenpieler via ffmpeg-devel @ 2026-01-01 21:28 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Timo Rothenpieler
PR #21351 opened by Timo Rothenpieler (BtbN)
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21351
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21351.patch
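
To try the series locally, the patch file linked above can be applied on top of release/4.3 with git am; a minimal sketch (the remote and branch names below are only examples):

    # fetch the target branch and apply the whole series onto it
    git fetch origin release/4.3
    git checkout -b backport-ci-4.3 origin/release/4.3
    curl -fsSL https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21351.patch | git am
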
From b6bf959e120124d1fa70393b1889c1d5be778b26 Mon Sep 17 00:00:00 2001
From: Timo Rothenpieler <timo@rothenpieler.org>
Date: Sun, 30 Nov 2025 16:25:19 +0100
Subject: [PATCH 1/4] forgejo: backport CI to release/4.3
---
.forgejo/pre-commit/config.yaml | 28 +++
.forgejo/pre-commit/ignored-words.txt | 119 +++++++++++++
.forgejo/workflows/lint.yml | 26 +++
.forgejo/workflows/test.yml | 76 ++++++++
tools/check_arm_indent.sh | 55 ++++++
tools/indent_arm_assembly.pl | 243 ++++++++++++++++++++++++++
6 files changed, 547 insertions(+)
create mode 100644 .forgejo/pre-commit/config.yaml
create mode 100644 .forgejo/pre-commit/ignored-words.txt
create mode 100644 .forgejo/workflows/lint.yml
create mode 100644 .forgejo/workflows/test.yml
create mode 100755 tools/check_arm_indent.sh
create mode 100755 tools/indent_arm_assembly.pl
diff --git a/.forgejo/pre-commit/config.yaml b/.forgejo/pre-commit/config.yaml
new file mode 100644
index 0000000000..cbb91256ca
--- /dev/null
+++ b/.forgejo/pre-commit/config.yaml
@@ -0,0 +1,28 @@
+exclude: ^tests/ref/
+
+repos:
+- repo: https://github.com/pre-commit/pre-commit-hooks
+ rev: v5.0.0
+ hooks:
+ - id: check-case-conflict
+ - id: check-executables-have-shebangs
+ - id: check-illegal-windows-names
+ - id: check-shebang-scripts-are-executable
+ - id: check-yaml
+ - id: end-of-file-fixer
+ - id: file-contents-sorter
+ files:
+ .forgejo/pre-commit/ignored-words.txt
+ args:
+ - --ignore-case
+ - id: fix-byte-order-marker
+ - id: mixed-line-ending
+ - id: trailing-whitespace
+- repo: local
+ hooks:
+ - id: aarch64-asm-indent
+ name: fix aarch64 assembly indentation
+ files: ^.*/aarch64/.*\.S$
+ language: script
+ entry: ./tools/check_arm_indent.sh --apply
+ pass_filenames: false
diff --git a/.forgejo/pre-commit/ignored-words.txt b/.forgejo/pre-commit/ignored-words.txt
new file mode 100644
index 0000000000..870fd96be3
--- /dev/null
+++ b/.forgejo/pre-commit/ignored-words.txt
@@ -0,0 +1,119 @@
+abl
+ACN
+acount
+addin
+alis
+alls
+ALOG
+ALS
+als
+ANC
+anc
+ANS
+ans
+anull
+basf
+bloc
+brane
+BREIF
+BU
+bu
+bufer
+CAF
+caf
+clen
+clens
+Collet
+compre
+dum
+endin
+erro
+FIEL
+fiel
+filp
+fils
+FILTERD
+filterd
+fle
+fo
+FPR
+fro
+Hald
+indx
+ine
+inh
+inout
+inouts
+inport
+ist
+LAF
+laf
+lastr
+LinS
+mapp
+mis
+mot
+nd
+nIn
+offsetp
+orderd
+ot
+outout
+padd
+PAETH
+paeth
+PARM
+parm
+parms
+pEvents
+PixelX
+Psot
+quater
+readd
+recuse
+redY
+Reencode
+reencode
+remaind
+renderD
+rin
+SAV
+SEH
+SER
+ser
+setts
+shft
+SIZ
+siz
+skipd
+sme
+som
+sover
+STAP
+startd
+statics
+struc
+suble
+TE
+tE
+te
+tha
+tne
+tolen
+tpye
+tre
+TRUN
+trun
+truns
+Tung
+TYE
+ue
+UES
+ues
+vai
+vas
+vie
+VILL
+vor
+wel
+wih
diff --git a/.forgejo/workflows/lint.yml b/.forgejo/workflows/lint.yml
new file mode 100644
index 0000000000..9efee16c65
--- /dev/null
+++ b/.forgejo/workflows/lint.yml
@@ -0,0 +1,26 @@
+on:
+ push:
+ branches:
+ - release/4.4
+ pull_request:
+
+jobs:
+ lint:
+ runs-on: utilities
+ steps:
+ - name: Checkout
+ uses: actions/checkout@v4
+ - name: Install pre-commit CI
+ id: install
+ run: |
+ python3 -m venv ~/pre-commit
+ ~/pre-commit/bin/pip install --upgrade pip setuptools
+ ~/pre-commit/bin/pip install pre-commit
+ echo "envhash=$({ python3 --version && cat .forgejo/pre-commit/config.yaml; } | sha256sum | cut -d' ' -f1)" >> $FORGEJO_OUTPUT
+ - name: Cache
+ uses: actions/cache@v4
+ with:
+ path: ~/.cache/pre-commit
+ key: pre-commit-${{ steps.install.outputs.envhash }}
+ - name: Run pre-commit CI
+ run: ~/pre-commit/bin/pre-commit run -c .forgejo/pre-commit/config.yaml --show-diff-on-failure --color=always --all-files
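
For reference, the lint job above can be reproduced outside the runner with the same commands; a rough local equivalent (assuming python3 with venv support; the ~/pre-commit path is arbitrary):

    # one-time setup of an isolated pre-commit install
    python3 -m venv ~/pre-commit
    ~/pre-commit/bin/pip install --upgrade pip setuptools pre-commit
    # run all hooks from the backported config against the whole tree
    ~/pre-commit/bin/pre-commit run -c .forgejo/pre-commit/config.yaml \
        --show-diff-on-failure --color=always --all-files
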
diff --git a/.forgejo/workflows/test.yml b/.forgejo/workflows/test.yml
new file mode 100644
index 0000000000..b0812dd647
--- /dev/null
+++ b/.forgejo/workflows/test.yml
@@ -0,0 +1,76 @@
+on:
+ push:
+ branches:
+ - release/4.4
+ pull_request:
+
+jobs:
+ run_fate:
+ strategy:
+ fail-fast: false
+ matrix:
+ runner: [linux-aarch64]
+ shared: ['static']
+ bits: ['64']
+ include:
+ - runner: linux-amd64
+ shared: 'static'
+ bits: '32'
+ - runner: linux-amd64
+ shared: 'shared'
+ bits: '64'
+ runs-on: ${{ matrix.runner }}
+ steps:
+ - name: Checkout
+ uses: actions/checkout@v4
+ - name: Configure
+ run: |
+ ./configure --enable-gpl --enable-nonfree --enable-memory-poisoning --assert-level=2 \
+ $([ "${{ matrix.bits }}" != "32" ] || echo --arch=x86_32 --extra-cflags=-m32 --extra-cxxflags=-m32 --extra-ldflags=-m32) \
+ $([ "${{ matrix.shared }}" != "shared" ] || echo --enable-shared --disable-static) \
+ || CFGRES=$? && CFGRES=$?
+ cat ffbuild/config.log
+ exit $CFGRES
+ - name: Build
+ run: make -j$(nproc)
+ - name: Restore Cached Fate-Suite
+ id: cache
+ uses: actions/cache/restore@v4
+ with:
+ path: fate-suite
+ key: fate-suite
+ restore-keys: |
+ fate-suite-
+ - name: Sync Fate-Suite
+ id: fate
+ run: |
+ make fate-rsync SAMPLES=$PWD/fate-suite
+ echo "hash=$(find fate-suite -type f -printf "%P %s %T@\n" | sort | sha256sum | cut -d' ' -f1)" >> $FORGEJO_OUTPUT
+ - name: Cache Fate-Suite
+ uses: actions/cache/save@v4
+ if: ${{ format('fate-suite-{0}', steps.fate.outputs.hash) != steps.cache.outputs.cache-matched-key }}
+ with:
+ path: fate-suite
+ key: fate-suite-${{ steps.fate.outputs.hash }}
+ - name: Run Fate
+ run: LD_LIBRARY_PATH="$(printf "%s:" "$PWD"/lib*)$PWD" make fate fate-build SAMPLES=$PWD/fate-suite -j$(nproc)
+ compile_only:
+ strategy:
+ fail-fast: false
+ matrix:
+ image: ["ghcr.io/btbn/ffmpeg-builds/win64-gpl:latest"]
+ runs-on: linux-amd64
+ container: ${{ matrix.image }}
+ steps:
+ - name: Checkout
+ uses: actions/checkout@v4
+ - name: Configure
+ run: |
+ ./configure --pkg-config-flags="--static" $FFBUILD_TARGET_FLAGS $FF_CONFIGURE \
+ --cc="$CC" --cxx="$CXX" --ar="$AR" --ranlib="$RANLIB" --nm="$NM" \
+ --extra-cflags="$FF_CFLAGS" --extra-cxxflags="$FF_CXXFLAGS" \
+ --extra-libs="$FF_LIBS" --extra-ldflags="$FF_LDFLAGS" --extra-ldexeflags="$FF_LDEXEFLAGS"
+ - name: Build
+ run: make -j$(nproc)
+ - name: Run Fate
+ run: make -j$(nproc) fate-build
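
Outside the runners, the run_fate job above boils down to a standard local FATE run; a minimal sketch for the default static 64-bit case (rsync and some disk space for the sample suite are assumed):

    ./configure --enable-gpl --enable-nonfree --enable-memory-poisoning --assert-level=2
    make -j$(nproc)
    # download/update the samples, then run the tests against them
    make fate-rsync SAMPLES=$PWD/fate-suite
    make fate SAMPLES=$PWD/fate-suite -j$(nproc)
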
diff --git a/tools/check_arm_indent.sh b/tools/check_arm_indent.sh
new file mode 100755
index 0000000000..5becfe0aec
--- /dev/null
+++ b/tools/check_arm_indent.sh
@@ -0,0 +1,55 @@
+#!/bin/sh
+#
+# Copyright (c) 2025 Martin Storsjo
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+cd $(dirname $0)/..
+
+if [ "$1" = "--apply" ]; then
+ apply=1
+fi
+
+ret=0
+
+for i in */aarch64/*.S */aarch64/*/*.S; do
+ case $i in
+ libavcodec/aarch64/h264idct_neon.S|libavcodec/aarch64/h26x/epel_neon.S|libavcodec/aarch64/h26x/qpel_neon.S|libavcodec/aarch64/vc1dsp_neon.S)
+ # Skip files with known (and tolerated) deviations from the tool.
+ continue
+ esac
+ ./tools/indent_arm_assembly.pl < "$i" > tmp.S || ret=$?
+ if ! git diff --quiet --no-index "$i" tmp.S; then
+ if [ -n "$apply" ]; then
+ mv tmp.S "$i"
+ else
+ git --no-pager diff --no-index "$i" tmp.S
+ fi
+ ret=1
+ fi
+done
+
+rm -f tmp.S
+
+exit $ret
diff --git a/tools/indent_arm_assembly.pl b/tools/indent_arm_assembly.pl
new file mode 100755
index 0000000000..359c2bcf4f
--- /dev/null
+++ b/tools/indent_arm_assembly.pl
@@ -0,0 +1,243 @@
+#!/usr/bin/env perl
+#
+# Copyright (c) 2025 Martin Storsjo
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+# A script for reformatting ARM/AArch64 assembly according to the following
+# style:
+# - Instructions start after 8 columns, operands start after 24 columns
+# - Vector register layouts and modifiers like "uxtw" are written in lowercase
+# - Optionally align operand columns vertically according to their
+# maximum width (accommodating for e.g. x0 vs x10, or v0.8b vs v16.16b).
+#
+# The script can be executed as "indent_arm_assembly.pl file [outfile]".
+# If no outfile is specified, the given file is overwritten in place.
+#
+# Alternatively, if no file parameters are given, the script reads input
+# code on stdin and outputs the reformatted code on stdout.
+
+use strict;
+
+my $indent_operands = 0;
+my $instr_indent = 8;
+my $operand_indent = 24;
+my $match_indent = 0;
+my $file;
+my $outfile;
+
+while (@ARGV) {
+ my $opt = shift;
+
+ if ($opt eq "-operands") {
+ $indent_operands = 1;
+ } elsif ($opt eq "-indent") {
+ $instr_indent = shift;
+ } elsif ($opt eq "-operand-indent") {
+ $operand_indent = shift;
+ } elsif ($opt eq "-match-indent") {
+ $match_indent = 1;
+ } else {
+ if (!$file) {
+ $file = $opt;
+ } elsif (!$outfile) {
+ $outfile = $opt;
+ } else {
+ die "Unrecognized parameter $opt\n";
+ }
+ }
+}
+
+if ($operand_indent < $instr_indent) {
+ die "Can't indent operands to $operand_indent while indenting " .
+ "instructions to $instr_indent\n";
+}
+
+# Return a string consisting of n spaces
+sub spaces {
+ my $n = $_[0];
+ return " " x $n;
+}
+
+sub indentcolumns {
+ my $input = $_[0];
+ my $chars = $_[1];
+ my @operands = split(/,/, $input);
+ my $num = @operands;
+ my $ret = "";
+ for (my $i = 0; $i < $num; $i++) {
+ my $cur = $operands[$i];
+ # Trim out leading/trailing whitespace
+ $cur =~ s/^\s+|\s+$//g;
+ $ret .= $cur;
+ if ($i + 1 < $num) {
+ # If we have a following operand, add a comma and whitespace to
+ # align the next operand.
+ my $next = $operands[$i+1];
+ my $len = length($cur);
+ if ($len > $chars) {
+ # If this operand was too wide for the intended column width,
+ # don't try to realign the line at all, just return the input
+ # untouched.
+ return $input;
+ }
+ my $pad = $chars - $len;
+ if ($next =~ /[su]xt[bhw]|[la]s[lr]/) {
+ # If the next item isn't a regular operand, but a modifier,
+ # don't try to align that. E.g. "add x0, x0, w1, uxtw #1".
+ $pad = 0;
+ }
+ $ret .= "," . spaces(1 + $pad);
+ }
+ }
+ return $ret;
+}
+
+# Realign the operands part of an instruction line, making each operand
+# take up the maximum width for that kind of operand.
+sub columns {
+ my $rest = $_[0];
+ if ($rest !~ /,/) {
+ # No commas, no operands to split and align
+ return $rest;
+ }
+ if ($rest =~ /{|[^\w]\[/) {
+ # Check for instructions that use register ranges, like {v0.8b,v1.8b}
+ # or mem address operands, like "ldr x0, [sp]" - we skip trying to
+ # realign these.
+ return $rest;
+ }
+ if ($rest =~ /v[0-9]+\.[0-9]+[bhsd]/) {
+ # If we have references to aarch64 style vector registers, like
+ # v0.8b, then align all operands to the maximum width of such
+ # operands - v16.16b.
+ #
+ # TODO: Ideally, we'd handle mixed operand types individually.
+ return indentcolumns($rest, 7);
+ }
+ # Indent operands according to the maximum width of regular registers,
+ # like x10.
+ return indentcolumns($rest, 3);
+}
+
+my $in;
+my $out;
+my $tempfile;
+
+if ($file) {
+ open(INPUT, "$file") or die "Unable to open $file: $!";
+ $in = *INPUT;
+ if ($outfile) {
+ open(OUTPUT, ">$outfile") or die "Unable to open $outfile: $!";
+ } else {
+ $tempfile = "$file.tmp";
+ open(OUTPUT, ">$tempfile") or die "Unable to open $tempfile: $!";
+ }
+ $out = *OUTPUT;
+} else {
+ $in = *STDIN;
+ $out = *STDOUT;
+}
+
+while (<$in>) {
+ # Trim off trailing whitespace.
+ chomp;
+ if (/^([\.\w\d]+:)?(\s+)([\w\\][\w\\\.]*)(?:(\s+)(.*)|$)/) {
+ my $label = $1;
+ my $indent = $2;
+ my $instr = $3;
+ my $origspace = $4;
+ my $rest = $5;
+
+ my $orig_operand_indent = length($label) + length($indent) +
+ length($instr) + length($origspace);
+
+ if ($indent_operands) {
+ $rest = columns($rest);
+ }
+
+ my $size = $instr_indent;
+ if ($match_indent) {
+ # Try to check the current attempted indent size and normalize
+            # to it; match existing indent sizes of 4, 8, 10 and 12 columns.
+ my $cur_indent = length($label) + length($indent);
+ if ($cur_indent >= 3 && $cur_indent <= 5) {
+ $size = 4;
+ } elsif ($cur_indent >= 7 && $cur_indent <= 9) {
+ $size = 8;
+ } elsif ($cur_indent == 10 || $cur_indent == 12) {
+ $size = $cur_indent;
+ }
+ }
+ if (length($label) >= $size) {
+ # Not enough space for the label; just add a space between the label
+ # and the instruction.
+ $indent = " ";
+ } else {
+ $indent = spaces($size - length($label));
+ }
+
+ my $instr_end = length($label) + length($indent) + length($instr);
+ $size = $operand_indent - $instr_end;
+ if ($match_indent) {
+ # Check how the operands currently seem to be indented.
+ my $cur_indent = $orig_operand_indent;
+ if ($cur_indent >= 11 && $cur_indent <= 13) {
+ $size = 12;
+ } elsif ($cur_indent >= 14 && $cur_indent <= 17) {
+ $size = 16;
+ } elsif ($cur_indent >= 18 && $cur_indent <= 22) {
+ $size = 20;
+ } elsif ($cur_indent >= 23 && $cur_indent <= 27) {
+ $size = 24;
+ }
+ $size -= $instr_end;
+ }
+ my $operand_space = " ";
+ if ($size > 0) {
+ $operand_space = spaces($size);
+ }
+
+ # Lowercase the aarch64 vector layout description, .8B -> .8b
+ $rest =~ s/(\.[84216]*[BHSD])/lc($1)/ge;
+ # Lowercase modifiers like "uxtw" or "lsl"
+ $rest =~ s/([SU]XT[BWH]|[LA]S[LR])/lc($1)/ge;
+
+ # Reassemble the line
+ if ($rest eq "") {
+ $_ = $label . $indent . $instr;
+ } else {
+ $_ = $label . $indent . $instr . $operand_space . $rest;
+ }
+ }
+ print $out $_ . "\n";
+}
+
+if ($file) {
+ close(INPUT);
+ close(OUTPUT);
+}
+if ($tempfile) {
+ rename($tempfile, $file);
+}
--
2.49.1
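
The indentation tooling added above can also be run standalone from the source root; a short sketch of both modes plus the stdin/stdout form used by the hook (the output file name is just an example):

    # report deviations as diffs and exit non-zero on mismatch
    ./tools/check_arm_indent.sh
    # rewrite offending files in place, as the pre-commit hook does
    ./tools/check_arm_indent.sh --apply
    # reformat a single file via stdin/stdout
    ./tools/indent_arm_assembly.pl < libavcodec/aarch64/neon.S > neon.S.indented
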
From 6a626c8998b678790a73337024f32b6453b1a937 Mon Sep 17 00:00:00 2001
From: Timo Rothenpieler <timo@rothenpieler.org>
Date: Sun, 30 Nov 2025 16:58:33 +0100
Subject: [PATCH 2/4] forgejo: apply needed CI changes for 4.3
---
.forgejo/pre-commit/config.yaml | 5 --
.forgejo/pre-commit/ignored-words.txt | 119 --------------------------
.forgejo/workflows/lint.yml | 2 +-
.forgejo/workflows/test.yml | 4 +-
4 files changed, 3 insertions(+), 127 deletions(-)
delete mode 100644 .forgejo/pre-commit/ignored-words.txt
diff --git a/.forgejo/pre-commit/config.yaml b/.forgejo/pre-commit/config.yaml
index cbb91256ca..8ce116dae6 100644
--- a/.forgejo/pre-commit/config.yaml
+++ b/.forgejo/pre-commit/config.yaml
@@ -10,11 +10,6 @@ repos:
- id: check-shebang-scripts-are-executable
- id: check-yaml
- id: end-of-file-fixer
- - id: file-contents-sorter
- files:
- .forgejo/pre-commit/ignored-words.txt
- args:
- - --ignore-case
- id: fix-byte-order-marker
- id: mixed-line-ending
- id: trailing-whitespace
diff --git a/.forgejo/pre-commit/ignored-words.txt b/.forgejo/pre-commit/ignored-words.txt
deleted file mode 100644
index 870fd96be3..0000000000
--- a/.forgejo/pre-commit/ignored-words.txt
+++ /dev/null
@@ -1,119 +0,0 @@
-abl
-ACN
-acount
-addin
-alis
-alls
-ALOG
-ALS
-als
-ANC
-anc
-ANS
-ans
-anull
-basf
-bloc
-brane
-BREIF
-BU
-bu
-bufer
-CAF
-caf
-clen
-clens
-Collet
-compre
-dum
-endin
-erro
-FIEL
-fiel
-filp
-fils
-FILTERD
-filterd
-fle
-fo
-FPR
-fro
-Hald
-indx
-ine
-inh
-inout
-inouts
-inport
-ist
-LAF
-laf
-lastr
-LinS
-mapp
-mis
-mot
-nd
-nIn
-offsetp
-orderd
-ot
-outout
-padd
-PAETH
-paeth
-PARM
-parm
-parms
-pEvents
-PixelX
-Psot
-quater
-readd
-recuse
-redY
-Reencode
-reencode
-remaind
-renderD
-rin
-SAV
-SEH
-SER
-ser
-setts
-shft
-SIZ
-siz
-skipd
-sme
-som
-sover
-STAP
-startd
-statics
-struc
-suble
-TE
-tE
-te
-tha
-tne
-tolen
-tpye
-tre
-TRUN
-trun
-truns
-Tung
-TYE
-ue
-UES
-ues
-vai
-vas
-vie
-VILL
-vor
-wel
-wih
diff --git a/.forgejo/workflows/lint.yml b/.forgejo/workflows/lint.yml
index 9efee16c65..1120b56ea2 100644
--- a/.forgejo/workflows/lint.yml
+++ b/.forgejo/workflows/lint.yml
@@ -1,7 +1,7 @@
on:
push:
branches:
- - release/4.4
+ - release/4.3
pull_request:
jobs:
diff --git a/.forgejo/workflows/test.yml b/.forgejo/workflows/test.yml
index b0812dd647..7be5eca958 100644
--- a/.forgejo/workflows/test.yml
+++ b/.forgejo/workflows/test.yml
@@ -1,7 +1,7 @@
on:
push:
branches:
- - release/4.4
+ - release/4.3
pull_request:
jobs:
@@ -58,7 +58,7 @@ jobs:
strategy:
fail-fast: false
matrix:
- image: ["ghcr.io/btbn/ffmpeg-builds/win64-gpl:latest"]
+ image: ["ghcr.io/btbn/ffmpeg-builds/win64-gpl-4.4:latest"]
runs-on: linux-amd64
container: ${{ matrix.image }}
steps:
--
2.49.1
From 4873655eecb20fe50fcb4d5bfa5332db06bfe017 Mon Sep 17 00:00:00 2001
From: Timo Rothenpieler <timo@rothenpieler.org>
Date: Thu, 1 Jan 2026 22:24:08 +0100
Subject: [PATCH 3/4] all: apply linter fixes
---
COPYING.LGPLv2.1 | 18 +-
doc/build_system.txt | 1 -
doc/nut.texi | 1 -
doc/texi2pod.pl | 0
doc/texidep.pl | 0
doc/undefined.txt | 1 -
ffbuild/libversion.sh | 2 +
libavcodec/aarch64/aacpsdsp_neon.S | 218 ++---
libavcodec/aarch64/fft_neon.S | 24 +-
libavcodec/aarch64/h264cmc_neon.S | 410 +++++-----
libavcodec/aarch64/h264dsp_neon.S | 1062 ++++++++++++-------------
libavcodec/aarch64/h264qpel_neon.S | 544 ++++++-------
libavcodec/aarch64/hpeldsp_neon.S | 362 ++++-----
libavcodec/aarch64/neon.S | 192 ++---
libavcodec/aarch64/opusdsp_neon.S | 102 +--
libavcodec/aarch64/sbrdsp_neon.S | 294 +++----
libavcodec/aarch64/simple_idct_neon.S | 380 ++++-----
libavcodec/aarch64/vp8dsp_neon.S | 302 +++----
libavcodec/arm/int_neon.S | 1 -
libavcodec/cljrdec.c | 1 -
libavcodec/dv.c | 1 -
libavcodec/dv_profile.c | 1 -
libavcodec/ffv1_template.c | 1 -
libavcodec/ffv1enc_template.c | 1 -
libavcodec/h264_mc_template.c | 1 -
libavcodec/hevc_cabac.c | 1 -
libavcodec/mpegaudiodsp_template.c | 1 -
libavcodec/mpegaudioenc_template.c | 1 -
libavcodec/msmpeg4.c | 1 -
libavcodec/snow_dwt.c | 2 -
libavcodec/x86/fmtconvert.asm | 1 -
libavcodec/x86/mpegvideoencdsp.asm | 1 -
libavfilter/aarch64/vf_nlmeans_neon.S | 78 +-
libavfilter/scene_sad.c | 1 -
libavfilter/vf_overlay_cuda.cu | 1 -
libavformat/caf.c | 1 -
libavformat/dash.c | 2 -
libavformat/fifo_test.c | 1 -
libavformat/g726.c | 1 -
libavformat/hlsplaylist.c | 1 -
libavformat/mlpdec.c | 1 -
libavformat/mpjpegdec.c | 2 -
libavformat/rdt.c | 1 -
libavutil/aarch64/float_dsp_neon.S | 200 ++---
libavutil/aes.c | 1 -
libavutil/hwcontext_cuda_internal.h | 1 -
libavutil/hwcontext_qsv.h | 1 -
libavutil/tests/blowfish.c | 1 -
libswresample/aarch64/resample.S | 88 +-
libswresample/soxr_resample.c | 1 -
libswresample/swresample_frame.c | 1 -
libswscale/aarch64/hscale.S | 110 +--
libswscale/aarch64/output.S | 64 +-
libswscale/aarch64/yuv2rgb_neon.S | 230 +++---
libswscale/gamma.c | 1 -
libswscale/vscale.c | 2 -
libswscale/x86/yuv2rgb_template.c | 1 -
tests/extended.ffconcat | 1 -
tests/fate/ffprobe.mak | 1 -
tests/fate/flvenc.mak | 2 -
tests/fate/lossless-audio.mak | 1 -
tests/simple1.ffconcat | 1 -
tests/simple2.ffconcat | 1 -
63 files changed, 2341 insertions(+), 2386 deletions(-)
mode change 100644 => 100755 doc/texi2pod.pl
mode change 100644 => 100755 doc/texidep.pl
diff --git a/COPYING.LGPLv2.1 b/COPYING.LGPLv2.1
index 58af0d3787..40924c2a6d 100644
--- a/COPYING.LGPLv2.1
+++ b/COPYING.LGPLv2.1
@@ -55,7 +55,7 @@ modified by someone else and passed on, the recipients should know
that what they have is not the original version, so that the original
author's reputation will not be affected by problems that might be
introduced by others.
-\f
+
Finally, software patents pose a constant threat to the existence of
any free program. We wish to make sure that a company cannot
effectively restrict the users of a free program by obtaining a
@@ -111,7 +111,7 @@ modification follow. Pay close attention to the difference between a
"work based on the library" and a "work that uses the library". The
former contains code derived from the library, whereas the latter must
be combined with the library in order to run.
-\f
+
GNU LESSER GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
@@ -158,7 +158,7 @@ Library.
You may charge a fee for the physical act of transferring a copy,
and you may at your option offer warranty protection in exchange for a
fee.
-\f
+
2. You may modify your copy or copies of the Library or any portion
of it, thus forming a work based on the Library, and copy and
distribute such modifications or work under the terms of Section 1
@@ -216,7 +216,7 @@ instead of to this License. (If a newer version than version 2 of the
ordinary GNU General Public License has appeared, then you can specify
that version instead if you wish.) Do not make any other change in
these notices.
-\f
+
Once this change is made in a given copy, it is irreversible for
that copy, so the ordinary GNU General Public License applies to all
subsequent copies and derivative works made from that copy.
@@ -267,7 +267,7 @@ Library will still fall under Section 6.)
distribute the object code for the work under the terms of Section 6.
Any executables containing that work also fall under Section 6,
whether or not they are linked directly with the Library itself.
-\f
+
6. As an exception to the Sections above, you may also combine or
link a "work that uses the Library" with the Library to produce a
work containing portions of the Library, and distribute that work
@@ -329,7 +329,7 @@ restrictions of other proprietary libraries that do not normally
accompany the operating system. Such a contradiction means you cannot
use both them and the Library together in an executable that you
distribute.
-\f
+
7. You may place library facilities that are a work based on the
Library side-by-side in a single library together with other library
facilities not covered by this License, and distribute such a combined
@@ -370,7 +370,7 @@ subject to these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties with
this License.
-\f
+
11. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
@@ -422,7 +422,7 @@ conditions either of that version or of any later version published by
the Free Software Foundation. If the Library does not specify a
license version number, you may choose any version ever published by
the Free Software Foundation.
-\f
+
14. If you wish to incorporate parts of the Library into other free
programs whose distribution conditions are incompatible with these,
write to the author to ask for permission. For software which is
@@ -456,7 +456,7 @@ SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.
END OF TERMS AND CONDITIONS
-\f
+
How to Apply These Terms to Your New Libraries
If you develop a new library, and you want it to be of the greatest
diff --git a/doc/build_system.txt b/doc/build_system.txt
index 0b1b0c2054..a0aa17bdad 100644
--- a/doc/build_system.txt
+++ b/doc/build_system.txt
@@ -63,4 +63,3 @@ make -j<num>
make -k
Continue build in case of errors, this is useful for the regression tests
sometimes but note that it will still not run all reg tests.
-
diff --git a/doc/nut.texi b/doc/nut.texi
index f9b18f5660..d3a5b8de39 100644
--- a/doc/nut.texi
+++ b/doc/nut.texi
@@ -157,4 +157,3 @@ PFD[32] would for example be signed 32 bit little-endian IEEE float
@item XVID @tab non-compliant MPEG-4 generated by old Xvid
@item XVIX @tab non-compliant MPEG-4 generated by old Xvid with interlacing bug
@end multitable
-
diff --git a/doc/texi2pod.pl b/doc/texi2pod.pl
old mode 100644
new mode 100755
diff --git a/doc/texidep.pl b/doc/texidep.pl
old mode 100644
new mode 100755
diff --git a/doc/undefined.txt b/doc/undefined.txt
index e01299bef1..faf8816c66 100644
--- a/doc/undefined.txt
+++ b/doc/undefined.txt
@@ -44,4 +44,3 @@ a+b*c;
here the reader knows that a,b,c are meant to be signed integers but for C
standard compliance / to avoid undefined behavior they are stored in unsigned
ints.
-
diff --git a/ffbuild/libversion.sh b/ffbuild/libversion.sh
index 990ce9f640..30046b1d25 100755
--- a/ffbuild/libversion.sh
+++ b/ffbuild/libversion.sh
@@ -1,3 +1,5 @@
+#!/bin/sh
+
toupper(){
echo "$@" | tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
}
diff --git a/libavcodec/aarch64/aacpsdsp_neon.S b/libavcodec/aarch64/aacpsdsp_neon.S
index ff4e6e244a..f8cb0b2959 100644
--- a/libavcodec/aarch64/aacpsdsp_neon.S
+++ b/libavcodec/aarch64/aacpsdsp_neon.S
@@ -19,130 +19,130 @@
#include "libavutil/aarch64/asm.S"
function ff_ps_add_squares_neon, export=1
-1: ld1 {v0.4S,v1.4S}, [x1], #32
- fmul v0.4S, v0.4S, v0.4S
- fmul v1.4S, v1.4S, v1.4S
- faddp v2.4S, v0.4S, v1.4S
- ld1 {v3.4S}, [x0]
- fadd v3.4S, v3.4S, v2.4S
- st1 {v3.4S}, [x0], #16
- subs w2, w2, #4
- b.gt 1b
+1: ld1 {v0.4s,v1.4s}, [x1], #32
+ fmul v0.4s, v0.4s, v0.4s
+ fmul v1.4s, v1.4s, v1.4s
+ faddp v2.4s, v0.4s, v1.4s
+ ld1 {v3.4s}, [x0]
+ fadd v3.4s, v3.4s, v2.4s
+ st1 {v3.4s}, [x0], #16
+ subs w2, w2, #4
+ b.gt 1b
ret
endfunc
function ff_ps_mul_pair_single_neon, export=1
-1: ld1 {v0.4S,v1.4S}, [x1], #32
- ld1 {v2.4S}, [x2], #16
- zip1 v3.4S, v2.4S, v2.4S
- zip2 v4.4S, v2.4S, v2.4S
- fmul v0.4S, v0.4S, v3.4S
- fmul v1.4S, v1.4S, v4.4S
- st1 {v0.4S,v1.4S}, [x0], #32
- subs w3, w3, #4
- b.gt 1b
+1: ld1 {v0.4s,v1.4s}, [x1], #32
+ ld1 {v2.4s}, [x2], #16
+ zip1 v3.4s, v2.4s, v2.4s
+ zip2 v4.4s, v2.4s, v2.4s
+ fmul v0.4s, v0.4s, v3.4s
+ fmul v1.4s, v1.4s, v4.4s
+ st1 {v0.4s,v1.4s}, [x0], #32
+ subs w3, w3, #4
+ b.gt 1b
ret
endfunc
function ff_ps_stereo_interpolate_neon, export=1
- ld1 {v0.4S}, [x2]
- ld1 {v1.4S}, [x3]
- zip1 v4.4S, v0.4S, v0.4S
- zip2 v5.4S, v0.4S, v0.4S
- zip1 v6.4S, v1.4S, v1.4S
- zip2 v7.4S, v1.4S, v1.4S
-1: ld1 {v2.2S}, [x0]
- ld1 {v3.2S}, [x1]
- fadd v4.4S, v4.4S, v6.4S
- fadd v5.4S, v5.4S, v7.4S
- mov v2.D[1], v2.D[0]
- mov v3.D[1], v3.D[0]
- fmul v2.4S, v2.4S, v4.4S
- fmla v2.4S, v3.4S, v5.4S
- st1 {v2.D}[0], [x0], #8
- st1 {v2.D}[1], [x1], #8
- subs w4, w4, #1
- b.gt 1b
+ ld1 {v0.4s}, [x2]
+ ld1 {v1.4s}, [x3]
+ zip1 v4.4s, v0.4s, v0.4s
+ zip2 v5.4s, v0.4s, v0.4s
+ zip1 v6.4s, v1.4s, v1.4s
+ zip2 v7.4s, v1.4s, v1.4s
+1: ld1 {v2.2s}, [x0]
+ ld1 {v3.2s}, [x1]
+ fadd v4.4s, v4.4s, v6.4s
+ fadd v5.4s, v5.4s, v7.4s
+ mov v2.d[1], v2.d[0]
+ mov v3.d[1], v3.d[0]
+ fmul v2.4s, v2.4s, v4.4s
+ fmla v2.4s, v3.4s, v5.4s
+ st1 {v2.d}[0], [x0], #8
+ st1 {v2.d}[1], [x1], #8
+ subs w4, w4, #1
+ b.gt 1b
ret
endfunc
function ff_ps_stereo_interpolate_ipdopd_neon, export=1
- ld1 {v0.4S,v1.4S}, [x2]
- ld1 {v6.4S,v7.4S}, [x3]
- fneg v2.4S, v1.4S
- fneg v3.4S, v7.4S
- zip1 v16.4S, v0.4S, v0.4S
- zip2 v17.4S, v0.4S, v0.4S
- zip1 v18.4S, v2.4S, v1.4S
- zip2 v19.4S, v2.4S, v1.4S
- zip1 v20.4S, v6.4S, v6.4S
- zip2 v21.4S, v6.4S, v6.4S
- zip1 v22.4S, v3.4S, v7.4S
- zip2 v23.4S, v3.4S, v7.4S
-1: ld1 {v2.2S}, [x0]
- ld1 {v3.2S}, [x1]
- fadd v16.4S, v16.4S, v20.4S
- fadd v17.4S, v17.4S, v21.4S
- mov v2.D[1], v2.D[0]
- mov v3.D[1], v3.D[0]
- fmul v4.4S, v2.4S, v16.4S
- fmla v4.4S, v3.4S, v17.4S
- fadd v18.4S, v18.4S, v22.4S
- fadd v19.4S, v19.4S, v23.4S
- ext v2.16B, v2.16B, v2.16B, #4
- ext v3.16B, v3.16B, v3.16B, #4
- fmla v4.4S, v2.4S, v18.4S
- fmla v4.4S, v3.4S, v19.4S
- st1 {v4.D}[0], [x0], #8
- st1 {v4.D}[1], [x1], #8
- subs w4, w4, #1
- b.gt 1b
+ ld1 {v0.4s,v1.4s}, [x2]
+ ld1 {v6.4s,v7.4s}, [x3]
+ fneg v2.4s, v1.4s
+ fneg v3.4s, v7.4s
+ zip1 v16.4s, v0.4s, v0.4s
+ zip2 v17.4s, v0.4s, v0.4s
+ zip1 v18.4s, v2.4s, v1.4s
+ zip2 v19.4s, v2.4s, v1.4s
+ zip1 v20.4s, v6.4s, v6.4s
+ zip2 v21.4s, v6.4s, v6.4s
+ zip1 v22.4s, v3.4s, v7.4s
+ zip2 v23.4s, v3.4s, v7.4s
+1: ld1 {v2.2s}, [x0]
+ ld1 {v3.2s}, [x1]
+ fadd v16.4s, v16.4s, v20.4s
+ fadd v17.4s, v17.4s, v21.4s
+ mov v2.d[1], v2.d[0]
+ mov v3.d[1], v3.d[0]
+ fmul v4.4s, v2.4s, v16.4s
+ fmla v4.4s, v3.4s, v17.4s
+ fadd v18.4s, v18.4s, v22.4s
+ fadd v19.4s, v19.4s, v23.4s
+ ext v2.16b, v2.16b, v2.16b, #4
+ ext v3.16b, v3.16b, v3.16b, #4
+ fmla v4.4s, v2.4s, v18.4s
+ fmla v4.4s, v3.4s, v19.4s
+ st1 {v4.d}[0], [x0], #8
+ st1 {v4.d}[1], [x1], #8
+ subs w4, w4, #1
+ b.gt 1b
ret
endfunc
function ff_ps_hybrid_analysis_neon, export=1
- lsl x3, x3, #3
- ld2 {v0.4S,v1.4S}, [x1], #32
- ld2 {v2.2S,v3.2S}, [x1], #16
- ld1 {v24.2S}, [x1], #8
- ld2 {v4.2S,v5.2S}, [x1], #16
- ld2 {v6.4S,v7.4S}, [x1]
- rev64 v6.4S, v6.4S
- rev64 v7.4S, v7.4S
- ext v6.16B, v6.16B, v6.16B, #8
- ext v7.16B, v7.16B, v7.16B, #8
- rev64 v4.2S, v4.2S
- rev64 v5.2S, v5.2S
- mov v2.D[1], v3.D[0]
- mov v4.D[1], v5.D[0]
- mov v5.D[1], v2.D[0]
- mov v3.D[1], v4.D[0]
- fadd v16.4S, v0.4S, v6.4S
- fadd v17.4S, v1.4S, v7.4S
- fsub v18.4S, v1.4S, v7.4S
- fsub v19.4S, v0.4S, v6.4S
- fadd v22.4S, v2.4S, v4.4S
- fsub v23.4S, v5.4S, v3.4S
- trn1 v20.2D, v22.2D, v23.2D // {re4+re8, re5+re7, im8-im4, im7-im5}
- trn2 v21.2D, v22.2D, v23.2D // {im4+im8, im5+im7, re4-re8, re5-re7}
-1: ld2 {v2.4S,v3.4S}, [x2], #32
- ld2 {v4.2S,v5.2S}, [x2], #16
- ld1 {v6.2S}, [x2], #8
- add x2, x2, #8
- mov v4.D[1], v5.D[0]
- mov v6.S[1], v6.S[0]
- fmul v6.2S, v6.2S, v24.2S
- fmul v0.4S, v2.4S, v16.4S
- fmul v1.4S, v2.4S, v17.4S
- fmls v0.4S, v3.4S, v18.4S
- fmla v1.4S, v3.4S, v19.4S
- fmla v0.4S, v4.4S, v20.4S
- fmla v1.4S, v4.4S, v21.4S
- faddp v0.4S, v0.4S, v1.4S
- faddp v0.4S, v0.4S, v0.4S
- fadd v0.2S, v0.2S, v6.2S
- st1 {v0.2S}, [x0], x3
- subs w4, w4, #1
- b.gt 1b
+ lsl x3, x3, #3
+ ld2 {v0.4s,v1.4s}, [x1], #32
+ ld2 {v2.2s,v3.2s}, [x1], #16
+ ld1 {v24.2s}, [x1], #8
+ ld2 {v4.2s,v5.2s}, [x1], #16
+ ld2 {v6.4s,v7.4s}, [x1]
+ rev64 v6.4s, v6.4s
+ rev64 v7.4s, v7.4s
+ ext v6.16b, v6.16b, v6.16b, #8
+ ext v7.16b, v7.16b, v7.16b, #8
+ rev64 v4.2s, v4.2s
+ rev64 v5.2s, v5.2s
+ mov v2.d[1], v3.d[0]
+ mov v4.d[1], v5.d[0]
+ mov v5.d[1], v2.d[0]
+ mov v3.d[1], v4.d[0]
+ fadd v16.4s, v0.4s, v6.4s
+ fadd v17.4s, v1.4s, v7.4s
+ fsub v18.4s, v1.4s, v7.4s
+ fsub v19.4s, v0.4s, v6.4s
+ fadd v22.4s, v2.4s, v4.4s
+ fsub v23.4s, v5.4s, v3.4s
+ trn1 v20.2d, v22.2d, v23.2d // {re4+re8, re5+re7, im8-im4, im7-im5}
+ trn2 v21.2d, v22.2d, v23.2d // {im4+im8, im5+im7, re4-re8, re5-re7}
+1: ld2 {v2.4s,v3.4s}, [x2], #32
+ ld2 {v4.2s,v5.2s}, [x2], #16
+ ld1 {v6.2s}, [x2], #8
+ add x2, x2, #8
+ mov v4.d[1], v5.d[0]
+ mov v6.s[1], v6.s[0]
+ fmul v6.2s, v6.2s, v24.2s
+ fmul v0.4s, v2.4s, v16.4s
+ fmul v1.4s, v2.4s, v17.4s
+ fmls v0.4s, v3.4s, v18.4s
+ fmla v1.4s, v3.4s, v19.4s
+ fmla v0.4s, v4.4s, v20.4s
+ fmla v1.4s, v4.4s, v21.4s
+ faddp v0.4s, v0.4s, v1.4s
+ faddp v0.4s, v0.4s, v0.4s
+ fadd v0.2s, v0.2s, v6.2s
+ st1 {v0.2s}, [x0], x3
+ subs w4, w4, #1
+ b.gt 1b
ret
endfunc
diff --git a/libavcodec/aarch64/fft_neon.S b/libavcodec/aarch64/fft_neon.S
index 862039f97d..4d81fd290d 100644
--- a/libavcodec/aarch64/fft_neon.S
+++ b/libavcodec/aarch64/fft_neon.S
@@ -353,18 +353,18 @@ function fft\n\()_neon, align=6
endfunc
.endm
- def_fft 32, 16, 8
- def_fft 64, 32, 16
- def_fft 128, 64, 32
- def_fft 256, 128, 64
- def_fft 512, 256, 128
- def_fft 1024, 512, 256
- def_fft 2048, 1024, 512
- def_fft 4096, 2048, 1024
- def_fft 8192, 4096, 2048
- def_fft 16384, 8192, 4096
- def_fft 32768, 16384, 8192
- def_fft 65536, 32768, 16384
+ def_fft 32, 16, 8
+ def_fft 64, 32, 16
+ def_fft 128, 64, 32
+ def_fft 256, 128, 64
+ def_fft 512, 256, 128
+ def_fft 1024, 512, 256
+ def_fft 2048, 1024, 512
+ def_fft 4096, 2048, 1024
+ def_fft 8192, 4096, 2048
+ def_fft 16384, 8192, 4096
+ def_fft 32768, 16384, 8192
+ def_fft 65536, 32768, 16384
function ff_fft_calc_neon, export=1
prfm pldl1keep, [x1]
diff --git a/libavcodec/aarch64/h264cmc_neon.S b/libavcodec/aarch64/h264cmc_neon.S
index 8be7578001..67de21671b 100644
--- a/libavcodec/aarch64/h264cmc_neon.S
+++ b/libavcodec/aarch64/h264cmc_neon.S
@@ -36,11 +36,11 @@ function ff_\type\()_\codec\()_chroma_mc8_neon, export=1
lsl w9, w9, #3
lsl w10, w10, #1
add w9, w9, w10
- add x6, x6, w9, UXTW
- ld1r {v22.8H}, [x6]
+ add x6, x6, w9, uxtw
+ ld1r {v22.8h}, [x6]
.endif
.ifc \codec,vc1
- movi v22.8H, #28
+ movi v22.8h, #28
.endif
mul w7, w4, w5
lsl w14, w5, #3
@@ -53,139 +53,139 @@ function ff_\type\()_\codec\()_chroma_mc8_neon, export=1
add w4, w4, #64
b.eq 2f
- dup v0.8B, w4
- dup v1.8B, w12
- ld1 {v4.8B, v5.8B}, [x1], x2
- dup v2.8B, w6
- dup v3.8B, w7
- ext v5.8B, v4.8B, v5.8B, #1
-1: ld1 {v6.8B, v7.8B}, [x1], x2
- umull v16.8H, v4.8B, v0.8B
- umlal v16.8H, v5.8B, v1.8B
- ext v7.8B, v6.8B, v7.8B, #1
- ld1 {v4.8B, v5.8B}, [x1], x2
- umlal v16.8H, v6.8B, v2.8B
+ dup v0.8b, w4
+ dup v1.8b, w12
+ ld1 {v4.8b, v5.8b}, [x1], x2
+ dup v2.8b, w6
+ dup v3.8b, w7
+ ext v5.8b, v4.8b, v5.8b, #1
+1: ld1 {v6.8b, v7.8b}, [x1], x2
+ umull v16.8h, v4.8b, v0.8b
+ umlal v16.8h, v5.8b, v1.8b
+ ext v7.8b, v6.8b, v7.8b, #1
+ ld1 {v4.8b, v5.8b}, [x1], x2
+ umlal v16.8h, v6.8b, v2.8b
prfm pldl1strm, [x1]
- ext v5.8B, v4.8B, v5.8B, #1
- umlal v16.8H, v7.8B, v3.8B
- umull v17.8H, v6.8B, v0.8B
+ ext v5.8b, v4.8b, v5.8b, #1
+ umlal v16.8h, v7.8b, v3.8b
+ umull v17.8h, v6.8b, v0.8b
subs w3, w3, #2
- umlal v17.8H, v7.8B, v1.8B
- umlal v17.8H, v4.8B, v2.8B
- umlal v17.8H, v5.8B, v3.8B
+ umlal v17.8h, v7.8b, v1.8b
+ umlal v17.8h, v4.8b, v2.8b
+ umlal v17.8h, v5.8b, v3.8b
prfm pldl1strm, [x1, x2]
.ifc \codec,h264
- rshrn v16.8B, v16.8H, #6
- rshrn v17.8B, v17.8H, #6
+ rshrn v16.8b, v16.8h, #6
+ rshrn v17.8b, v17.8h, #6
.else
- add v16.8H, v16.8H, v22.8H
- add v17.8H, v17.8H, v22.8H
- shrn v16.8B, v16.8H, #6
- shrn v17.8B, v17.8H, #6
+ add v16.8h, v16.8h, v22.8h
+ add v17.8h, v17.8h, v22.8h
+ shrn v16.8b, v16.8h, #6
+ shrn v17.8b, v17.8h, #6
.endif
.ifc \type,avg
- ld1 {v20.8B}, [x8], x2
- ld1 {v21.8B}, [x8], x2
- urhadd v16.8B, v16.8B, v20.8B
- urhadd v17.8B, v17.8B, v21.8B
+ ld1 {v20.8b}, [x8], x2
+ ld1 {v21.8b}, [x8], x2
+ urhadd v16.8b, v16.8b, v20.8b
+ urhadd v17.8b, v17.8b, v21.8b
.endif
- st1 {v16.8B}, [x0], x2
- st1 {v17.8B}, [x0], x2
+ st1 {v16.8b}, [x0], x2
+ st1 {v17.8b}, [x0], x2
b.gt 1b
ret
2: adds w12, w12, w6
- dup v0.8B, w4
+ dup v0.8b, w4
b.eq 5f
tst w6, w6
- dup v1.8B, w12
+ dup v1.8b, w12
b.eq 4f
- ld1 {v4.8B}, [x1], x2
-3: ld1 {v6.8B}, [x1], x2
- umull v16.8H, v4.8B, v0.8B
- umlal v16.8H, v6.8B, v1.8B
- ld1 {v4.8B}, [x1], x2
- umull v17.8H, v6.8B, v0.8B
- umlal v17.8H, v4.8B, v1.8B
+ ld1 {v4.8b}, [x1], x2
+3: ld1 {v6.8b}, [x1], x2
+ umull v16.8h, v4.8b, v0.8b
+ umlal v16.8h, v6.8b, v1.8b
+ ld1 {v4.8b}, [x1], x2
+ umull v17.8h, v6.8b, v0.8b
+ umlal v17.8h, v4.8b, v1.8b
prfm pldl1strm, [x1]
.ifc \codec,h264
- rshrn v16.8B, v16.8H, #6
- rshrn v17.8B, v17.8H, #6
+ rshrn v16.8b, v16.8h, #6
+ rshrn v17.8b, v17.8h, #6
.else
- add v16.8H, v16.8H, v22.8H
- add v17.8H, v17.8H, v22.8H
- shrn v16.8B, v16.8H, #6
- shrn v17.8B, v17.8H, #6
+ add v16.8h, v16.8h, v22.8h
+ add v17.8h, v17.8h, v22.8h
+ shrn v16.8b, v16.8h, #6
+ shrn v17.8b, v17.8h, #6
.endif
prfm pldl1strm, [x1, x2]
.ifc \type,avg
- ld1 {v20.8B}, [x8], x2
- ld1 {v21.8B}, [x8], x2
- urhadd v16.8B, v16.8B, v20.8B
- urhadd v17.8B, v17.8B, v21.8B
+ ld1 {v20.8b}, [x8], x2
+ ld1 {v21.8b}, [x8], x2
+ urhadd v16.8b, v16.8b, v20.8b
+ urhadd v17.8b, v17.8b, v21.8b
.endif
subs w3, w3, #2
- st1 {v16.8B}, [x0], x2
- st1 {v17.8B}, [x0], x2
+ st1 {v16.8b}, [x0], x2
+ st1 {v17.8b}, [x0], x2
b.gt 3b
ret
-4: ld1 {v4.8B, v5.8B}, [x1], x2
- ld1 {v6.8B, v7.8B}, [x1], x2
- ext v5.8B, v4.8B, v5.8B, #1
- ext v7.8B, v6.8B, v7.8B, #1
+4: ld1 {v4.8b, v5.8b}, [x1], x2
+ ld1 {v6.8b, v7.8b}, [x1], x2
+ ext v5.8b, v4.8b, v5.8b, #1
+ ext v7.8b, v6.8b, v7.8b, #1
prfm pldl1strm, [x1]
subs w3, w3, #2
- umull v16.8H, v4.8B, v0.8B
- umlal v16.8H, v5.8B, v1.8B
- umull v17.8H, v6.8B, v0.8B
- umlal v17.8H, v7.8B, v1.8B
+ umull v16.8h, v4.8b, v0.8b
+ umlal v16.8h, v5.8b, v1.8b
+ umull v17.8h, v6.8b, v0.8b
+ umlal v17.8h, v7.8b, v1.8b
prfm pldl1strm, [x1, x2]
.ifc \codec,h264
- rshrn v16.8B, v16.8H, #6
- rshrn v17.8B, v17.8H, #6
+ rshrn v16.8b, v16.8h, #6
+ rshrn v17.8b, v17.8h, #6
.else
- add v16.8H, v16.8H, v22.8H
- add v17.8H, v17.8H, v22.8H
- shrn v16.8B, v16.8H, #6
- shrn v17.8B, v17.8H, #6
+ add v16.8h, v16.8h, v22.8h
+ add v17.8h, v17.8h, v22.8h
+ shrn v16.8b, v16.8h, #6
+ shrn v17.8b, v17.8h, #6
.endif
.ifc \type,avg
- ld1 {v20.8B}, [x8], x2
- ld1 {v21.8B}, [x8], x2
- urhadd v16.8B, v16.8B, v20.8B
- urhadd v17.8B, v17.8B, v21.8B
+ ld1 {v20.8b}, [x8], x2
+ ld1 {v21.8b}, [x8], x2
+ urhadd v16.8b, v16.8b, v20.8b
+ urhadd v17.8b, v17.8b, v21.8b
.endif
- st1 {v16.8B}, [x0], x2
- st1 {v17.8B}, [x0], x2
+ st1 {v16.8b}, [x0], x2
+ st1 {v17.8b}, [x0], x2
b.gt 4b
ret
-5: ld1 {v4.8B}, [x1], x2
- ld1 {v5.8B}, [x1], x2
+5: ld1 {v4.8b}, [x1], x2
+ ld1 {v5.8b}, [x1], x2
prfm pldl1strm, [x1]
subs w3, w3, #2
- umull v16.8H, v4.8B, v0.8B
- umull v17.8H, v5.8B, v0.8B
+ umull v16.8h, v4.8b, v0.8b
+ umull v17.8h, v5.8b, v0.8b
prfm pldl1strm, [x1, x2]
.ifc \codec,h264
- rshrn v16.8B, v16.8H, #6
- rshrn v17.8B, v17.8H, #6
+ rshrn v16.8b, v16.8h, #6
+ rshrn v17.8b, v17.8h, #6
.else
- add v16.8H, v16.8H, v22.8H
- add v17.8H, v17.8H, v22.8H
- shrn v16.8B, v16.8H, #6
- shrn v17.8B, v17.8H, #6
+ add v16.8h, v16.8h, v22.8h
+ add v17.8h, v17.8h, v22.8h
+ shrn v16.8b, v16.8h, #6
+ shrn v17.8b, v17.8h, #6
.endif
.ifc \type,avg
- ld1 {v20.8B}, [x8], x2
- ld1 {v21.8B}, [x8], x2
- urhadd v16.8B, v16.8B, v20.8B
- urhadd v17.8B, v17.8B, v21.8B
+ ld1 {v20.8b}, [x8], x2
+ ld1 {v21.8b}, [x8], x2
+ urhadd v16.8b, v16.8b, v20.8b
+ urhadd v17.8b, v17.8b, v21.8b
.endif
- st1 {v16.8B}, [x0], x2
- st1 {v17.8B}, [x0], x2
+ st1 {v16.8b}, [x0], x2
+ st1 {v17.8b}, [x0], x2
b.gt 5b
ret
endfunc
@@ -206,11 +206,11 @@ function ff_\type\()_\codec\()_chroma_mc4_neon, export=1
lsl w9, w9, #3
lsl w10, w10, #1
add w9, w9, w10
- add x6, x6, w9, UXTW
- ld1r {v22.8H}, [x6]
+ add x6, x6, w9, uxtw
+ ld1r {v22.8h}, [x6]
.endif
.ifc \codec,vc1
- movi v22.8H, #28
+ movi v22.8h, #28
.endif
mul w7, w4, w5
lsl w14, w5, #3
@@ -223,133 +223,133 @@ function ff_\type\()_\codec\()_chroma_mc4_neon, export=1
add w4, w4, #64
b.eq 2f
- dup v24.8B, w4
- dup v25.8B, w12
- ld1 {v4.8B}, [x1], x2
- dup v26.8B, w6
- dup v27.8B, w7
- ext v5.8B, v4.8B, v5.8B, #1
- trn1 v0.2S, v24.2S, v25.2S
- trn1 v2.2S, v26.2S, v27.2S
- trn1 v4.2S, v4.2S, v5.2S
-1: ld1 {v6.8B}, [x1], x2
- ext v7.8B, v6.8B, v7.8B, #1
- trn1 v6.2S, v6.2S, v7.2S
- umull v18.8H, v4.8B, v0.8B
- umlal v18.8H, v6.8B, v2.8B
- ld1 {v4.8B}, [x1], x2
- ext v5.8B, v4.8B, v5.8B, #1
- trn1 v4.2S, v4.2S, v5.2S
+ dup v24.8b, w4
+ dup v25.8b, w12
+ ld1 {v4.8b}, [x1], x2
+ dup v26.8b, w6
+ dup v27.8b, w7
+ ext v5.8b, v4.8b, v5.8b, #1
+ trn1 v0.2s, v24.2s, v25.2s
+ trn1 v2.2s, v26.2s, v27.2s
+ trn1 v4.2s, v4.2s, v5.2s
+1: ld1 {v6.8b}, [x1], x2
+ ext v7.8b, v6.8b, v7.8b, #1
+ trn1 v6.2s, v6.2s, v7.2s
+ umull v18.8h, v4.8b, v0.8b
+ umlal v18.8h, v6.8b, v2.8b
+ ld1 {v4.8b}, [x1], x2
+ ext v5.8b, v4.8b, v5.8b, #1
+ trn1 v4.2s, v4.2s, v5.2s
prfm pldl1strm, [x1]
- umull v19.8H, v6.8B, v0.8B
- umlal v19.8H, v4.8B, v2.8B
- trn1 v30.2D, v18.2D, v19.2D
- trn2 v31.2D, v18.2D, v19.2D
- add v18.8H, v30.8H, v31.8H
+ umull v19.8h, v6.8b, v0.8b
+ umlal v19.8h, v4.8b, v2.8b
+ trn1 v30.2d, v18.2d, v19.2d
+ trn2 v31.2d, v18.2d, v19.2d
+ add v18.8h, v30.8h, v31.8h
.ifc \codec,h264
- rshrn v16.8B, v18.8H, #6
+ rshrn v16.8b, v18.8h, #6
.else
- add v18.8H, v18.8H, v22.8H
- shrn v16.8B, v18.8H, #6
+ add v18.8h, v18.8h, v22.8h
+ shrn v16.8b, v18.8h, #6
.endif
subs w3, w3, #2
prfm pldl1strm, [x1, x2]
.ifc \type,avg
- ld1 {v20.S}[0], [x8], x2
- ld1 {v20.S}[1], [x8], x2
- urhadd v16.8B, v16.8B, v20.8B
+ ld1 {v20.s}[0], [x8], x2
+ ld1 {v20.s}[1], [x8], x2
+ urhadd v16.8b, v16.8b, v20.8b
.endif
- st1 {v16.S}[0], [x0], x2
- st1 {v16.S}[1], [x0], x2
+ st1 {v16.s}[0], [x0], x2
+ st1 {v16.s}[1], [x0], x2
b.gt 1b
ret
2: adds w12, w12, w6
- dup v30.8B, w4
+ dup v30.8b, w4
b.eq 5f
tst w6, w6
- dup v31.8B, w12
- trn1 v0.2S, v30.2S, v31.2S
- trn2 v1.2S, v30.2S, v31.2S
+ dup v31.8b, w12
+ trn1 v0.2s, v30.2s, v31.2s
+ trn2 v1.2s, v30.2s, v31.2s
b.eq 4f
- ext v1.8B, v0.8B, v1.8B, #4
- ld1 {v4.S}[0], [x1], x2
-3: ld1 {v4.S}[1], [x1], x2
- umull v18.8H, v4.8B, v0.8B
- ld1 {v4.S}[0], [x1], x2
- umull v19.8H, v4.8B, v1.8B
- trn1 v30.2D, v18.2D, v19.2D
- trn2 v31.2D, v18.2D, v19.2D
- add v18.8H, v30.8H, v31.8H
+ ext v1.8b, v0.8b, v1.8b, #4
+ ld1 {v4.s}[0], [x1], x2
+3: ld1 {v4.s}[1], [x1], x2
+ umull v18.8h, v4.8b, v0.8b
+ ld1 {v4.s}[0], [x1], x2
+ umull v19.8h, v4.8b, v1.8b
+ trn1 v30.2d, v18.2d, v19.2d
+ trn2 v31.2d, v18.2d, v19.2d
+ add v18.8h, v30.8h, v31.8h
prfm pldl1strm, [x1]
.ifc \codec,h264
- rshrn v16.8B, v18.8H, #6
+ rshrn v16.8b, v18.8h, #6
.else
- add v18.8H, v18.8H, v22.8H
- shrn v16.8B, v18.8H, #6
+ add v18.8h, v18.8h, v22.8h
+ shrn v16.8b, v18.8h, #6
.endif
.ifc \type,avg
- ld1 {v20.S}[0], [x8], x2
- ld1 {v20.S}[1], [x8], x2
- urhadd v16.8B, v16.8B, v20.8B
+ ld1 {v20.s}[0], [x8], x2
+ ld1 {v20.s}[1], [x8], x2
+ urhadd v16.8b, v16.8b, v20.8b
.endif
subs w3, w3, #2
prfm pldl1strm, [x1, x2]
- st1 {v16.S}[0], [x0], x2
- st1 {v16.S}[1], [x0], x2
+ st1 {v16.s}[0], [x0], x2
+ st1 {v16.s}[1], [x0], x2
b.gt 3b
ret
-4: ld1 {v4.8B}, [x1], x2
- ld1 {v6.8B}, [x1], x2
- ext v5.8B, v4.8B, v5.8B, #1
- ext v7.8B, v6.8B, v7.8B, #1
- trn1 v4.2S, v4.2S, v5.2S
- trn1 v6.2S, v6.2S, v7.2S
- umull v18.8H, v4.8B, v0.8B
- umull v19.8H, v6.8B, v0.8B
+4: ld1 {v4.8b}, [x1], x2
+ ld1 {v6.8b}, [x1], x2
+ ext v5.8b, v4.8b, v5.8b, #1
+ ext v7.8b, v6.8b, v7.8b, #1
+ trn1 v4.2s, v4.2s, v5.2s
+ trn1 v6.2s, v6.2s, v7.2s
+ umull v18.8h, v4.8b, v0.8b
+ umull v19.8h, v6.8b, v0.8b
subs w3, w3, #2
- trn1 v30.2D, v18.2D, v19.2D
- trn2 v31.2D, v18.2D, v19.2D
- add v18.8H, v30.8H, v31.8H
+ trn1 v30.2d, v18.2d, v19.2d
+ trn2 v31.2d, v18.2d, v19.2d
+ add v18.8h, v30.8h, v31.8h
prfm pldl1strm, [x1]
.ifc \codec,h264
- rshrn v16.8B, v18.8H, #6
+ rshrn v16.8b, v18.8h, #6
.else
- add v18.8H, v18.8H, v22.8H
- shrn v16.8B, v18.8H, #6
+ add v18.8h, v18.8h, v22.8h
+ shrn v16.8b, v18.8h, #6
.endif
.ifc \type,avg
- ld1 {v20.S}[0], [x8], x2
- ld1 {v20.S}[1], [x8], x2
- urhadd v16.8B, v16.8B, v20.8B
+ ld1 {v20.s}[0], [x8], x2
+ ld1 {v20.s}[1], [x8], x2
+ urhadd v16.8b, v16.8b, v20.8b
.endif
prfm pldl1strm, [x1]
- st1 {v16.S}[0], [x0], x2
- st1 {v16.S}[1], [x0], x2
+ st1 {v16.s}[0], [x0], x2
+ st1 {v16.s}[1], [x0], x2
b.gt 4b
ret
-5: ld1 {v4.S}[0], [x1], x2
- ld1 {v4.S}[1], [x1], x2
- umull v18.8H, v4.8B, v30.8B
+5: ld1 {v4.s}[0], [x1], x2
+ ld1 {v4.s}[1], [x1], x2
+ umull v18.8h, v4.8b, v30.8b
subs w3, w3, #2
prfm pldl1strm, [x1]
.ifc \codec,h264
- rshrn v16.8B, v18.8H, #6
+ rshrn v16.8b, v18.8h, #6
.else
- add v18.8H, v18.8H, v22.8H
- shrn v16.8B, v18.8H, #6
+ add v18.8h, v18.8h, v22.8h
+ shrn v16.8b, v18.8h, #6
.endif
.ifc \type,avg
- ld1 {v20.S}[0], [x8], x2
- ld1 {v20.S}[1], [x8], x2
- urhadd v16.8B, v16.8B, v20.8B
+ ld1 {v20.s}[0], [x8], x2
+ ld1 {v20.s}[1], [x8], x2
+ urhadd v16.8b, v16.8b, v20.8b
.endif
prfm pldl1strm, [x1]
- st1 {v16.S}[0], [x0], x2
- st1 {v16.S}[1], [x0], x2
+ st1 {v16.s}[0], [x0], x2
+ st1 {v16.s}[1], [x0], x2
b.gt 5b
ret
endfunc
@@ -370,51 +370,51 @@ function ff_\type\()_h264_chroma_mc2_neon, export=1
sub w4, w7, w13
sub w4, w4, w14
add w4, w4, #64
- dup v0.8B, w4
- dup v2.8B, w12
- dup v1.8B, w6
- dup v3.8B, w7
- trn1 v0.4H, v0.4H, v2.4H
- trn1 v1.4H, v1.4H, v3.4H
+ dup v0.8b, w4
+ dup v2.8b, w12
+ dup v1.8b, w6
+ dup v3.8b, w7
+ trn1 v0.4h, v0.4h, v2.4h
+ trn1 v1.4h, v1.4h, v3.4h
1:
- ld1 {v4.S}[0], [x1], x2
- ld1 {v4.S}[1], [x1], x2
- rev64 v5.2S, v4.2S
- ld1 {v5.S}[1], [x1]
- ext v6.8B, v4.8B, v5.8B, #1
- ext v7.8B, v5.8B, v4.8B, #1
- trn1 v4.4H, v4.4H, v6.4H
- trn1 v5.4H, v5.4H, v7.4H
- umull v16.8H, v4.8B, v0.8B
- umlal v16.8H, v5.8B, v1.8B
+ ld1 {v4.s}[0], [x1], x2
+ ld1 {v4.s}[1], [x1], x2
+ rev64 v5.2s, v4.2s
+ ld1 {v5.s}[1], [x1]
+ ext v6.8b, v4.8b, v5.8b, #1
+ ext v7.8b, v5.8b, v4.8b, #1
+ trn1 v4.4h, v4.4h, v6.4h
+ trn1 v5.4h, v5.4h, v7.4h
+ umull v16.8h, v4.8b, v0.8b
+ umlal v16.8h, v5.8b, v1.8b
.ifc \type,avg
- ld1 {v18.H}[0], [x0], x2
- ld1 {v18.H}[2], [x0]
+ ld1 {v18.h}[0], [x0], x2
+ ld1 {v18.h}[2], [x0]
sub x0, x0, x2
.endif
- rev64 v17.4S, v16.4S
- add v16.8H, v16.8H, v17.8H
- rshrn v16.8B, v16.8H, #6
+ rev64 v17.4s, v16.4s
+ add v16.8h, v16.8h, v17.8h
+ rshrn v16.8b, v16.8h, #6
.ifc \type,avg
- urhadd v16.8B, v16.8B, v18.8B
+ urhadd v16.8b, v16.8b, v18.8b
.endif
- st1 {v16.H}[0], [x0], x2
- st1 {v16.H}[2], [x0], x2
+ st1 {v16.h}[0], [x0], x2
+ st1 {v16.h}[2], [x0], x2
subs w3, w3, #2
b.gt 1b
ret
2:
- ld1 {v16.H}[0], [x1], x2
- ld1 {v16.H}[1], [x1], x2
+ ld1 {v16.h}[0], [x1], x2
+ ld1 {v16.h}[1], [x1], x2
.ifc \type,avg
- ld1 {v18.H}[0], [x0], x2
- ld1 {v18.H}[1], [x0]
+ ld1 {v18.h}[0], [x0], x2
+ ld1 {v18.h}[1], [x0]
sub x0, x0, x2
- urhadd v16.8B, v16.8B, v18.8B
+ urhadd v16.8b, v16.8b, v18.8b
.endif
- st1 {v16.H}[0], [x0], x2
- st1 {v16.H}[1], [x0], x2
+ st1 {v16.h}[0], [x0], x2
+ st1 {v16.h}[1], [x0], x2
subs w3, w3, #2
b.gt 2b
ret
diff --git a/libavcodec/aarch64/h264dsp_neon.S b/libavcodec/aarch64/h264dsp_neon.S
index fbb8ecc463..28bb31dbfb 100644
--- a/libavcodec/aarch64/h264dsp_neon.S
+++ b/libavcodec/aarch64/h264dsp_neon.S
@@ -27,7 +27,7 @@
cmp w2, #0
ldr w6, [x4]
ccmp w3, #0, #0, ne
- mov v24.S[0], w6
+ mov v24.s[0], w6
and w8, w6, w6, lsl #16
b.eq 1f
ands w8, w8, w8, lsl #8
@@ -38,96 +38,96 @@
.endm
.macro h264_loop_filter_luma
- dup v22.16B, w2 // alpha
- uxtl v24.8H, v24.8B
- uabd v21.16B, v16.16B, v0.16B // abs(p0 - q0)
- uxtl v24.4S, v24.4H
- uabd v28.16B, v18.16B, v16.16B // abs(p1 - p0)
- sli v24.8H, v24.8H, #8
- uabd v30.16B, v2.16B, v0.16B // abs(q1 - q0)
- sli v24.4S, v24.4S, #16
- cmhi v21.16B, v22.16B, v21.16B // < alpha
- dup v22.16B, w3 // beta
- cmlt v23.16B, v24.16B, #0
- cmhi v28.16B, v22.16B, v28.16B // < beta
- cmhi v30.16B, v22.16B, v30.16B // < beta
- bic v21.16B, v21.16B, v23.16B
- uabd v17.16B, v20.16B, v16.16B // abs(p2 - p0)
- and v21.16B, v21.16B, v28.16B
- uabd v19.16B, v4.16B, v0.16B // abs(q2 - q0)
- and v21.16B, v21.16B, v30.16B // < beta
+ dup v22.16b, w2 // alpha
+ uxtl v24.8h, v24.8b
+ uabd v21.16b, v16.16b, v0.16b // abs(p0 - q0)
+ uxtl v24.4s, v24.4h
+ uabd v28.16b, v18.16b, v16.16b // abs(p1 - p0)
+ sli v24.8h, v24.8h, #8
+ uabd v30.16b, v2.16b, v0.16b // abs(q1 - q0)
+ sli v24.4s, v24.4s, #16
+ cmhi v21.16b, v22.16b, v21.16b // < alpha
+ dup v22.16b, w3 // beta
+ cmlt v23.16b, v24.16b, #0
+ cmhi v28.16b, v22.16b, v28.16b // < beta
+ cmhi v30.16b, v22.16b, v30.16b // < beta
+ bic v21.16b, v21.16b, v23.16b
+ uabd v17.16b, v20.16b, v16.16b // abs(p2 - p0)
+ and v21.16b, v21.16b, v28.16b
+ uabd v19.16b, v4.16b, v0.16b // abs(q2 - q0)
+ and v21.16b, v21.16b, v30.16b // < beta
shrn v30.8b, v21.8h, #4
mov x7, v30.d[0]
- cmhi v17.16B, v22.16B, v17.16B // < beta
- cmhi v19.16B, v22.16B, v19.16B // < beta
+ cmhi v17.16b, v22.16b, v17.16b // < beta
+ cmhi v19.16b, v22.16b, v19.16b // < beta
cbz x7, 9f
- and v17.16B, v17.16B, v21.16B
- and v19.16B, v19.16B, v21.16B
- and v24.16B, v24.16B, v21.16B
- urhadd v28.16B, v16.16B, v0.16B
- sub v21.16B, v24.16B, v17.16B
- uqadd v23.16B, v18.16B, v24.16B
- uhadd v20.16B, v20.16B, v28.16B
- sub v21.16B, v21.16B, v19.16B
- uhadd v28.16B, v4.16B, v28.16B
- umin v23.16B, v23.16B, v20.16B
- uqsub v22.16B, v18.16B, v24.16B
- uqadd v4.16B, v2.16B, v24.16B
- umax v23.16B, v23.16B, v22.16B
- uqsub v22.16B, v2.16B, v24.16B
- umin v28.16B, v4.16B, v28.16B
- uxtl v4.8H, v0.8B
- umax v28.16B, v28.16B, v22.16B
- uxtl2 v20.8H, v0.16B
- usubw v4.8H, v4.8H, v16.8B
- usubw2 v20.8H, v20.8H, v16.16B
- shl v4.8H, v4.8H, #2
- shl v20.8H, v20.8H, #2
- uaddw v4.8H, v4.8H, v18.8B
- uaddw2 v20.8H, v20.8H, v18.16B
- usubw v4.8H, v4.8H, v2.8B
- usubw2 v20.8H, v20.8H, v2.16B
- rshrn v4.8B, v4.8H, #3
- rshrn2 v4.16B, v20.8H, #3
- bsl v17.16B, v23.16B, v18.16B
- bsl v19.16B, v28.16B, v2.16B
- neg v23.16B, v21.16B
- uxtl v28.8H, v16.8B
- smin v4.16B, v4.16B, v21.16B
- uxtl2 v21.8H, v16.16B
- smax v4.16B, v4.16B, v23.16B
- uxtl v22.8H, v0.8B
- uxtl2 v24.8H, v0.16B
- saddw v28.8H, v28.8H, v4.8B
- saddw2 v21.8H, v21.8H, v4.16B
- ssubw v22.8H, v22.8H, v4.8B
- ssubw2 v24.8H, v24.8H, v4.16B
- sqxtun v16.8B, v28.8H
- sqxtun2 v16.16B, v21.8H
- sqxtun v0.8B, v22.8H
- sqxtun2 v0.16B, v24.8H
+ and v17.16b, v17.16b, v21.16b
+ and v19.16b, v19.16b, v21.16b
+ and v24.16b, v24.16b, v21.16b
+ urhadd v28.16b, v16.16b, v0.16b
+ sub v21.16b, v24.16b, v17.16b
+ uqadd v23.16b, v18.16b, v24.16b
+ uhadd v20.16b, v20.16b, v28.16b
+ sub v21.16b, v21.16b, v19.16b
+ uhadd v28.16b, v4.16b, v28.16b
+ umin v23.16b, v23.16b, v20.16b
+ uqsub v22.16b, v18.16b, v24.16b
+ uqadd v4.16b, v2.16b, v24.16b
+ umax v23.16b, v23.16b, v22.16b
+ uqsub v22.16b, v2.16b, v24.16b
+ umin v28.16b, v4.16b, v28.16b
+ uxtl v4.8h, v0.8b
+ umax v28.16b, v28.16b, v22.16b
+ uxtl2 v20.8h, v0.16b
+ usubw v4.8h, v4.8h, v16.8b
+ usubw2 v20.8h, v20.8h, v16.16b
+ shl v4.8h, v4.8h, #2
+ shl v20.8h, v20.8h, #2
+ uaddw v4.8h, v4.8h, v18.8b
+ uaddw2 v20.8h, v20.8h, v18.16b
+ usubw v4.8h, v4.8h, v2.8b
+ usubw2 v20.8h, v20.8h, v2.16b
+ rshrn v4.8b, v4.8h, #3
+ rshrn2 v4.16b, v20.8h, #3
+ bsl v17.16b, v23.16b, v18.16b
+ bsl v19.16b, v28.16b, v2.16b
+ neg v23.16b, v21.16b
+ uxtl v28.8h, v16.8b
+ smin v4.16b, v4.16b, v21.16b
+ uxtl2 v21.8h, v16.16b
+ smax v4.16b, v4.16b, v23.16b
+ uxtl v22.8h, v0.8b
+ uxtl2 v24.8h, v0.16b
+ saddw v28.8h, v28.8h, v4.8b
+ saddw2 v21.8h, v21.8h, v4.16b
+ ssubw v22.8h, v22.8h, v4.8b
+ ssubw2 v24.8h, v24.8h, v4.16b
+ sqxtun v16.8b, v28.8h
+ sqxtun2 v16.16b, v21.8h
+ sqxtun v0.8b, v22.8h
+ sqxtun2 v0.16b, v24.8h
.endm
function ff_h264_v_loop_filter_luma_neon, export=1
h264_loop_filter_start
sxtw x1, w1
- ld1 {v0.16B}, [x0], x1
- ld1 {v2.16B}, [x0], x1
- ld1 {v4.16B}, [x0], x1
+ ld1 {v0.16b}, [x0], x1
+ ld1 {v2.16b}, [x0], x1
+ ld1 {v4.16b}, [x0], x1
sub x0, x0, x1, lsl #2
sub x0, x0, x1, lsl #1
- ld1 {v20.16B}, [x0], x1
- ld1 {v18.16B}, [x0], x1
- ld1 {v16.16B}, [x0], x1
+ ld1 {v20.16b}, [x0], x1
+ ld1 {v18.16b}, [x0], x1
+ ld1 {v16.16b}, [x0], x1
h264_loop_filter_luma
sub x0, x0, x1, lsl #1
- st1 {v17.16B}, [x0], x1
- st1 {v16.16B}, [x0], x1
- st1 {v0.16B}, [x0], x1
- st1 {v19.16B}, [x0]
+ st1 {v17.16b}, [x0], x1
+ st1 {v16.16b}, [x0], x1
+ st1 {v0.16b}, [x0], x1
+ st1 {v19.16b}, [x0]
9:
ret
endfunc
@@ -137,22 +137,22 @@ function ff_h264_h_loop_filter_luma_neon, export=1
sxtw x1, w1
sub x0, x0, #4
- ld1 {v6.8B}, [x0], x1
- ld1 {v20.8B}, [x0], x1
- ld1 {v18.8B}, [x0], x1
- ld1 {v16.8B}, [x0], x1
- ld1 {v0.8B}, [x0], x1
- ld1 {v2.8B}, [x0], x1
- ld1 {v4.8B}, [x0], x1
- ld1 {v26.8B}, [x0], x1
- ld1 {v6.D}[1], [x0], x1
- ld1 {v20.D}[1], [x0], x1
- ld1 {v18.D}[1], [x0], x1
- ld1 {v16.D}[1], [x0], x1
- ld1 {v0.D}[1], [x0], x1
- ld1 {v2.D}[1], [x0], x1
- ld1 {v4.D}[1], [x0], x1
- ld1 {v26.D}[1], [x0], x1
+ ld1 {v6.8b}, [x0], x1
+ ld1 {v20.8b}, [x0], x1
+ ld1 {v18.8b}, [x0], x1
+ ld1 {v16.8b}, [x0], x1
+ ld1 {v0.8b}, [x0], x1
+ ld1 {v2.8b}, [x0], x1
+ ld1 {v4.8b}, [x0], x1
+ ld1 {v26.8b}, [x0], x1
+ ld1 {v6.d}[1], [x0], x1
+ ld1 {v20.d}[1], [x0], x1
+ ld1 {v18.d}[1], [x0], x1
+ ld1 {v16.d}[1], [x0], x1
+ ld1 {v0.d}[1], [x0], x1
+ ld1 {v2.d}[1], [x0], x1
+ ld1 {v4.d}[1], [x0], x1
+ ld1 {v26.d}[1], [x0], x1
transpose_8x16B v6, v20, v18, v16, v0, v2, v4, v26, v21, v23
@@ -162,254 +162,254 @@ function ff_h264_h_loop_filter_luma_neon, export=1
sub x0, x0, x1, lsl #4
add x0, x0, #2
- st1 {v17.S}[0], [x0], x1
- st1 {v16.S}[0], [x0], x1
- st1 {v0.S}[0], [x0], x1
- st1 {v19.S}[0], [x0], x1
- st1 {v17.S}[1], [x0], x1
- st1 {v16.S}[1], [x0], x1
- st1 {v0.S}[1], [x0], x1
- st1 {v19.S}[1], [x0], x1
- st1 {v17.S}[2], [x0], x1
- st1 {v16.S}[2], [x0], x1
- st1 {v0.S}[2], [x0], x1
- st1 {v19.S}[2], [x0], x1
- st1 {v17.S}[3], [x0], x1
- st1 {v16.S}[3], [x0], x1
- st1 {v0.S}[3], [x0], x1
- st1 {v19.S}[3], [x0], x1
+ st1 {v17.s}[0], [x0], x1
+ st1 {v16.s}[0], [x0], x1
+ st1 {v0.s}[0], [x0], x1
+ st1 {v19.s}[0], [x0], x1
+ st1 {v17.s}[1], [x0], x1
+ st1 {v16.s}[1], [x0], x1
+ st1 {v0.s}[1], [x0], x1
+ st1 {v19.s}[1], [x0], x1
+ st1 {v17.s}[2], [x0], x1
+ st1 {v16.s}[2], [x0], x1
+ st1 {v0.s}[2], [x0], x1
+ st1 {v19.s}[2], [x0], x1
+ st1 {v17.s}[3], [x0], x1
+ st1 {v16.s}[3], [x0], x1
+ st1 {v0.s}[3], [x0], x1
+ st1 {v19.s}[3], [x0], x1
9:
ret
endfunc
.macro h264_loop_filter_start_intra
- orr w4, w2, w3
- cbnz w4, 1f
- ret
+ orr w4, w2, w3
+ cbnz w4, 1f
+ ret
1:
- sxtw x1, w1
- dup v30.16b, w2 // alpha
- dup v31.16b, w3 // beta
+ sxtw x1, w1
+ dup v30.16b, w2 // alpha
+ dup v31.16b, w3 // beta
.endm
.macro h264_loop_filter_luma_intra
- uabd v16.16b, v7.16b, v0.16b // abs(p0 - q0)
- uabd v17.16b, v6.16b, v7.16b // abs(p1 - p0)
- uabd v18.16b, v1.16b, v0.16b // abs(q1 - q0)
- cmhi v19.16b, v30.16b, v16.16b // < alpha
- cmhi v17.16b, v31.16b, v17.16b // < beta
- cmhi v18.16b, v31.16b, v18.16b // < beta
+ uabd v16.16b, v7.16b, v0.16b // abs(p0 - q0)
+ uabd v17.16b, v6.16b, v7.16b // abs(p1 - p0)
+ uabd v18.16b, v1.16b, v0.16b // abs(q1 - q0)
+ cmhi v19.16b, v30.16b, v16.16b // < alpha
+ cmhi v17.16b, v31.16b, v17.16b // < beta
+ cmhi v18.16b, v31.16b, v18.16b // < beta
- movi v29.16b, #2
- ushr v30.16b, v30.16b, #2 // alpha >> 2
- add v30.16b, v30.16b, v29.16b // (alpha >> 2) + 2
- cmhi v16.16b, v30.16b, v16.16b // < (alpha >> 2) + 2
+ movi v29.16b, #2
+ ushr v30.16b, v30.16b, #2 // alpha >> 2
+ add v30.16b, v30.16b, v29.16b // (alpha >> 2) + 2
+ cmhi v16.16b, v30.16b, v16.16b // < (alpha >> 2) + 2
- and v19.16b, v19.16b, v17.16b
- and v19.16b, v19.16b, v18.16b
- shrn v20.8b, v19.8h, #4
- mov x4, v20.d[0]
- cbz x4, 9f
+ and v19.16b, v19.16b, v17.16b
+ and v19.16b, v19.16b, v18.16b
+ shrn v20.8b, v19.8h, #4
+ mov x4, v20.d[0]
+ cbz x4, 9f
- ushll v20.8h, v6.8b, #1
- ushll v22.8h, v1.8b, #1
- ushll2 v21.8h, v6.16b, #1
- ushll2 v23.8h, v1.16b, #1
- uaddw v20.8h, v20.8h, v7.8b
- uaddw v22.8h, v22.8h, v0.8b
- uaddw2 v21.8h, v21.8h, v7.16b
- uaddw2 v23.8h, v23.8h, v0.16b
- uaddw v20.8h, v20.8h, v1.8b
- uaddw v22.8h, v22.8h, v6.8b
- uaddw2 v21.8h, v21.8h, v1.16b
- uaddw2 v23.8h, v23.8h, v6.16b
+ ushll v20.8h, v6.8b, #1
+ ushll v22.8h, v1.8b, #1
+ ushll2 v21.8h, v6.16b, #1
+ ushll2 v23.8h, v1.16b, #1
+ uaddw v20.8h, v20.8h, v7.8b
+ uaddw v22.8h, v22.8h, v0.8b
+ uaddw2 v21.8h, v21.8h, v7.16b
+ uaddw2 v23.8h, v23.8h, v0.16b
+ uaddw v20.8h, v20.8h, v1.8b
+ uaddw v22.8h, v22.8h, v6.8b
+ uaddw2 v21.8h, v21.8h, v1.16b
+ uaddw2 v23.8h, v23.8h, v6.16b
- rshrn v24.8b, v20.8h, #2 // p0'_1
- rshrn v25.8b, v22.8h, #2 // q0'_1
- rshrn2 v24.16b, v21.8h, #2 // p0'_1
- rshrn2 v25.16b, v23.8h, #2 // q0'_1
+ rshrn v24.8b, v20.8h, #2 // p0'_1
+ rshrn v25.8b, v22.8h, #2 // q0'_1
+ rshrn2 v24.16b, v21.8h, #2 // p0'_1
+ rshrn2 v25.16b, v23.8h, #2 // q0'_1
- uabd v17.16b, v5.16b, v7.16b // abs(p2 - p0)
- uabd v18.16b, v2.16b, v0.16b // abs(q2 - q0)
- cmhi v17.16b, v31.16b, v17.16b // < beta
- cmhi v18.16b, v31.16b, v18.16b // < beta
+ uabd v17.16b, v5.16b, v7.16b // abs(p2 - p0)
+ uabd v18.16b, v2.16b, v0.16b // abs(q2 - q0)
+ cmhi v17.16b, v31.16b, v17.16b // < beta
+ cmhi v18.16b, v31.16b, v18.16b // < beta
- and v17.16b, v16.16b, v17.16b // if_2 && if_3
- and v18.16b, v16.16b, v18.16b // if_2 && if_4
+ and v17.16b, v16.16b, v17.16b // if_2 && if_3
+ and v18.16b, v16.16b, v18.16b // if_2 && if_4
- not v30.16b, v17.16b
- not v31.16b, v18.16b
+ not v30.16b, v17.16b
+ not v31.16b, v18.16b
- and v30.16b, v30.16b, v19.16b // if_1 && !(if_2 && if_3)
- and v31.16b, v31.16b, v19.16b // if_1 && !(if_2 && if_4)
+ and v30.16b, v30.16b, v19.16b // if_1 && !(if_2 && if_3)
+ and v31.16b, v31.16b, v19.16b // if_1 && !(if_2 && if_4)
- and v17.16b, v19.16b, v17.16b // if_1 && if_2 && if_3
- and v18.16b, v19.16b, v18.16b // if_1 && if_2 && if_4
+ and v17.16b, v19.16b, v17.16b // if_1 && if_2 && if_3
+ and v18.16b, v19.16b, v18.16b // if_1 && if_2 && if_4
//calc p, v7, v6, v5, v4, v17, v7, v6, v5, v4
- uaddl v26.8h, v5.8b, v7.8b
- uaddl2 v27.8h, v5.16b, v7.16b
- uaddw v26.8h, v26.8h, v0.8b
- uaddw2 v27.8h, v27.8h, v0.16b
- add v20.8h, v20.8h, v26.8h
- add v21.8h, v21.8h, v27.8h
- uaddw v20.8h, v20.8h, v0.8b
- uaddw2 v21.8h, v21.8h, v0.16b
- rshrn v20.8b, v20.8h, #3 // p0'_2
- rshrn2 v20.16b, v21.8h, #3 // p0'_2
- uaddw v26.8h, v26.8h, v6.8b
- uaddw2 v27.8h, v27.8h, v6.16b
- rshrn v21.8b, v26.8h, #2 // p1'_2
- rshrn2 v21.16b, v27.8h, #2 // p1'_2
- uaddl v28.8h, v4.8b, v5.8b
- uaddl2 v29.8h, v4.16b, v5.16b
- shl v28.8h, v28.8h, #1
- shl v29.8h, v29.8h, #1
- add v28.8h, v28.8h, v26.8h
- add v29.8h, v29.8h, v27.8h
- rshrn v19.8b, v28.8h, #3 // p2'_2
- rshrn2 v19.16b, v29.8h, #3 // p2'_2
+ uaddl v26.8h, v5.8b, v7.8b
+ uaddl2 v27.8h, v5.16b, v7.16b
+ uaddw v26.8h, v26.8h, v0.8b
+ uaddw2 v27.8h, v27.8h, v0.16b
+ add v20.8h, v20.8h, v26.8h
+ add v21.8h, v21.8h, v27.8h
+ uaddw v20.8h, v20.8h, v0.8b
+ uaddw2 v21.8h, v21.8h, v0.16b
+ rshrn v20.8b, v20.8h, #3 // p0'_2
+ rshrn2 v20.16b, v21.8h, #3 // p0'_2
+ uaddw v26.8h, v26.8h, v6.8b
+ uaddw2 v27.8h, v27.8h, v6.16b
+ rshrn v21.8b, v26.8h, #2 // p1'_2
+ rshrn2 v21.16b, v27.8h, #2 // p1'_2
+ uaddl v28.8h, v4.8b, v5.8b
+ uaddl2 v29.8h, v4.16b, v5.16b
+ shl v28.8h, v28.8h, #1
+ shl v29.8h, v29.8h, #1
+ add v28.8h, v28.8h, v26.8h
+ add v29.8h, v29.8h, v27.8h
+ rshrn v19.8b, v28.8h, #3 // p2'_2
+ rshrn2 v19.16b, v29.8h, #3 // p2'_2
//calc q, v0, v1, v2, v3, v18, v0, v1, v2, v3
- uaddl v26.8h, v2.8b, v0.8b
- uaddl2 v27.8h, v2.16b, v0.16b
- uaddw v26.8h, v26.8h, v7.8b
- uaddw2 v27.8h, v27.8h, v7.16b
- add v22.8h, v22.8h, v26.8h
- add v23.8h, v23.8h, v27.8h
- uaddw v22.8h, v22.8h, v7.8b
- uaddw2 v23.8h, v23.8h, v7.16b
- rshrn v22.8b, v22.8h, #3 // q0'_2
- rshrn2 v22.16b, v23.8h, #3 // q0'_2
- uaddw v26.8h, v26.8h, v1.8b
- uaddw2 v27.8h, v27.8h, v1.16b
- rshrn v23.8b, v26.8h, #2 // q1'_2
- rshrn2 v23.16b, v27.8h, #2 // q1'_2
- uaddl v28.8h, v2.8b, v3.8b
- uaddl2 v29.8h, v2.16b, v3.16b
- shl v28.8h, v28.8h, #1
- shl v29.8h, v29.8h, #1
- add v28.8h, v28.8h, v26.8h
- add v29.8h, v29.8h, v27.8h
- rshrn v26.8b, v28.8h, #3 // q2'_2
- rshrn2 v26.16b, v29.8h, #3 // q2'_2
+ uaddl v26.8h, v2.8b, v0.8b
+ uaddl2 v27.8h, v2.16b, v0.16b
+ uaddw v26.8h, v26.8h, v7.8b
+ uaddw2 v27.8h, v27.8h, v7.16b
+ add v22.8h, v22.8h, v26.8h
+ add v23.8h, v23.8h, v27.8h
+ uaddw v22.8h, v22.8h, v7.8b
+ uaddw2 v23.8h, v23.8h, v7.16b
+ rshrn v22.8b, v22.8h, #3 // q0'_2
+ rshrn2 v22.16b, v23.8h, #3 // q0'_2
+ uaddw v26.8h, v26.8h, v1.8b
+ uaddw2 v27.8h, v27.8h, v1.16b
+ rshrn v23.8b, v26.8h, #2 // q1'_2
+ rshrn2 v23.16b, v27.8h, #2 // q1'_2
+ uaddl v28.8h, v2.8b, v3.8b
+ uaddl2 v29.8h, v2.16b, v3.16b
+ shl v28.8h, v28.8h, #1
+ shl v29.8h, v29.8h, #1
+ add v28.8h, v28.8h, v26.8h
+ add v29.8h, v29.8h, v27.8h
+ rshrn v26.8b, v28.8h, #3 // q2'_2
+ rshrn2 v26.16b, v29.8h, #3 // q2'_2
- bit v7.16b, v24.16b, v30.16b // p0'_1
- bit v0.16b, v25.16b, v31.16b // q0'_1
- bit v7.16b, v20.16b, v17.16b // p0'_2
- bit v6.16b, v21.16b, v17.16b // p1'_2
- bit v5.16b, v19.16b, v17.16b // p2'_2
- bit v0.16b, v22.16b, v18.16b // q0'_2
- bit v1.16b, v23.16b, v18.16b // q1'_2
- bit v2.16b, v26.16b, v18.16b // q2'_2
+ bit v7.16b, v24.16b, v30.16b // p0'_1
+ bit v0.16b, v25.16b, v31.16b // q0'_1
+ bit v7.16b, v20.16b, v17.16b // p0'_2
+ bit v6.16b, v21.16b, v17.16b // p1'_2
+ bit v5.16b, v19.16b, v17.16b // p2'_2
+ bit v0.16b, v22.16b, v18.16b // q0'_2
+ bit v1.16b, v23.16b, v18.16b // q1'_2
+ bit v2.16b, v26.16b, v18.16b // q2'_2
.endm
function ff_h264_v_loop_filter_luma_intra_neon, export=1
- h264_loop_filter_start_intra
+ h264_loop_filter_start_intra
- ld1 {v0.16b}, [x0], x1 // q0
- ld1 {v1.16b}, [x0], x1 // q1
- ld1 {v2.16b}, [x0], x1 // q2
- ld1 {v3.16b}, [x0], x1 // q3
- sub x0, x0, x1, lsl #3
- ld1 {v4.16b}, [x0], x1 // p3
- ld1 {v5.16b}, [x0], x1 // p2
- ld1 {v6.16b}, [x0], x1 // p1
- ld1 {v7.16b}, [x0] // p0
+ ld1 {v0.16b}, [x0], x1 // q0
+ ld1 {v1.16b}, [x0], x1 // q1
+ ld1 {v2.16b}, [x0], x1 // q2
+ ld1 {v3.16b}, [x0], x1 // q3
+ sub x0, x0, x1, lsl #3
+ ld1 {v4.16b}, [x0], x1 // p3
+ ld1 {v5.16b}, [x0], x1 // p2
+ ld1 {v6.16b}, [x0], x1 // p1
+ ld1 {v7.16b}, [x0] // p0
- h264_loop_filter_luma_intra
+ h264_loop_filter_luma_intra
- sub x0, x0, x1, lsl #1
- st1 {v5.16b}, [x0], x1 // p2
- st1 {v6.16b}, [x0], x1 // p1
- st1 {v7.16b}, [x0], x1 // p0
- st1 {v0.16b}, [x0], x1 // q0
- st1 {v1.16b}, [x0], x1 // q1
- st1 {v2.16b}, [x0] // q2
+ sub x0, x0, x1, lsl #1
+ st1 {v5.16b}, [x0], x1 // p2
+ st1 {v6.16b}, [x0], x1 // p1
+ st1 {v7.16b}, [x0], x1 // p0
+ st1 {v0.16b}, [x0], x1 // q0
+ st1 {v1.16b}, [x0], x1 // q1
+ st1 {v2.16b}, [x0] // q2
9:
- ret
+ ret
endfunc
function ff_h264_h_loop_filter_luma_intra_neon, export=1
- h264_loop_filter_start_intra
+ h264_loop_filter_start_intra
- sub x0, x0, #4
- ld1 {v4.8b}, [x0], x1
- ld1 {v5.8b}, [x0], x1
- ld1 {v6.8b}, [x0], x1
- ld1 {v7.8b}, [x0], x1
- ld1 {v0.8b}, [x0], x1
- ld1 {v1.8b}, [x0], x1
- ld1 {v2.8b}, [x0], x1
- ld1 {v3.8b}, [x0], x1
- ld1 {v4.d}[1], [x0], x1
- ld1 {v5.d}[1], [x0], x1
- ld1 {v6.d}[1], [x0], x1
- ld1 {v7.d}[1], [x0], x1
- ld1 {v0.d}[1], [x0], x1
- ld1 {v1.d}[1], [x0], x1
- ld1 {v2.d}[1], [x0], x1
- ld1 {v3.d}[1], [x0], x1
+ sub x0, x0, #4
+ ld1 {v4.8b}, [x0], x1
+ ld1 {v5.8b}, [x0], x1
+ ld1 {v6.8b}, [x0], x1
+ ld1 {v7.8b}, [x0], x1
+ ld1 {v0.8b}, [x0], x1
+ ld1 {v1.8b}, [x0], x1
+ ld1 {v2.8b}, [x0], x1
+ ld1 {v3.8b}, [x0], x1
+ ld1 {v4.d}[1], [x0], x1
+ ld1 {v5.d}[1], [x0], x1
+ ld1 {v6.d}[1], [x0], x1
+ ld1 {v7.d}[1], [x0], x1
+ ld1 {v0.d}[1], [x0], x1
+ ld1 {v1.d}[1], [x0], x1
+ ld1 {v2.d}[1], [x0], x1
+ ld1 {v3.d}[1], [x0], x1
- transpose_8x16B v4, v5, v6, v7, v0, v1, v2, v3, v21, v23
+ transpose_8x16B v4, v5, v6, v7, v0, v1, v2, v3, v21, v23
- h264_loop_filter_luma_intra
+ h264_loop_filter_luma_intra
- transpose_8x16B v4, v5, v6, v7, v0, v1, v2, v3, v21, v23
+ transpose_8x16B v4, v5, v6, v7, v0, v1, v2, v3, v21, v23
- sub x0, x0, x1, lsl #4
- st1 {v4.8b}, [x0], x1
- st1 {v5.8b}, [x0], x1
- st1 {v6.8b}, [x0], x1
- st1 {v7.8b}, [x0], x1
- st1 {v0.8b}, [x0], x1
- st1 {v1.8b}, [x0], x1
- st1 {v2.8b}, [x0], x1
- st1 {v3.8b}, [x0], x1
- st1 {v4.d}[1], [x0], x1
- st1 {v5.d}[1], [x0], x1
- st1 {v6.d}[1], [x0], x1
- st1 {v7.d}[1], [x0], x1
- st1 {v0.d}[1], [x0], x1
- st1 {v1.d}[1], [x0], x1
- st1 {v2.d}[1], [x0], x1
- st1 {v3.d}[1], [x0], x1
+ sub x0, x0, x1, lsl #4
+ st1 {v4.8b}, [x0], x1
+ st1 {v5.8b}, [x0], x1
+ st1 {v6.8b}, [x0], x1
+ st1 {v7.8b}, [x0], x1
+ st1 {v0.8b}, [x0], x1
+ st1 {v1.8b}, [x0], x1
+ st1 {v2.8b}, [x0], x1
+ st1 {v3.8b}, [x0], x1
+ st1 {v4.d}[1], [x0], x1
+ st1 {v5.d}[1], [x0], x1
+ st1 {v6.d}[1], [x0], x1
+ st1 {v7.d}[1], [x0], x1
+ st1 {v0.d}[1], [x0], x1
+ st1 {v1.d}[1], [x0], x1
+ st1 {v2.d}[1], [x0], x1
+ st1 {v3.d}[1], [x0], x1
9:
- ret
+ ret
endfunc
.macro h264_loop_filter_chroma
- dup v22.8B, w2 // alpha
- dup v23.8B, w3 // beta
- uxtl v24.8H, v24.8B
- uabd v26.8B, v16.8B, v0.8B // abs(p0 - q0)
- uabd v28.8B, v18.8B, v16.8B // abs(p1 - p0)
- uabd v30.8B, v2.8B, v0.8B // abs(q1 - q0)
- cmhi v26.8B, v22.8B, v26.8B // < alpha
- cmhi v28.8B, v23.8B, v28.8B // < beta
- cmhi v30.8B, v23.8B, v30.8B // < beta
- uxtl v4.8H, v0.8B
- and v26.8B, v26.8B, v28.8B
- usubw v4.8H, v4.8H, v16.8B
- and v26.8B, v26.8B, v30.8B
- shl v4.8H, v4.8H, #2
+ dup v22.8b, w2 // alpha
+ dup v23.8b, w3 // beta
+ uxtl v24.8h, v24.8b
+ uabd v26.8b, v16.8b, v0.8b // abs(p0 - q0)
+ uabd v28.8b, v18.8b, v16.8b // abs(p1 - p0)
+ uabd v30.8b, v2.8b, v0.8b // abs(q1 - q0)
+ cmhi v26.8b, v22.8b, v26.8b // < alpha
+ cmhi v28.8b, v23.8b, v28.8b // < beta
+ cmhi v30.8b, v23.8b, v30.8b // < beta
+ uxtl v4.8h, v0.8b
+ and v26.8b, v26.8b, v28.8b
+ usubw v4.8h, v4.8h, v16.8b
+ and v26.8b, v26.8b, v30.8b
+ shl v4.8h, v4.8h, #2
mov x8, v26.d[0]
- sli v24.8H, v24.8H, #8
- uaddw v4.8H, v4.8H, v18.8B
+ sli v24.8h, v24.8h, #8
+ uaddw v4.8h, v4.8h, v18.8b
cbz x8, 9f
- usubw v4.8H, v4.8H, v2.8B
- rshrn v4.8B, v4.8H, #3
- smin v4.8B, v4.8B, v24.8B
- neg v25.8B, v24.8B
- smax v4.8B, v4.8B, v25.8B
- uxtl v22.8H, v0.8B
- and v4.8B, v4.8B, v26.8B
- uxtl v28.8H, v16.8B
- saddw v28.8H, v28.8H, v4.8B
- ssubw v22.8H, v22.8H, v4.8B
- sqxtun v16.8B, v28.8H
- sqxtun v0.8B, v22.8H
+ usubw v4.8h, v4.8h, v2.8b
+ rshrn v4.8b, v4.8h, #3
+ smin v4.8b, v4.8b, v24.8b
+ neg v25.8b, v24.8b
+ smax v4.8b, v4.8b, v25.8b
+ uxtl v22.8h, v0.8b
+ and v4.8b, v4.8b, v26.8b
+ uxtl v28.8h, v16.8b
+ saddw v28.8h, v28.8h, v4.8b
+ ssubw v22.8h, v22.8h, v4.8b
+ sqxtun v16.8b, v28.8h
+ sqxtun v0.8b, v22.8h
.endm
function ff_h264_v_loop_filter_chroma_neon, export=1
@@ -417,16 +417,16 @@ function ff_h264_v_loop_filter_chroma_neon, export=1
sxtw x1, w1
sub x0, x0, x1, lsl #1
- ld1 {v18.8B}, [x0], x1
- ld1 {v16.8B}, [x0], x1
- ld1 {v0.8B}, [x0], x1
- ld1 {v2.8B}, [x0]
+ ld1 {v18.8b}, [x0], x1
+ ld1 {v16.8b}, [x0], x1
+ ld1 {v0.8b}, [x0], x1
+ ld1 {v2.8b}, [x0]
h264_loop_filter_chroma
sub x0, x0, x1, lsl #1
- st1 {v16.8B}, [x0], x1
- st1 {v0.8B}, [x0], x1
+ st1 {v16.8b}, [x0], x1
+ st1 {v0.8b}, [x0], x1
9:
ret
endfunc
@@ -437,14 +437,14 @@ function ff_h264_h_loop_filter_chroma_neon, export=1
sub x0, x0, #2
h_loop_filter_chroma420:
- ld1 {v18.S}[0], [x0], x1
- ld1 {v16.S}[0], [x0], x1
- ld1 {v0.S}[0], [x0], x1
- ld1 {v2.S}[0], [x0], x1
- ld1 {v18.S}[1], [x0], x1
- ld1 {v16.S}[1], [x0], x1
- ld1 {v0.S}[1], [x0], x1
- ld1 {v2.S}[1], [x0], x1
+ ld1 {v18.s}[0], [x0], x1
+ ld1 {v16.s}[0], [x0], x1
+ ld1 {v0.s}[0], [x0], x1
+ ld1 {v2.s}[0], [x0], x1
+ ld1 {v18.s}[1], [x0], x1
+ ld1 {v16.s}[1], [x0], x1
+ ld1 {v0.s}[1], [x0], x1
+ ld1 {v2.s}[1], [x0], x1
transpose_4x8B v18, v16, v0, v2, v28, v29, v30, v31
@@ -453,14 +453,14 @@ h_loop_filter_chroma420:
transpose_4x8B v18, v16, v0, v2, v28, v29, v30, v31
sub x0, x0, x1, lsl #3
- st1 {v18.S}[0], [x0], x1
- st1 {v16.S}[0], [x0], x1
- st1 {v0.S}[0], [x0], x1
- st1 {v2.S}[0], [x0], x1
- st1 {v18.S}[1], [x0], x1
- st1 {v16.S}[1], [x0], x1
- st1 {v0.S}[1], [x0], x1
- st1 {v2.S}[1], [x0], x1
+ st1 {v18.s}[0], [x0], x1
+ st1 {v16.s}[0], [x0], x1
+ st1 {v0.s}[0], [x0], x1
+ st1 {v2.s}[0], [x0], x1
+ st1 {v18.s}[1], [x0], x1
+ st1 {v16.s}[1], [x0], x1
+ st1 {v0.s}[1], [x0], x1
+ st1 {v2.s}[1], [x0], x1
9:
ret
endfunc
@@ -480,212 +480,212 @@ function ff_h264_h_loop_filter_chroma422_neon, export=1
endfunc
.macro h264_loop_filter_chroma_intra
- uabd v26.8b, v16.8b, v17.8b // abs(p0 - q0)
- uabd v27.8b, v18.8b, v16.8b // abs(p1 - p0)
- uabd v28.8b, v19.8b, v17.8b // abs(q1 - q0)
- cmhi v26.8b, v30.8b, v26.8b // < alpha
- cmhi v27.8b, v31.8b, v27.8b // < beta
- cmhi v28.8b, v31.8b, v28.8b // < beta
- and v26.8b, v26.8b, v27.8b
- and v26.8b, v26.8b, v28.8b
- mov x2, v26.d[0]
+ uabd v26.8b, v16.8b, v17.8b // abs(p0 - q0)
+ uabd v27.8b, v18.8b, v16.8b // abs(p1 - p0)
+ uabd v28.8b, v19.8b, v17.8b // abs(q1 - q0)
+ cmhi v26.8b, v30.8b, v26.8b // < alpha
+ cmhi v27.8b, v31.8b, v27.8b // < beta
+ cmhi v28.8b, v31.8b, v28.8b // < beta
+ and v26.8b, v26.8b, v27.8b
+ and v26.8b, v26.8b, v28.8b
+ mov x2, v26.d[0]
- ushll v4.8h, v18.8b, #1
- ushll v6.8h, v19.8b, #1
- cbz x2, 9f
- uaddl v20.8h, v16.8b, v19.8b
- uaddl v22.8h, v17.8b, v18.8b
- add v20.8h, v20.8h, v4.8h
- add v22.8h, v22.8h, v6.8h
- uqrshrn v24.8b, v20.8h, #2
- uqrshrn v25.8b, v22.8h, #2
- bit v16.8b, v24.8b, v26.8b
- bit v17.8b, v25.8b, v26.8b
+ ushll v4.8h, v18.8b, #1
+ ushll v6.8h, v19.8b, #1
+ cbz x2, 9f
+ uaddl v20.8h, v16.8b, v19.8b
+ uaddl v22.8h, v17.8b, v18.8b
+ add v20.8h, v20.8h, v4.8h
+ add v22.8h, v22.8h, v6.8h
+ uqrshrn v24.8b, v20.8h, #2
+ uqrshrn v25.8b, v22.8h, #2
+ bit v16.8b, v24.8b, v26.8b
+ bit v17.8b, v25.8b, v26.8b
.endm
function ff_h264_v_loop_filter_chroma_intra_neon, export=1
- h264_loop_filter_start_intra
+ h264_loop_filter_start_intra
- sub x0, x0, x1, lsl #1
- ld1 {v18.8b}, [x0], x1
- ld1 {v16.8b}, [x0], x1
- ld1 {v17.8b}, [x0], x1
- ld1 {v19.8b}, [x0]
+ sub x0, x0, x1, lsl #1
+ ld1 {v18.8b}, [x0], x1
+ ld1 {v16.8b}, [x0], x1
+ ld1 {v17.8b}, [x0], x1
+ ld1 {v19.8b}, [x0]
- h264_loop_filter_chroma_intra
+ h264_loop_filter_chroma_intra
- sub x0, x0, x1, lsl #1
- st1 {v16.8b}, [x0], x1
- st1 {v17.8b}, [x0], x1
+ sub x0, x0, x1, lsl #1
+ st1 {v16.8b}, [x0], x1
+ st1 {v17.8b}, [x0], x1
9:
- ret
+ ret
endfunc
function ff_h264_h_loop_filter_chroma_mbaff_intra_neon, export=1
- h264_loop_filter_start_intra
+ h264_loop_filter_start_intra
- sub x4, x0, #2
- sub x0, x0, #1
- ld1 {v18.8b}, [x4], x1
- ld1 {v16.8b}, [x4], x1
- ld1 {v17.8b}, [x4], x1
- ld1 {v19.8b}, [x4], x1
+ sub x4, x0, #2
+ sub x0, x0, #1
+ ld1 {v18.8b}, [x4], x1
+ ld1 {v16.8b}, [x4], x1
+ ld1 {v17.8b}, [x4], x1
+ ld1 {v19.8b}, [x4], x1
- transpose_4x8B v18, v16, v17, v19, v26, v27, v28, v29
+ transpose_4x8B v18, v16, v17, v19, v26, v27, v28, v29
- h264_loop_filter_chroma_intra
+ h264_loop_filter_chroma_intra
- st2 {v16.b,v17.b}[0], [x0], x1
- st2 {v16.b,v17.b}[1], [x0], x1
- st2 {v16.b,v17.b}[2], [x0], x1
- st2 {v16.b,v17.b}[3], [x0], x1
+ st2 {v16.b,v17.b}[0], [x0], x1
+ st2 {v16.b,v17.b}[1], [x0], x1
+ st2 {v16.b,v17.b}[2], [x0], x1
+ st2 {v16.b,v17.b}[3], [x0], x1
9:
- ret
+ ret
endfunc
function ff_h264_h_loop_filter_chroma_intra_neon, export=1
- h264_loop_filter_start_intra
+ h264_loop_filter_start_intra
- sub x4, x0, #2
- sub x0, x0, #1
+ sub x4, x0, #2
+ sub x0, x0, #1
h_loop_filter_chroma420_intra:
- ld1 {v18.8b}, [x4], x1
- ld1 {v16.8b}, [x4], x1
- ld1 {v17.8b}, [x4], x1
- ld1 {v19.8b}, [x4], x1
- ld1 {v18.s}[1], [x4], x1
- ld1 {v16.s}[1], [x4], x1
- ld1 {v17.s}[1], [x4], x1
- ld1 {v19.s}[1], [x4], x1
+ ld1 {v18.8b}, [x4], x1
+ ld1 {v16.8b}, [x4], x1
+ ld1 {v17.8b}, [x4], x1
+ ld1 {v19.8b}, [x4], x1
+ ld1 {v18.s}[1], [x4], x1
+ ld1 {v16.s}[1], [x4], x1
+ ld1 {v17.s}[1], [x4], x1
+ ld1 {v19.s}[1], [x4], x1
- transpose_4x8B v18, v16, v17, v19, v26, v27, v28, v29
+ transpose_4x8B v18, v16, v17, v19, v26, v27, v28, v29
- h264_loop_filter_chroma_intra
+ h264_loop_filter_chroma_intra
- st2 {v16.b,v17.b}[0], [x0], x1
- st2 {v16.b,v17.b}[1], [x0], x1
- st2 {v16.b,v17.b}[2], [x0], x1
- st2 {v16.b,v17.b}[3], [x0], x1
- st2 {v16.b,v17.b}[4], [x0], x1
- st2 {v16.b,v17.b}[5], [x0], x1
- st2 {v16.b,v17.b}[6], [x0], x1
- st2 {v16.b,v17.b}[7], [x0], x1
+ st2 {v16.b,v17.b}[0], [x0], x1
+ st2 {v16.b,v17.b}[1], [x0], x1
+ st2 {v16.b,v17.b}[2], [x0], x1
+ st2 {v16.b,v17.b}[3], [x0], x1
+ st2 {v16.b,v17.b}[4], [x0], x1
+ st2 {v16.b,v17.b}[5], [x0], x1
+ st2 {v16.b,v17.b}[6], [x0], x1
+ st2 {v16.b,v17.b}[7], [x0], x1
9:
- ret
+ ret
endfunc
function ff_h264_h_loop_filter_chroma422_intra_neon, export=1
- h264_loop_filter_start_intra
- sub x4, x0, #2
- add x5, x0, x1, lsl #3
- sub x0, x0, #1
- mov x7, x30
- bl h_loop_filter_chroma420_intra
- sub x0, x5, #1
- mov x30, x7
- b h_loop_filter_chroma420_intra
+ h264_loop_filter_start_intra
+ sub x4, x0, #2
+ add x5, x0, x1, lsl #3
+ sub x0, x0, #1
+ mov x7, x30
+ bl h_loop_filter_chroma420_intra
+ sub x0, x5, #1
+ mov x30, x7
+ b h_loop_filter_chroma420_intra
endfunc
.macro biweight_16 macs, macd
- dup v0.16B, w5
- dup v1.16B, w6
- mov v4.16B, v16.16B
- mov v6.16B, v16.16B
+ dup v0.16b, w5
+ dup v1.16b, w6
+ mov v4.16b, v16.16b
+ mov v6.16b, v16.16b
1: subs w3, w3, #2
- ld1 {v20.16B}, [x0], x2
- \macd v4.8H, v0.8B, v20.8B
+ ld1 {v20.16b}, [x0], x2
+ \macd v4.8h, v0.8b, v20.8b
\macd\()2 v6.8H, v0.16B, v20.16B
- ld1 {v22.16B}, [x1], x2
- \macs v4.8H, v1.8B, v22.8B
+ ld1 {v22.16b}, [x1], x2
+ \macs v4.8h, v1.8b, v22.8b
\macs\()2 v6.8H, v1.16B, v22.16B
- mov v24.16B, v16.16B
- ld1 {v28.16B}, [x0], x2
- mov v26.16B, v16.16B
- \macd v24.8H, v0.8B, v28.8B
+ mov v24.16b, v16.16b
+ ld1 {v28.16b}, [x0], x2
+ mov v26.16b, v16.16b
+ \macd v24.8h, v0.8b, v28.8b
\macd\()2 v26.8H, v0.16B, v28.16B
- ld1 {v30.16B}, [x1], x2
- \macs v24.8H, v1.8B, v30.8B
+ ld1 {v30.16b}, [x1], x2
+ \macs v24.8h, v1.8b, v30.8b
\macs\()2 v26.8H, v1.16B, v30.16B
- sshl v4.8H, v4.8H, v18.8H
- sshl v6.8H, v6.8H, v18.8H
- sqxtun v4.8B, v4.8H
- sqxtun2 v4.16B, v6.8H
- sshl v24.8H, v24.8H, v18.8H
- sshl v26.8H, v26.8H, v18.8H
- sqxtun v24.8B, v24.8H
- sqxtun2 v24.16B, v26.8H
- mov v6.16B, v16.16B
- st1 {v4.16B}, [x7], x2
- mov v4.16B, v16.16B
- st1 {v24.16B}, [x7], x2
+ sshl v4.8h, v4.8h, v18.8h
+ sshl v6.8h, v6.8h, v18.8h
+ sqxtun v4.8b, v4.8h
+ sqxtun2 v4.16b, v6.8h
+ sshl v24.8h, v24.8h, v18.8h
+ sshl v26.8h, v26.8h, v18.8h
+ sqxtun v24.8b, v24.8h
+ sqxtun2 v24.16b, v26.8h
+ mov v6.16b, v16.16b
+ st1 {v4.16b}, [x7], x2
+ mov v4.16b, v16.16b
+ st1 {v24.16b}, [x7], x2
b.ne 1b
ret
.endm
.macro biweight_8 macs, macd
- dup v0.8B, w5
- dup v1.8B, w6
- mov v2.16B, v16.16B
- mov v20.16B, v16.16B
+ dup v0.8b, w5
+ dup v1.8b, w6
+ mov v2.16b, v16.16b
+ mov v20.16b, v16.16b
1: subs w3, w3, #2
- ld1 {v4.8B}, [x0], x2
- \macd v2.8H, v0.8B, v4.8B
- ld1 {v5.8B}, [x1], x2
- \macs v2.8H, v1.8B, v5.8B
- ld1 {v6.8B}, [x0], x2
- \macd v20.8H, v0.8B, v6.8B
- ld1 {v7.8B}, [x1], x2
- \macs v20.8H, v1.8B, v7.8B
- sshl v2.8H, v2.8H, v18.8H
- sqxtun v2.8B, v2.8H
- sshl v20.8H, v20.8H, v18.8H
- sqxtun v4.8B, v20.8H
- mov v20.16B, v16.16B
- st1 {v2.8B}, [x7], x2
- mov v2.16B, v16.16B
- st1 {v4.8B}, [x7], x2
+ ld1 {v4.8b}, [x0], x2
+ \macd v2.8h, v0.8b, v4.8b
+ ld1 {v5.8b}, [x1], x2
+ \macs v2.8h, v1.8b, v5.8b
+ ld1 {v6.8b}, [x0], x2
+ \macd v20.8h, v0.8b, v6.8b
+ ld1 {v7.8b}, [x1], x2
+ \macs v20.8h, v1.8b, v7.8b
+ sshl v2.8h, v2.8h, v18.8h
+ sqxtun v2.8b, v2.8h
+ sshl v20.8h, v20.8h, v18.8h
+ sqxtun v4.8b, v20.8h
+ mov v20.16b, v16.16b
+ st1 {v2.8b}, [x7], x2
+ mov v2.16b, v16.16b
+ st1 {v4.8b}, [x7], x2
b.ne 1b
ret
.endm
.macro biweight_4 macs, macd
- dup v0.8B, w5
- dup v1.8B, w6
- mov v2.16B, v16.16B
- mov v20.16B,v16.16B
+ dup v0.8b, w5
+ dup v1.8b, w6
+ mov v2.16b, v16.16b
+ mov v20.16b,v16.16b
1: subs w3, w3, #4
- ld1 {v4.S}[0], [x0], x2
- ld1 {v4.S}[1], [x0], x2
- \macd v2.8H, v0.8B, v4.8B
- ld1 {v5.S}[0], [x1], x2
- ld1 {v5.S}[1], [x1], x2
- \macs v2.8H, v1.8B, v5.8B
+ ld1 {v4.s}[0], [x0], x2
+ ld1 {v4.s}[1], [x0], x2
+ \macd v2.8h, v0.8b, v4.8b
+ ld1 {v5.s}[0], [x1], x2
+ ld1 {v5.s}[1], [x1], x2
+ \macs v2.8h, v1.8b, v5.8b
b.lt 2f
- ld1 {v6.S}[0], [x0], x2
- ld1 {v6.S}[1], [x0], x2
- \macd v20.8H, v0.8B, v6.8B
- ld1 {v7.S}[0], [x1], x2
- ld1 {v7.S}[1], [x1], x2
- \macs v20.8H, v1.8B, v7.8B
- sshl v2.8H, v2.8H, v18.8H
- sqxtun v2.8B, v2.8H
- sshl v20.8H, v20.8H, v18.8H
- sqxtun v4.8B, v20.8H
- mov v20.16B, v16.16B
- st1 {v2.S}[0], [x7], x2
- st1 {v2.S}[1], [x7], x2
- mov v2.16B, v16.16B
- st1 {v4.S}[0], [x7], x2
- st1 {v4.S}[1], [x7], x2
+ ld1 {v6.s}[0], [x0], x2
+ ld1 {v6.s}[1], [x0], x2
+ \macd v20.8h, v0.8b, v6.8b
+ ld1 {v7.s}[0], [x1], x2
+ ld1 {v7.s}[1], [x1], x2
+ \macs v20.8h, v1.8b, v7.8b
+ sshl v2.8h, v2.8h, v18.8h
+ sqxtun v2.8b, v2.8h
+ sshl v20.8h, v20.8h, v18.8h
+ sqxtun v4.8b, v20.8h
+ mov v20.16b, v16.16b
+ st1 {v2.s}[0], [x7], x2
+ st1 {v2.s}[1], [x7], x2
+ mov v2.16b, v16.16b
+ st1 {v4.s}[0], [x7], x2
+ st1 {v4.s}[1], [x7], x2
b.ne 1b
ret
-2: sshl v2.8H, v2.8H, v18.8H
- sqxtun v2.8B, v2.8H
- st1 {v2.S}[0], [x7], x2
- st1 {v2.S}[1], [x7], x2
+2: sshl v2.8h, v2.8h, v18.8h
+ sqxtun v2.8b, v2.8h
+ st1 {v2.s}[0], [x7], x2
+ st1 {v2.s}[1], [x7], x2
ret
.endm
@@ -696,10 +696,10 @@ function ff_biweight_h264_pixels_\w\()_neon, export=1
add w7, w7, #1
eor w8, w8, w6, lsr #30
orr w7, w7, #1
- dup v18.8H, w4
+ dup v18.8h, w4
lsl w7, w7, w4
- not v18.16B, v18.16B
- dup v16.8H, w7
+ not v18.16b, v18.16b
+ dup v16.8h, w7
mov x7, x0
cbz w8, 10f
subs w8, w8, #1
@@ -723,78 +723,78 @@ endfunc
biweight_func 4
.macro weight_16 add
- dup v0.16B, w4
+ dup v0.16b, w4
1: subs w2, w2, #2
- ld1 {v20.16B}, [x0], x1
- umull v4.8H, v0.8B, v20.8B
- umull2 v6.8H, v0.16B, v20.16B
- ld1 {v28.16B}, [x0], x1
- umull v24.8H, v0.8B, v28.8B
- umull2 v26.8H, v0.16B, v28.16B
- \add v4.8H, v16.8H, v4.8H
- srshl v4.8H, v4.8H, v18.8H
- \add v6.8H, v16.8H, v6.8H
- srshl v6.8H, v6.8H, v18.8H
- sqxtun v4.8B, v4.8H
- sqxtun2 v4.16B, v6.8H
- \add v24.8H, v16.8H, v24.8H
- srshl v24.8H, v24.8H, v18.8H
- \add v26.8H, v16.8H, v26.8H
- srshl v26.8H, v26.8H, v18.8H
- sqxtun v24.8B, v24.8H
- sqxtun2 v24.16B, v26.8H
- st1 {v4.16B}, [x5], x1
- st1 {v24.16B}, [x5], x1
+ ld1 {v20.16b}, [x0], x1
+ umull v4.8h, v0.8b, v20.8b
+ umull2 v6.8h, v0.16b, v20.16b
+ ld1 {v28.16b}, [x0], x1
+ umull v24.8h, v0.8b, v28.8b
+ umull2 v26.8h, v0.16b, v28.16b
+ \add v4.8h, v16.8h, v4.8h
+ srshl v4.8h, v4.8h, v18.8h
+ \add v6.8h, v16.8h, v6.8h
+ srshl v6.8h, v6.8h, v18.8h
+ sqxtun v4.8b, v4.8h
+ sqxtun2 v4.16b, v6.8h
+ \add v24.8h, v16.8h, v24.8h
+ srshl v24.8h, v24.8h, v18.8h
+ \add v26.8h, v16.8h, v26.8h
+ srshl v26.8h, v26.8h, v18.8h
+ sqxtun v24.8b, v24.8h
+ sqxtun2 v24.16b, v26.8h
+ st1 {v4.16b}, [x5], x1
+ st1 {v24.16b}, [x5], x1
b.ne 1b
ret
.endm
.macro weight_8 add
- dup v0.8B, w4
+ dup v0.8b, w4
1: subs w2, w2, #2
- ld1 {v4.8B}, [x0], x1
- umull v2.8H, v0.8B, v4.8B
- ld1 {v6.8B}, [x0], x1
- umull v20.8H, v0.8B, v6.8B
- \add v2.8H, v16.8H, v2.8H
- srshl v2.8H, v2.8H, v18.8H
- sqxtun v2.8B, v2.8H
- \add v20.8H, v16.8H, v20.8H
- srshl v20.8H, v20.8H, v18.8H
- sqxtun v4.8B, v20.8H
- st1 {v2.8B}, [x5], x1
- st1 {v4.8B}, [x5], x1
+ ld1 {v4.8b}, [x0], x1
+ umull v2.8h, v0.8b, v4.8b
+ ld1 {v6.8b}, [x0], x1
+ umull v20.8h, v0.8b, v6.8b
+ \add v2.8h, v16.8h, v2.8h
+ srshl v2.8h, v2.8h, v18.8h
+ sqxtun v2.8b, v2.8h
+ \add v20.8h, v16.8h, v20.8h
+ srshl v20.8h, v20.8h, v18.8h
+ sqxtun v4.8b, v20.8h
+ st1 {v2.8b}, [x5], x1
+ st1 {v4.8b}, [x5], x1
b.ne 1b
ret
.endm
.macro weight_4 add
- dup v0.8B, w4
+ dup v0.8b, w4
1: subs w2, w2, #4
- ld1 {v4.S}[0], [x0], x1
- ld1 {v4.S}[1], [x0], x1
- umull v2.8H, v0.8B, v4.8B
+ ld1 {v4.s}[0], [x0], x1
+ ld1 {v4.s}[1], [x0], x1
+ umull v2.8h, v0.8b, v4.8b
b.lt 2f
- ld1 {v6.S}[0], [x0], x1
- ld1 {v6.S}[1], [x0], x1
- umull v20.8H, v0.8B, v6.8B
- \add v2.8H, v16.8H, v2.8H
- srshl v2.8H, v2.8H, v18.8H
- sqxtun v2.8B, v2.8H
- \add v20.8H, v16.8H, v20.8H
- srshl v20.8H, v20.8h, v18.8H
- sqxtun v4.8B, v20.8H
- st1 {v2.S}[0], [x5], x1
- st1 {v2.S}[1], [x5], x1
- st1 {v4.S}[0], [x5], x1
- st1 {v4.S}[1], [x5], x1
+ ld1 {v6.s}[0], [x0], x1
+ ld1 {v6.s}[1], [x0], x1
+ umull v20.8h, v0.8b, v6.8b
+ \add v2.8h, v16.8h, v2.8h
+ srshl v2.8h, v2.8h, v18.8h
+ sqxtun v2.8b, v2.8h
+ \add v20.8h, v16.8h, v20.8h
+ srshl v20.8h, v20.8h, v18.8h
+ sqxtun v4.8b, v20.8h
+ st1 {v2.s}[0], [x5], x1
+ st1 {v2.s}[1], [x5], x1
+ st1 {v4.s}[0], [x5], x1
+ st1 {v4.s}[1], [x5], x1
b.ne 1b
ret
-2: \add v2.8H, v16.8H, v2.8H
- srshl v2.8H, v2.8H, v18.8H
- sqxtun v2.8B, v2.8H
- st1 {v2.S}[0], [x5], x1
- st1 {v2.S}[1], [x5], x1
+2: \add v2.8h, v16.8h, v2.8h
+ srshl v2.8h, v2.8h, v18.8h
+ sqxtun v2.8b, v2.8h
+ st1 {v2.s}[0], [x5], x1
+ st1 {v2.s}[1], [x5], x1
ret
.endm
@@ -804,18 +804,18 @@ function ff_weight_h264_pixels_\w\()_neon, export=1
cmp w3, #1
mov w6, #1
lsl w5, w5, w3
- dup v16.8H, w5
+ dup v16.8h, w5
mov x5, x0
b.le 20f
sub w6, w6, w3
- dup v18.8H, w6
+ dup v18.8h, w6
cmp w4, #0
b.lt 10f
weight_\w shadd
10: neg w4, w4
weight_\w shsub
20: neg w6, w3
- dup v18.8H, w6
+ dup v18.8h, w6
cmp w4, #0
b.lt 10f
weight_\w add
diff --git a/libavcodec/aarch64/h264qpel_neon.S b/libavcodec/aarch64/h264qpel_neon.S
index d27cfac494..adfb189c58 100644
--- a/libavcodec/aarch64/h264qpel_neon.S
+++ b/libavcodec/aarch64/h264qpel_neon.S
@@ -27,114 +27,114 @@
.macro lowpass_const r
movz \r, #20, lsl #16
movk \r, #5
- mov v6.S[0], \r
+ mov v6.s[0], \r
.endm
//trashes v0-v5
.macro lowpass_8 r0, r1, r2, r3, d0, d1, narrow=1
- ext v2.8B, \r0\().8B, \r1\().8B, #2
- ext v3.8B, \r0\().8B, \r1\().8B, #3
- uaddl v2.8H, v2.8B, v3.8B
- ext v4.8B, \r0\().8B, \r1\().8B, #1
- ext v5.8B, \r0\().8B, \r1\().8B, #4
- uaddl v4.8H, v4.8B, v5.8B
- ext v1.8B, \r0\().8B, \r1\().8B, #5
- uaddl \d0\().8H, \r0\().8B, v1.8B
- ext v0.8B, \r2\().8B, \r3\().8B, #2
- mla \d0\().8H, v2.8H, v6.H[1]
- ext v1.8B, \r2\().8B, \r3\().8B, #3
- uaddl v0.8H, v0.8B, v1.8B
- ext v1.8B, \r2\().8B, \r3\().8B, #1
- mls \d0\().8H, v4.8H, v6.H[0]
- ext v3.8B, \r2\().8B, \r3\().8B, #4
- uaddl v1.8H, v1.8B, v3.8B
- ext v2.8B, \r2\().8B, \r3\().8B, #5
- uaddl \d1\().8H, \r2\().8B, v2.8B
- mla \d1\().8H, v0.8H, v6.H[1]
- mls \d1\().8H, v1.8H, v6.H[0]
+ ext v2.8b, \r0\().8b, \r1\().8b, #2
+ ext v3.8b, \r0\().8b, \r1\().8b, #3
+ uaddl v2.8h, v2.8b, v3.8b
+ ext v4.8b, \r0\().8b, \r1\().8b, #1
+ ext v5.8b, \r0\().8b, \r1\().8b, #4
+ uaddl v4.8h, v4.8b, v5.8b
+ ext v1.8b, \r0\().8b, \r1\().8b, #5
+ uaddl \d0\().8h, \r0\().8b, v1.8b
+ ext v0.8b, \r2\().8b, \r3\().8b, #2
+ mla \d0\().8h, v2.8h, v6.h[1]
+ ext v1.8b, \r2\().8b, \r3\().8b, #3
+ uaddl v0.8h, v0.8b, v1.8b
+ ext v1.8b, \r2\().8b, \r3\().8b, #1
+ mls \d0\().8h, v4.8h, v6.h[0]
+ ext v3.8b, \r2\().8b, \r3\().8b, #4
+ uaddl v1.8h, v1.8b, v3.8b
+ ext v2.8b, \r2\().8b, \r3\().8b, #5
+ uaddl \d1\().8h, \r2\().8b, v2.8b
+ mla \d1\().8h, v0.8h, v6.h[1]
+ mls \d1\().8h, v1.8h, v6.h[0]
.if \narrow
- sqrshrun \d0\().8B, \d0\().8H, #5
- sqrshrun \d1\().8B, \d1\().8H, #5
+ sqrshrun \d0\().8b, \d0\().8h, #5
+ sqrshrun \d1\().8b, \d1\().8h, #5
.endif
.endm
//trashes v0-v5, v7, v30-v31
.macro lowpass_8H r0, r1
- ext v0.16B, \r0\().16B, \r0\().16B, #2
- ext v1.16B, \r0\().16B, \r0\().16B, #3
- uaddl v0.8H, v0.8B, v1.8B
- ext v2.16B, \r0\().16B, \r0\().16B, #1
- ext v3.16B, \r0\().16B, \r0\().16B, #4
- uaddl v2.8H, v2.8B, v3.8B
- ext v30.16B, \r0\().16B, \r0\().16B, #5
- uaddl \r0\().8H, \r0\().8B, v30.8B
- ext v4.16B, \r1\().16B, \r1\().16B, #2
- mla \r0\().8H, v0.8H, v6.H[1]
- ext v5.16B, \r1\().16B, \r1\().16B, #3
- uaddl v4.8H, v4.8B, v5.8B
- ext v7.16B, \r1\().16B, \r1\().16B, #1
- mls \r0\().8H, v2.8H, v6.H[0]
- ext v0.16B, \r1\().16B, \r1\().16B, #4
- uaddl v7.8H, v7.8B, v0.8B
- ext v31.16B, \r1\().16B, \r1\().16B, #5
- uaddl \r1\().8H, \r1\().8B, v31.8B
- mla \r1\().8H, v4.8H, v6.H[1]
- mls \r1\().8H, v7.8H, v6.H[0]
+ ext v0.16b, \r0\().16b, \r0\().16b, #2
+ ext v1.16b, \r0\().16b, \r0\().16b, #3
+ uaddl v0.8h, v0.8b, v1.8b
+ ext v2.16b, \r0\().16b, \r0\().16b, #1
+ ext v3.16b, \r0\().16b, \r0\().16b, #4
+ uaddl v2.8h, v2.8b, v3.8b
+ ext v30.16b, \r0\().16b, \r0\().16b, #5
+ uaddl \r0\().8h, \r0\().8b, v30.8b
+ ext v4.16b, \r1\().16b, \r1\().16b, #2
+ mla \r0\().8h, v0.8h, v6.h[1]
+ ext v5.16b, \r1\().16b, \r1\().16b, #3
+ uaddl v4.8h, v4.8b, v5.8b
+ ext v7.16b, \r1\().16b, \r1\().16b, #1
+ mls \r0\().8h, v2.8h, v6.h[0]
+ ext v0.16b, \r1\().16b, \r1\().16b, #4
+ uaddl v7.8h, v7.8b, v0.8b
+ ext v31.16b, \r1\().16b, \r1\().16b, #5
+ uaddl \r1\().8h, \r1\().8b, v31.8b
+ mla \r1\().8h, v4.8h, v6.h[1]
+ mls \r1\().8h, v7.8h, v6.h[0]
.endm
// trashes v2-v5, v30
.macro lowpass_8_1 r0, r1, d0, narrow=1
- ext v2.8B, \r0\().8B, \r1\().8B, #2
- ext v3.8B, \r0\().8B, \r1\().8B, #3
- uaddl v2.8H, v2.8B, v3.8B
- ext v4.8B, \r0\().8B, \r1\().8B, #1
- ext v5.8B, \r0\().8B, \r1\().8B, #4
- uaddl v4.8H, v4.8B, v5.8B
- ext v30.8B, \r0\().8B, \r1\().8B, #5
- uaddl \d0\().8H, \r0\().8B, v30.8B
- mla \d0\().8H, v2.8H, v6.H[1]
- mls \d0\().8H, v4.8H, v6.H[0]
+ ext v2.8b, \r0\().8b, \r1\().8b, #2
+ ext v3.8b, \r0\().8b, \r1\().8b, #3
+ uaddl v2.8h, v2.8b, v3.8b
+ ext v4.8b, \r0\().8b, \r1\().8b, #1
+ ext v5.8b, \r0\().8b, \r1\().8b, #4
+ uaddl v4.8h, v4.8b, v5.8b
+ ext v30.8b, \r0\().8b, \r1\().8b, #5
+ uaddl \d0\().8h, \r0\().8b, v30.8b
+ mla \d0\().8h, v2.8h, v6.h[1]
+ mls \d0\().8h, v4.8h, v6.h[0]
.if \narrow
- sqrshrun \d0\().8B, \d0\().8H, #5
+ sqrshrun \d0\().8b, \d0\().8h, #5
.endif
.endm
// trashed v0-v7
.macro lowpass_8.16 r0, r1, r2
- ext v1.16B, \r0\().16B, \r1\().16B, #4
- ext v0.16B, \r0\().16B, \r1\().16B, #6
- saddl v5.4S, v1.4H, v0.4H
- ext v2.16B, \r0\().16B, \r1\().16B, #2
- saddl2 v1.4S, v1.8H, v0.8H
- ext v3.16B, \r0\().16B, \r1\().16B, #8
- saddl v6.4S, v2.4H, v3.4H
- ext \r1\().16B, \r0\().16B, \r1\().16B, #10
- saddl2 v2.4S, v2.8H, v3.8H
- saddl v0.4S, \r0\().4H, \r1\().4H
- saddl2 v4.4S, \r0\().8H, \r1\().8H
+ ext v1.16b, \r0\().16b, \r1\().16b, #4
+ ext v0.16b, \r0\().16b, \r1\().16b, #6
+ saddl v5.4s, v1.4h, v0.4h
+ ext v2.16b, \r0\().16b, \r1\().16b, #2
+ saddl2 v1.4s, v1.8h, v0.8h
+ ext v3.16b, \r0\().16b, \r1\().16b, #8
+ saddl v6.4s, v2.4h, v3.4h
+ ext \r1\().16b, \r0\().16b, \r1\().16b, #10
+ saddl2 v2.4s, v2.8h, v3.8h
+ saddl v0.4s, \r0\().4h, \r1\().4h
+ saddl2 v4.4s, \r0\().8h, \r1\().8h
- shl v3.4S, v5.4S, #4
- shl v5.4S, v5.4S, #2
- shl v7.4S, v6.4S, #2
- add v5.4S, v5.4S, v3.4S
- add v6.4S, v6.4S, v7.4S
+ shl v3.4s, v5.4s, #4
+ shl v5.4s, v5.4s, #2
+ shl v7.4s, v6.4s, #2
+ add v5.4s, v5.4s, v3.4s
+ add v6.4s, v6.4s, v7.4s
- shl v3.4S, v1.4S, #4
- shl v1.4S, v1.4S, #2
- shl v7.4S, v2.4S, #2
- add v1.4S, v1.4S, v3.4S
- add v2.4S, v2.4S, v7.4S
+ shl v3.4s, v1.4s, #4
+ shl v1.4s, v1.4s, #2
+ shl v7.4s, v2.4s, #2
+ add v1.4s, v1.4s, v3.4s
+ add v2.4s, v2.4s, v7.4s
- add v5.4S, v5.4S, v0.4S
- sub v5.4S, v5.4S, v6.4S
+ add v5.4s, v5.4s, v0.4s
+ sub v5.4s, v5.4s, v6.4s
- add v1.4S, v1.4S, v4.4S
- sub v1.4S, v1.4S, v2.4S
+ add v1.4s, v1.4s, v4.4s
+ sub v1.4s, v1.4s, v2.4s
- rshrn v5.4H, v5.4S, #10
- rshrn2 v5.8H, v1.4S, #10
+ rshrn v5.4h, v5.4s, #10
+ rshrn2 v5.8h, v1.4s, #10
- sqxtun \r2\().8B, v5.8H
+ sqxtun \r2\().8b, v5.8h
.endm
function put_h264_qpel16_h_lowpass_neon_packed
@@ -163,19 +163,19 @@ function \type\()_h264_qpel16_h_lowpass_neon
endfunc
function \type\()_h264_qpel8_h_lowpass_neon
-1: ld1 {v28.8B, v29.8B}, [x1], x2
- ld1 {v16.8B, v17.8B}, [x1], x2
+1: ld1 {v28.8b, v29.8b}, [x1], x2
+ ld1 {v16.8b, v17.8b}, [x1], x2
subs x12, x12, #2
lowpass_8 v28, v29, v16, v17, v28, v16
.ifc \type,avg
- ld1 {v2.8B}, [x0], x3
- urhadd v28.8B, v28.8B, v2.8B
- ld1 {v3.8B}, [x0]
- urhadd v16.8B, v16.8B, v3.8B
+ ld1 {v2.8b}, [x0], x3
+ urhadd v28.8b, v28.8b, v2.8b
+ ld1 {v3.8b}, [x0]
+ urhadd v16.8b, v16.8b, v3.8b
sub x0, x0, x3
.endif
- st1 {v28.8B}, [x0], x3
- st1 {v16.8B}, [x0], x3
+ st1 {v28.8b}, [x0], x3
+ st1 {v16.8b}, [x0], x3
b.ne 1b
ret
endfunc
@@ -200,23 +200,23 @@ function \type\()_h264_qpel16_h_lowpass_l2_neon
endfunc
function \type\()_h264_qpel8_h_lowpass_l2_neon
-1: ld1 {v26.8B, v27.8B}, [x1], x2
- ld1 {v16.8B, v17.8B}, [x1], x2
- ld1 {v28.8B}, [x3], x2
- ld1 {v29.8B}, [x3], x2
+1: ld1 {v26.8b, v27.8b}, [x1], x2
+ ld1 {v16.8b, v17.8b}, [x1], x2
+ ld1 {v28.8b}, [x3], x2
+ ld1 {v29.8b}, [x3], x2
subs x12, x12, #2
lowpass_8 v26, v27, v16, v17, v26, v27
- urhadd v26.8B, v26.8B, v28.8B
- urhadd v27.8B, v27.8B, v29.8B
+ urhadd v26.8b, v26.8b, v28.8b
+ urhadd v27.8b, v27.8b, v29.8b
.ifc \type,avg
- ld1 {v2.8B}, [x0], x2
- urhadd v26.8B, v26.8B, v2.8B
- ld1 {v3.8B}, [x0]
- urhadd v27.8B, v27.8B, v3.8B
+ ld1 {v2.8b}, [x0], x2
+ urhadd v26.8b, v26.8b, v2.8b
+ ld1 {v3.8b}, [x0]
+ urhadd v27.8b, v27.8b, v3.8b
sub x0, x0, x2
.endif
- st1 {v26.8B}, [x0], x2
- st1 {v27.8B}, [x0], x2
+ st1 {v26.8b}, [x0], x2
+ st1 {v27.8b}, [x0], x2
b.ne 1b
ret
endfunc
@@ -257,19 +257,19 @@ function \type\()_h264_qpel16_v_lowpass_neon
endfunc
function \type\()_h264_qpel8_v_lowpass_neon
- ld1 {v16.8B}, [x1], x3
- ld1 {v18.8B}, [x1], x3
- ld1 {v20.8B}, [x1], x3
- ld1 {v22.8B}, [x1], x3
- ld1 {v24.8B}, [x1], x3
- ld1 {v26.8B}, [x1], x3
- ld1 {v28.8B}, [x1], x3
- ld1 {v30.8B}, [x1], x3
- ld1 {v17.8B}, [x1], x3
- ld1 {v19.8B}, [x1], x3
- ld1 {v21.8B}, [x1], x3
- ld1 {v23.8B}, [x1], x3
- ld1 {v25.8B}, [x1]
+ ld1 {v16.8b}, [x1], x3
+ ld1 {v18.8b}, [x1], x3
+ ld1 {v20.8b}, [x1], x3
+ ld1 {v22.8b}, [x1], x3
+ ld1 {v24.8b}, [x1], x3
+ ld1 {v26.8b}, [x1], x3
+ ld1 {v28.8b}, [x1], x3
+ ld1 {v30.8b}, [x1], x3
+ ld1 {v17.8b}, [x1], x3
+ ld1 {v19.8b}, [x1], x3
+ ld1 {v21.8b}, [x1], x3
+ ld1 {v23.8b}, [x1], x3
+ ld1 {v25.8b}, [x1]
transpose_8x8B v16, v18, v20, v22, v24, v26, v28, v30, v0, v1
transpose_8x8B v17, v19, v21, v23, v25, v27, v29, v31, v0, v1
@@ -280,33 +280,33 @@ function \type\()_h264_qpel8_v_lowpass_neon
transpose_8x8B v16, v17, v18, v19, v20, v21, v22, v23, v0, v1
.ifc \type,avg
- ld1 {v24.8B}, [x0], x2
- urhadd v16.8B, v16.8B, v24.8B
- ld1 {v25.8B}, [x0], x2
- urhadd v17.8B, v17.8B, v25.8B
- ld1 {v26.8B}, [x0], x2
- urhadd v18.8B, v18.8B, v26.8B
- ld1 {v27.8B}, [x0], x2
- urhadd v19.8B, v19.8B, v27.8B
- ld1 {v28.8B}, [x0], x2
- urhadd v20.8B, v20.8B, v28.8B
- ld1 {v29.8B}, [x0], x2
- urhadd v21.8B, v21.8B, v29.8B
- ld1 {v30.8B}, [x0], x2
- urhadd v22.8B, v22.8B, v30.8B
- ld1 {v31.8B}, [x0], x2
- urhadd v23.8B, v23.8B, v31.8B
+ ld1 {v24.8b}, [x0], x2
+ urhadd v16.8b, v16.8b, v24.8b
+ ld1 {v25.8b}, [x0], x2
+ urhadd v17.8b, v17.8b, v25.8b
+ ld1 {v26.8b}, [x0], x2
+ urhadd v18.8b, v18.8b, v26.8b
+ ld1 {v27.8b}, [x0], x2
+ urhadd v19.8b, v19.8b, v27.8b
+ ld1 {v28.8b}, [x0], x2
+ urhadd v20.8b, v20.8b, v28.8b
+ ld1 {v29.8b}, [x0], x2
+ urhadd v21.8b, v21.8b, v29.8b
+ ld1 {v30.8b}, [x0], x2
+ urhadd v22.8b, v22.8b, v30.8b
+ ld1 {v31.8b}, [x0], x2
+ urhadd v23.8b, v23.8b, v31.8b
sub x0, x0, x2, lsl #3
.endif
- st1 {v16.8B}, [x0], x2
- st1 {v17.8B}, [x0], x2
- st1 {v18.8B}, [x0], x2
- st1 {v19.8B}, [x0], x2
- st1 {v20.8B}, [x0], x2
- st1 {v21.8B}, [x0], x2
- st1 {v22.8B}, [x0], x2
- st1 {v23.8B}, [x0], x2
+ st1 {v16.8b}, [x0], x2
+ st1 {v17.8b}, [x0], x2
+ st1 {v18.8b}, [x0], x2
+ st1 {v19.8b}, [x0], x2
+ st1 {v20.8b}, [x0], x2
+ st1 {v21.8b}, [x0], x2
+ st1 {v22.8b}, [x0], x2
+ st1 {v23.8b}, [x0], x2
ret
endfunc
@@ -334,19 +334,19 @@ function \type\()_h264_qpel16_v_lowpass_l2_neon
endfunc
function \type\()_h264_qpel8_v_lowpass_l2_neon
- ld1 {v16.8B}, [x1], x3
- ld1 {v18.8B}, [x1], x3
- ld1 {v20.8B}, [x1], x3
- ld1 {v22.8B}, [x1], x3
- ld1 {v24.8B}, [x1], x3
- ld1 {v26.8B}, [x1], x3
- ld1 {v28.8B}, [x1], x3
- ld1 {v30.8B}, [x1], x3
- ld1 {v17.8B}, [x1], x3
- ld1 {v19.8B}, [x1], x3
- ld1 {v21.8B}, [x1], x3
- ld1 {v23.8B}, [x1], x3
- ld1 {v25.8B}, [x1]
+ ld1 {v16.8b}, [x1], x3
+ ld1 {v18.8b}, [x1], x3
+ ld1 {v20.8b}, [x1], x3
+ ld1 {v22.8b}, [x1], x3
+ ld1 {v24.8b}, [x1], x3
+ ld1 {v26.8b}, [x1], x3
+ ld1 {v28.8b}, [x1], x3
+ ld1 {v30.8b}, [x1], x3
+ ld1 {v17.8b}, [x1], x3
+ ld1 {v19.8b}, [x1], x3
+ ld1 {v21.8b}, [x1], x3
+ ld1 {v23.8b}, [x1], x3
+ ld1 {v25.8b}, [x1]
transpose_8x8B v16, v18, v20, v22, v24, v26, v28, v30, v0, v1
transpose_8x8B v17, v19, v21, v23, v25, v27, v29, v31, v0, v1
@@ -356,51 +356,51 @@ function \type\()_h264_qpel8_v_lowpass_l2_neon
lowpass_8 v28, v29, v30, v31, v22, v23
transpose_8x8B v16, v17, v18, v19, v20, v21, v22, v23, v0, v1
- ld1 {v24.8B}, [x12], x2
- ld1 {v25.8B}, [x12], x2
- ld1 {v26.8B}, [x12], x2
- ld1 {v27.8B}, [x12], x2
- ld1 {v28.8B}, [x12], x2
- urhadd v16.8B, v24.8B, v16.8B
- urhadd v17.8B, v25.8B, v17.8B
- ld1 {v29.8B}, [x12], x2
- urhadd v18.8B, v26.8B, v18.8B
- urhadd v19.8B, v27.8B, v19.8B
- ld1 {v30.8B}, [x12], x2
- urhadd v20.8B, v28.8B, v20.8B
- urhadd v21.8B, v29.8B, v21.8B
- ld1 {v31.8B}, [x12], x2
- urhadd v22.8B, v30.8B, v22.8B
- urhadd v23.8B, v31.8B, v23.8B
+ ld1 {v24.8b}, [x12], x2
+ ld1 {v25.8b}, [x12], x2
+ ld1 {v26.8b}, [x12], x2
+ ld1 {v27.8b}, [x12], x2
+ ld1 {v28.8b}, [x12], x2
+ urhadd v16.8b, v24.8b, v16.8b
+ urhadd v17.8b, v25.8b, v17.8b
+ ld1 {v29.8b}, [x12], x2
+ urhadd v18.8b, v26.8b, v18.8b
+ urhadd v19.8b, v27.8b, v19.8b
+ ld1 {v30.8b}, [x12], x2
+ urhadd v20.8b, v28.8b, v20.8b
+ urhadd v21.8b, v29.8b, v21.8b
+ ld1 {v31.8b}, [x12], x2
+ urhadd v22.8b, v30.8b, v22.8b
+ urhadd v23.8b, v31.8b, v23.8b
.ifc \type,avg
- ld1 {v24.8B}, [x0], x3
- urhadd v16.8B, v16.8B, v24.8B
- ld1 {v25.8B}, [x0], x3
- urhadd v17.8B, v17.8B, v25.8B
- ld1 {v26.8B}, [x0], x3
- urhadd v18.8B, v18.8B, v26.8B
- ld1 {v27.8B}, [x0], x3
- urhadd v19.8B, v19.8B, v27.8B
- ld1 {v28.8B}, [x0], x3
- urhadd v20.8B, v20.8B, v28.8B
- ld1 {v29.8B}, [x0], x3
- urhadd v21.8B, v21.8B, v29.8B
- ld1 {v30.8B}, [x0], x3
- urhadd v22.8B, v22.8B, v30.8B
- ld1 {v31.8B}, [x0], x3
- urhadd v23.8B, v23.8B, v31.8B
+ ld1 {v24.8b}, [x0], x3
+ urhadd v16.8b, v16.8b, v24.8b
+ ld1 {v25.8b}, [x0], x3
+ urhadd v17.8b, v17.8b, v25.8b
+ ld1 {v26.8b}, [x0], x3
+ urhadd v18.8b, v18.8b, v26.8b
+ ld1 {v27.8b}, [x0], x3
+ urhadd v19.8b, v19.8b, v27.8b
+ ld1 {v28.8b}, [x0], x3
+ urhadd v20.8b, v20.8b, v28.8b
+ ld1 {v29.8b}, [x0], x3
+ urhadd v21.8b, v21.8b, v29.8b
+ ld1 {v30.8b}, [x0], x3
+ urhadd v22.8b, v22.8b, v30.8b
+ ld1 {v31.8b}, [x0], x3
+ urhadd v23.8b, v23.8b, v31.8b
sub x0, x0, x3, lsl #3
.endif
- st1 {v16.8B}, [x0], x3
- st1 {v17.8B}, [x0], x3
- st1 {v18.8B}, [x0], x3
- st1 {v19.8B}, [x0], x3
- st1 {v20.8B}, [x0], x3
- st1 {v21.8B}, [x0], x3
- st1 {v22.8B}, [x0], x3
- st1 {v23.8B}, [x0], x3
+ st1 {v16.8b}, [x0], x3
+ st1 {v17.8b}, [x0], x3
+ st1 {v18.8b}, [x0], x3
+ st1 {v19.8b}, [x0], x3
+ st1 {v20.8b}, [x0], x3
+ st1 {v21.8b}, [x0], x3
+ st1 {v22.8b}, [x0], x3
+ st1 {v23.8b}, [x0], x3
ret
endfunc
@@ -411,19 +411,19 @@ endfunc
function put_h264_qpel8_hv_lowpass_neon_top
lowpass_const w12
- ld1 {v16.8H}, [x1], x3
- ld1 {v17.8H}, [x1], x3
- ld1 {v18.8H}, [x1], x3
- ld1 {v19.8H}, [x1], x3
- ld1 {v20.8H}, [x1], x3
- ld1 {v21.8H}, [x1], x3
- ld1 {v22.8H}, [x1], x3
- ld1 {v23.8H}, [x1], x3
- ld1 {v24.8H}, [x1], x3
- ld1 {v25.8H}, [x1], x3
- ld1 {v26.8H}, [x1], x3
- ld1 {v27.8H}, [x1], x3
- ld1 {v28.8H}, [x1]
+ ld1 {v16.8h}, [x1], x3
+ ld1 {v17.8h}, [x1], x3
+ ld1 {v18.8h}, [x1], x3
+ ld1 {v19.8h}, [x1], x3
+ ld1 {v20.8h}, [x1], x3
+ ld1 {v21.8h}, [x1], x3
+ ld1 {v22.8h}, [x1], x3
+ ld1 {v23.8h}, [x1], x3
+ ld1 {v24.8h}, [x1], x3
+ ld1 {v25.8h}, [x1], x3
+ ld1 {v26.8h}, [x1], x3
+ ld1 {v27.8h}, [x1], x3
+ ld1 {v28.8h}, [x1]
lowpass_8H v16, v17
lowpass_8H v18, v19
lowpass_8H v20, v21
@@ -447,7 +447,7 @@ function put_h264_qpel8_hv_lowpass_neon_top
lowpass_8.16 v22, v30, v22
lowpass_8.16 v23, v31, v23
- transpose_8x8B v16, v17, v18, v19, v20, v21, v22, v23, v0, v1
+ transpose_8x8B v16, v17, v18, v19, v20, v21, v22, v23, v0, v1
ret
endfunc
@@ -457,33 +457,33 @@ function \type\()_h264_qpel8_hv_lowpass_neon
mov x10, x30
bl put_h264_qpel8_hv_lowpass_neon_top
.ifc \type,avg
- ld1 {v0.8B}, [x0], x2
- urhadd v16.8B, v16.8B, v0.8B
- ld1 {v1.8B}, [x0], x2
- urhadd v17.8B, v17.8B, v1.8B
- ld1 {v2.8B}, [x0], x2
- urhadd v18.8B, v18.8B, v2.8B
- ld1 {v3.8B}, [x0], x2
- urhadd v19.8B, v19.8B, v3.8B
- ld1 {v4.8B}, [x0], x2
- urhadd v20.8B, v20.8B, v4.8B
- ld1 {v5.8B}, [x0], x2
- urhadd v21.8B, v21.8B, v5.8B
- ld1 {v6.8B}, [x0], x2
- urhadd v22.8B, v22.8B, v6.8B
- ld1 {v7.8B}, [x0], x2
- urhadd v23.8B, v23.8B, v7.8B
+ ld1 {v0.8b}, [x0], x2
+ urhadd v16.8b, v16.8b, v0.8b
+ ld1 {v1.8b}, [x0], x2
+ urhadd v17.8b, v17.8b, v1.8b
+ ld1 {v2.8b}, [x0], x2
+ urhadd v18.8b, v18.8b, v2.8b
+ ld1 {v3.8b}, [x0], x2
+ urhadd v19.8b, v19.8b, v3.8b
+ ld1 {v4.8b}, [x0], x2
+ urhadd v20.8b, v20.8b, v4.8b
+ ld1 {v5.8b}, [x0], x2
+ urhadd v21.8b, v21.8b, v5.8b
+ ld1 {v6.8b}, [x0], x2
+ urhadd v22.8b, v22.8b, v6.8b
+ ld1 {v7.8b}, [x0], x2
+ urhadd v23.8b, v23.8b, v7.8b
sub x0, x0, x2, lsl #3
.endif
- st1 {v16.8B}, [x0], x2
- st1 {v17.8B}, [x0], x2
- st1 {v18.8B}, [x0], x2
- st1 {v19.8B}, [x0], x2
- st1 {v20.8B}, [x0], x2
- st1 {v21.8B}, [x0], x2
- st1 {v22.8B}, [x0], x2
- st1 {v23.8B}, [x0], x2
+ st1 {v16.8b}, [x0], x2
+ st1 {v17.8b}, [x0], x2
+ st1 {v18.8b}, [x0], x2
+ st1 {v19.8b}, [x0], x2
+ st1 {v20.8b}, [x0], x2
+ st1 {v21.8b}, [x0], x2
+ st1 {v22.8b}, [x0], x2
+ st1 {v23.8b}, [x0], x2
ret x10
endfunc
@@ -497,45 +497,45 @@ function \type\()_h264_qpel8_hv_lowpass_l2_neon
mov x10, x30
bl put_h264_qpel8_hv_lowpass_neon_top
- ld1 {v0.8B, v1.8B}, [x2], #16
- ld1 {v2.8B, v3.8B}, [x2], #16
- urhadd v0.8B, v0.8B, v16.8B
- urhadd v1.8B, v1.8B, v17.8B
- ld1 {v4.8B, v5.8B}, [x2], #16
- urhadd v2.8B, v2.8B, v18.8B
- urhadd v3.8B, v3.8B, v19.8B
- ld1 {v6.8B, v7.8B}, [x2], #16
- urhadd v4.8B, v4.8B, v20.8B
- urhadd v5.8B, v5.8B, v21.8B
- urhadd v6.8B, v6.8B, v22.8B
- urhadd v7.8B, v7.8B, v23.8B
+ ld1 {v0.8b, v1.8b}, [x2], #16
+ ld1 {v2.8b, v3.8b}, [x2], #16
+ urhadd v0.8b, v0.8b, v16.8b
+ urhadd v1.8b, v1.8b, v17.8b
+ ld1 {v4.8b, v5.8b}, [x2], #16
+ urhadd v2.8b, v2.8b, v18.8b
+ urhadd v3.8b, v3.8b, v19.8b
+ ld1 {v6.8b, v7.8b}, [x2], #16
+ urhadd v4.8b, v4.8b, v20.8b
+ urhadd v5.8b, v5.8b, v21.8b
+ urhadd v6.8b, v6.8b, v22.8b
+ urhadd v7.8b, v7.8b, v23.8b
.ifc \type,avg
- ld1 {v16.8B}, [x0], x3
- urhadd v0.8B, v0.8B, v16.8B
- ld1 {v17.8B}, [x0], x3
- urhadd v1.8B, v1.8B, v17.8B
- ld1 {v18.8B}, [x0], x3
- urhadd v2.8B, v2.8B, v18.8B
- ld1 {v19.8B}, [x0], x3
- urhadd v3.8B, v3.8B, v19.8B
- ld1 {v20.8B}, [x0], x3
- urhadd v4.8B, v4.8B, v20.8B
- ld1 {v21.8B}, [x0], x3
- urhadd v5.8B, v5.8B, v21.8B
- ld1 {v22.8B}, [x0], x3
- urhadd v6.8B, v6.8B, v22.8B
- ld1 {v23.8B}, [x0], x3
- urhadd v7.8B, v7.8B, v23.8B
+ ld1 {v16.8b}, [x0], x3
+ urhadd v0.8b, v0.8b, v16.8b
+ ld1 {v17.8b}, [x0], x3
+ urhadd v1.8b, v1.8b, v17.8b
+ ld1 {v18.8b}, [x0], x3
+ urhadd v2.8b, v2.8b, v18.8b
+ ld1 {v19.8b}, [x0], x3
+ urhadd v3.8b, v3.8b, v19.8b
+ ld1 {v20.8b}, [x0], x3
+ urhadd v4.8b, v4.8b, v20.8b
+ ld1 {v21.8b}, [x0], x3
+ urhadd v5.8b, v5.8b, v21.8b
+ ld1 {v22.8b}, [x0], x3
+ urhadd v6.8b, v6.8b, v22.8b
+ ld1 {v23.8b}, [x0], x3
+ urhadd v7.8b, v7.8b, v23.8b
sub x0, x0, x3, lsl #3
.endif
- st1 {v0.8B}, [x0], x3
- st1 {v1.8B}, [x0], x3
- st1 {v2.8B}, [x0], x3
- st1 {v3.8B}, [x0], x3
- st1 {v4.8B}, [x0], x3
- st1 {v5.8B}, [x0], x3
- st1 {v6.8B}, [x0], x3
- st1 {v7.8B}, [x0], x3
+ st1 {v0.8b}, [x0], x3
+ st1 {v1.8b}, [x0], x3
+ st1 {v2.8b}, [x0], x3
+ st1 {v3.8b}, [x0], x3
+ st1 {v4.8b}, [x0], x3
+ st1 {v5.8b}, [x0], x3
+ st1 {v6.8b}, [x0], x3
+ st1 {v7.8b}, [x0], x3
ret x10
endfunc
@@ -579,8 +579,8 @@ function \type\()_h264_qpel16_hv_lowpass_l2_neon
endfunc
.endm
- h264_qpel16_hv put
- h264_qpel16_hv avg
+ h264_qpel16_hv put
+ h264_qpel16_hv avg
.macro h264_qpel8 type
function ff_\type\()_h264_qpel8_mc10_neon, export=1
@@ -758,8 +758,8 @@ function ff_\type\()_h264_qpel8_mc33_neon, export=1
endfunc
.endm
- h264_qpel8 put
- h264_qpel8 avg
+ h264_qpel8 put
+ h264_qpel8 avg
.macro h264_qpel16 type
function ff_\type\()_h264_qpel16_mc10_neon, export=1
@@ -930,5 +930,5 @@ function ff_\type\()_h264_qpel16_mc33_neon, export=1
endfunc
.endm
- h264_qpel16 put
- h264_qpel16 avg
+ h264_qpel16 put
+ h264_qpel16 avg
diff --git a/libavcodec/aarch64/hpeldsp_neon.S b/libavcodec/aarch64/hpeldsp_neon.S
index a491c173bb..e7c1549c40 100644
--- a/libavcodec/aarch64/hpeldsp_neon.S
+++ b/libavcodec/aarch64/hpeldsp_neon.S
@@ -26,295 +26,295 @@
.if \avg
mov x12, x0
.endif
-1: ld1 {v0.16B}, [x1], x2
- ld1 {v1.16B}, [x1], x2
- ld1 {v2.16B}, [x1], x2
- ld1 {v3.16B}, [x1], x2
+1: ld1 {v0.16b}, [x1], x2
+ ld1 {v1.16b}, [x1], x2
+ ld1 {v2.16b}, [x1], x2
+ ld1 {v3.16b}, [x1], x2
.if \avg
- ld1 {v4.16B}, [x12], x2
- urhadd v0.16B, v0.16B, v4.16B
- ld1 {v5.16B}, [x12], x2
- urhadd v1.16B, v1.16B, v5.16B
- ld1 {v6.16B}, [x12], x2
- urhadd v2.16B, v2.16B, v6.16B
- ld1 {v7.16B}, [x12], x2
- urhadd v3.16B, v3.16B, v7.16B
+ ld1 {v4.16b}, [x12], x2
+ urhadd v0.16b, v0.16b, v4.16b
+ ld1 {v5.16b}, [x12], x2
+ urhadd v1.16b, v1.16b, v5.16b
+ ld1 {v6.16b}, [x12], x2
+ urhadd v2.16b, v2.16b, v6.16b
+ ld1 {v7.16b}, [x12], x2
+ urhadd v3.16b, v3.16b, v7.16b
.endif
subs w3, w3, #4
- st1 {v0.16B}, [x0], x2
- st1 {v1.16B}, [x0], x2
- st1 {v2.16B}, [x0], x2
- st1 {v3.16B}, [x0], x2
+ st1 {v0.16b}, [x0], x2
+ st1 {v1.16b}, [x0], x2
+ st1 {v2.16b}, [x0], x2
+ st1 {v3.16b}, [x0], x2
b.ne 1b
ret
.endm
.macro pixels16_x2 rnd=1, avg=0
-1: ld1 {v0.16B, v1.16B}, [x1], x2
- ld1 {v2.16B, v3.16B}, [x1], x2
+1: ld1 {v0.16b, v1.16b}, [x1], x2
+ ld1 {v2.16b, v3.16b}, [x1], x2
subs w3, w3, #2
- ext v1.16B, v0.16B, v1.16B, #1
- avg v0.16B, v0.16B, v1.16B
- ext v3.16B, v2.16B, v3.16B, #1
- avg v2.16B, v2.16B, v3.16B
+ ext v1.16b, v0.16b, v1.16b, #1
+ avg v0.16b, v0.16b, v1.16b
+ ext v3.16b, v2.16b, v3.16b, #1
+ avg v2.16b, v2.16b, v3.16b
.if \avg
- ld1 {v1.16B}, [x0], x2
- ld1 {v3.16B}, [x0]
- urhadd v0.16B, v0.16B, v1.16B
- urhadd v2.16B, v2.16B, v3.16B
+ ld1 {v1.16b}, [x0], x2
+ ld1 {v3.16b}, [x0]
+ urhadd v0.16b, v0.16b, v1.16b
+ urhadd v2.16b, v2.16b, v3.16b
sub x0, x0, x2
.endif
- st1 {v0.16B}, [x0], x2
- st1 {v2.16B}, [x0], x2
+ st1 {v0.16b}, [x0], x2
+ st1 {v2.16b}, [x0], x2
b.ne 1b
ret
.endm
.macro pixels16_y2 rnd=1, avg=0
sub w3, w3, #2
- ld1 {v0.16B}, [x1], x2
- ld1 {v1.16B}, [x1], x2
+ ld1 {v0.16b}, [x1], x2
+ ld1 {v1.16b}, [x1], x2
1: subs w3, w3, #2
- avg v2.16B, v0.16B, v1.16B
- ld1 {v0.16B}, [x1], x2
- avg v3.16B, v0.16B, v1.16B
- ld1 {v1.16B}, [x1], x2
+ avg v2.16b, v0.16b, v1.16b
+ ld1 {v0.16b}, [x1], x2
+ avg v3.16b, v0.16b, v1.16b
+ ld1 {v1.16b}, [x1], x2
.if \avg
- ld1 {v4.16B}, [x0], x2
- ld1 {v5.16B}, [x0]
- urhadd v2.16B, v2.16B, v4.16B
- urhadd v3.16B, v3.16B, v5.16B
+ ld1 {v4.16b}, [x0], x2
+ ld1 {v5.16b}, [x0]
+ urhadd v2.16b, v2.16b, v4.16b
+ urhadd v3.16b, v3.16b, v5.16b
sub x0, x0, x2
.endif
- st1 {v2.16B}, [x0], x2
- st1 {v3.16B}, [x0], x2
+ st1 {v2.16b}, [x0], x2
+ st1 {v3.16b}, [x0], x2
b.ne 1b
- avg v2.16B, v0.16B, v1.16B
- ld1 {v0.16B}, [x1], x2
- avg v3.16B, v0.16B, v1.16B
+ avg v2.16b, v0.16b, v1.16b
+ ld1 {v0.16b}, [x1], x2
+ avg v3.16b, v0.16b, v1.16b
.if \avg
- ld1 {v4.16B}, [x0], x2
- ld1 {v5.16B}, [x0]
- urhadd v2.16B, v2.16B, v4.16B
- urhadd v3.16B, v3.16B, v5.16B
+ ld1 {v4.16b}, [x0], x2
+ ld1 {v5.16b}, [x0]
+ urhadd v2.16b, v2.16b, v4.16b
+ urhadd v3.16b, v3.16b, v5.16b
sub x0, x0, x2
.endif
- st1 {v2.16B}, [x0], x2
- st1 {v3.16B}, [x0], x2
+ st1 {v2.16b}, [x0], x2
+ st1 {v3.16b}, [x0], x2
ret
.endm
.macro pixels16_xy2 rnd=1, avg=0
sub w3, w3, #2
- ld1 {v0.16B, v1.16B}, [x1], x2
- ld1 {v4.16B, v5.16B}, [x1], x2
+ ld1 {v0.16b, v1.16b}, [x1], x2
+ ld1 {v4.16b, v5.16b}, [x1], x2
NRND movi v26.8H, #1
- ext v1.16B, v0.16B, v1.16B, #1
- ext v5.16B, v4.16B, v5.16B, #1
- uaddl v16.8H, v0.8B, v1.8B
- uaddl2 v20.8H, v0.16B, v1.16B
- uaddl v18.8H, v4.8B, v5.8B
- uaddl2 v22.8H, v4.16B, v5.16B
+ ext v1.16b, v0.16b, v1.16b, #1
+ ext v5.16b, v4.16b, v5.16b, #1
+ uaddl v16.8h, v0.8b, v1.8b
+ uaddl2 v20.8h, v0.16b, v1.16b
+ uaddl v18.8h, v4.8b, v5.8b
+ uaddl2 v22.8h, v4.16b, v5.16b
1: subs w3, w3, #2
- ld1 {v0.16B, v1.16B}, [x1], x2
- add v24.8H, v16.8H, v18.8H
+ ld1 {v0.16b, v1.16b}, [x1], x2
+ add v24.8h, v16.8h, v18.8h
NRND add v24.8H, v24.8H, v26.8H
- ext v30.16B, v0.16B, v1.16B, #1
- add v1.8H, v20.8H, v22.8H
- mshrn v28.8B, v24.8H, #2
+ ext v30.16b, v0.16b, v1.16b, #1
+ add v1.8h, v20.8h, v22.8h
+ mshrn v28.8b, v24.8h, #2
NRND add v1.8H, v1.8H, v26.8H
- mshrn2 v28.16B, v1.8H, #2
+ mshrn2 v28.16b, v1.8h, #2
.if \avg
- ld1 {v16.16B}, [x0]
- urhadd v28.16B, v28.16B, v16.16B
+ ld1 {v16.16b}, [x0]
+ urhadd v28.16b, v28.16b, v16.16b
.endif
- uaddl v16.8H, v0.8B, v30.8B
- ld1 {v2.16B, v3.16B}, [x1], x2
- uaddl2 v20.8H, v0.16B, v30.16B
- st1 {v28.16B}, [x0], x2
- add v24.8H, v16.8H, v18.8H
+ uaddl v16.8h, v0.8b, v30.8b
+ ld1 {v2.16b, v3.16b}, [x1], x2
+ uaddl2 v20.8h, v0.16b, v30.16b
+ st1 {v28.16b}, [x0], x2
+ add v24.8h, v16.8h, v18.8h
NRND add v24.8H, v24.8H, v26.8H
- ext v3.16B, v2.16B, v3.16B, #1
- add v0.8H, v20.8H, v22.8H
- mshrn v30.8B, v24.8H, #2
+ ext v3.16b, v2.16b, v3.16b, #1
+ add v0.8h, v20.8h, v22.8h
+ mshrn v30.8b, v24.8h, #2
NRND add v0.8H, v0.8H, v26.8H
- mshrn2 v30.16B, v0.8H, #2
+ mshrn2 v30.16b, v0.8h, #2
.if \avg
- ld1 {v18.16B}, [x0]
- urhadd v30.16B, v30.16B, v18.16B
+ ld1 {v18.16b}, [x0]
+ urhadd v30.16b, v30.16b, v18.16b
.endif
- uaddl v18.8H, v2.8B, v3.8B
- uaddl2 v22.8H, v2.16B, v3.16B
- st1 {v30.16B}, [x0], x2
+ uaddl v18.8h, v2.8b, v3.8b
+ uaddl2 v22.8h, v2.16b, v3.16b
+ st1 {v30.16b}, [x0], x2
b.gt 1b
- ld1 {v0.16B, v1.16B}, [x1], x2
- add v24.8H, v16.8H, v18.8H
+ ld1 {v0.16b, v1.16b}, [x1], x2
+ add v24.8h, v16.8h, v18.8h
NRND add v24.8H, v24.8H, v26.8H
- ext v30.16B, v0.16B, v1.16B, #1
- add v1.8H, v20.8H, v22.8H
- mshrn v28.8B, v24.8H, #2
+ ext v30.16b, v0.16b, v1.16b, #1
+ add v1.8h, v20.8h, v22.8h
+ mshrn v28.8b, v24.8h, #2
NRND add v1.8H, v1.8H, v26.8H
- mshrn2 v28.16B, v1.8H, #2
+ mshrn2 v28.16b, v1.8h, #2
.if \avg
- ld1 {v16.16B}, [x0]
- urhadd v28.16B, v28.16B, v16.16B
+ ld1 {v16.16b}, [x0]
+ urhadd v28.16b, v28.16b, v16.16b
.endif
- uaddl v16.8H, v0.8B, v30.8B
- uaddl2 v20.8H, v0.16B, v30.16B
- st1 {v28.16B}, [x0], x2
- add v24.8H, v16.8H, v18.8H
+ uaddl v16.8h, v0.8b, v30.8b
+ uaddl2 v20.8h, v0.16b, v30.16b
+ st1 {v28.16b}, [x0], x2
+ add v24.8h, v16.8h, v18.8h
NRND add v24.8H, v24.8H, v26.8H
- add v0.8H, v20.8H, v22.8H
- mshrn v30.8B, v24.8H, #2
+ add v0.8h, v20.8h, v22.8h
+ mshrn v30.8b, v24.8h, #2
NRND add v0.8H, v0.8H, v26.8H
- mshrn2 v30.16B, v0.8H, #2
+ mshrn2 v30.16b, v0.8h, #2
.if \avg
- ld1 {v18.16B}, [x0]
- urhadd v30.16B, v30.16B, v18.16B
+ ld1 {v18.16b}, [x0]
+ urhadd v30.16b, v30.16b, v18.16b
.endif
- st1 {v30.16B}, [x0], x2
+ st1 {v30.16b}, [x0], x2
ret
.endm
.macro pixels8 rnd=1, avg=0
-1: ld1 {v0.8B}, [x1], x2
- ld1 {v1.8B}, [x1], x2
- ld1 {v2.8B}, [x1], x2
- ld1 {v3.8B}, [x1], x2
+1: ld1 {v0.8b}, [x1], x2
+ ld1 {v1.8b}, [x1], x2
+ ld1 {v2.8b}, [x1], x2
+ ld1 {v3.8b}, [x1], x2
.if \avg
- ld1 {v4.8B}, [x0], x2
- urhadd v0.8B, v0.8B, v4.8B
- ld1 {v5.8B}, [x0], x2
- urhadd v1.8B, v1.8B, v5.8B
- ld1 {v6.8B}, [x0], x2
- urhadd v2.8B, v2.8B, v6.8B
- ld1 {v7.8B}, [x0], x2
- urhadd v3.8B, v3.8B, v7.8B
+ ld1 {v4.8b}, [x0], x2
+ urhadd v0.8b, v0.8b, v4.8b
+ ld1 {v5.8b}, [x0], x2
+ urhadd v1.8b, v1.8b, v5.8b
+ ld1 {v6.8b}, [x0], x2
+ urhadd v2.8b, v2.8b, v6.8b
+ ld1 {v7.8b}, [x0], x2
+ urhadd v3.8b, v3.8b, v7.8b
sub x0, x0, x2, lsl #2
.endif
subs w3, w3, #4
- st1 {v0.8B}, [x0], x2
- st1 {v1.8B}, [x0], x2
- st1 {v2.8B}, [x0], x2
- st1 {v3.8B}, [x0], x2
+ st1 {v0.8b}, [x0], x2
+ st1 {v1.8b}, [x0], x2
+ st1 {v2.8b}, [x0], x2
+ st1 {v3.8b}, [x0], x2
b.ne 1b
ret
.endm
.macro pixels8_x2 rnd=1, avg=0
-1: ld1 {v0.8B, v1.8B}, [x1], x2
- ext v1.8B, v0.8B, v1.8B, #1
- ld1 {v2.8B, v3.8B}, [x1], x2
- ext v3.8B, v2.8B, v3.8B, #1
+1: ld1 {v0.8b, v1.8b}, [x1], x2
+ ext v1.8b, v0.8b, v1.8b, #1
+ ld1 {v2.8b, v3.8b}, [x1], x2
+ ext v3.8b, v2.8b, v3.8b, #1
subs w3, w3, #2
- avg v0.8B, v0.8B, v1.8B
- avg v2.8B, v2.8B, v3.8B
+ avg v0.8b, v0.8b, v1.8b
+ avg v2.8b, v2.8b, v3.8b
.if \avg
- ld1 {v4.8B}, [x0], x2
- ld1 {v5.8B}, [x0]
- urhadd v0.8B, v0.8B, v4.8B
- urhadd v2.8B, v2.8B, v5.8B
+ ld1 {v4.8b}, [x0], x2
+ ld1 {v5.8b}, [x0]
+ urhadd v0.8b, v0.8b, v4.8b
+ urhadd v2.8b, v2.8b, v5.8b
sub x0, x0, x2
.endif
- st1 {v0.8B}, [x0], x2
- st1 {v2.8B}, [x0], x2
+ st1 {v0.8b}, [x0], x2
+ st1 {v2.8b}, [x0], x2
b.ne 1b
ret
.endm
.macro pixels8_y2 rnd=1, avg=0
sub w3, w3, #2
- ld1 {v0.8B}, [x1], x2
- ld1 {v1.8B}, [x1], x2
+ ld1 {v0.8b}, [x1], x2
+ ld1 {v1.8b}, [x1], x2
1: subs w3, w3, #2
- avg v4.8B, v0.8B, v1.8B
- ld1 {v0.8B}, [x1], x2
- avg v5.8B, v0.8B, v1.8B
- ld1 {v1.8B}, [x1], x2
+ avg v4.8b, v0.8b, v1.8b
+ ld1 {v0.8b}, [x1], x2
+ avg v5.8b, v0.8b, v1.8b
+ ld1 {v1.8b}, [x1], x2
.if \avg
- ld1 {v2.8B}, [x0], x2
- ld1 {v3.8B}, [x0]
- urhadd v4.8B, v4.8B, v2.8B
- urhadd v5.8B, v5.8B, v3.8B
+ ld1 {v2.8b}, [x0], x2
+ ld1 {v3.8b}, [x0]
+ urhadd v4.8b, v4.8b, v2.8b
+ urhadd v5.8b, v5.8b, v3.8b
sub x0, x0, x2
.endif
- st1 {v4.8B}, [x0], x2
- st1 {v5.8B}, [x0], x2
+ st1 {v4.8b}, [x0], x2
+ st1 {v5.8b}, [x0], x2
b.ne 1b
- avg v4.8B, v0.8B, v1.8B
- ld1 {v0.8B}, [x1], x2
- avg v5.8B, v0.8B, v1.8B
+ avg v4.8b, v0.8b, v1.8b
+ ld1 {v0.8b}, [x1], x2
+ avg v5.8b, v0.8b, v1.8b
.if \avg
- ld1 {v2.8B}, [x0], x2
- ld1 {v3.8B}, [x0]
- urhadd v4.8B, v4.8B, v2.8B
- urhadd v5.8B, v5.8B, v3.8B
+ ld1 {v2.8b}, [x0], x2
+ ld1 {v3.8b}, [x0]
+ urhadd v4.8b, v4.8b, v2.8b
+ urhadd v5.8b, v5.8b, v3.8b
sub x0, x0, x2
.endif
- st1 {v4.8B}, [x0], x2
- st1 {v5.8B}, [x0], x2
+ st1 {v4.8b}, [x0], x2
+ st1 {v5.8b}, [x0], x2
ret
.endm
.macro pixels8_xy2 rnd=1, avg=0
sub w3, w3, #2
- ld1 {v0.16B}, [x1], x2
- ld1 {v1.16B}, [x1], x2
+ ld1 {v0.16b}, [x1], x2
+ ld1 {v1.16b}, [x1], x2
NRND movi v19.8H, #1
- ext v4.16B, v0.16B, v4.16B, #1
- ext v6.16B, v1.16B, v6.16B, #1
- uaddl v16.8H, v0.8B, v4.8B
- uaddl v17.8H, v1.8B, v6.8B
+ ext v4.16b, v0.16b, v4.16b, #1
+ ext v6.16b, v1.16b, v6.16b, #1
+ uaddl v16.8h, v0.8b, v4.8b
+ uaddl v17.8h, v1.8b, v6.8b
1: subs w3, w3, #2
- ld1 {v0.16B}, [x1], x2
- add v18.8H, v16.8H, v17.8H
- ext v4.16B, v0.16B, v4.16B, #1
+ ld1 {v0.16b}, [x1], x2
+ add v18.8h, v16.8h, v17.8h
+ ext v4.16b, v0.16b, v4.16b, #1
NRND add v18.8H, v18.8H, v19.8H
- uaddl v16.8H, v0.8B, v4.8B
- mshrn v5.8B, v18.8H, #2
- ld1 {v1.16B}, [x1], x2
- add v18.8H, v16.8H, v17.8H
+ uaddl v16.8h, v0.8b, v4.8b
+ mshrn v5.8b, v18.8h, #2
+ ld1 {v1.16b}, [x1], x2
+ add v18.8h, v16.8h, v17.8h
.if \avg
- ld1 {v7.8B}, [x0]
- urhadd v5.8B, v5.8B, v7.8B
+ ld1 {v7.8b}, [x0]
+ urhadd v5.8b, v5.8b, v7.8b
.endif
NRND add v18.8H, v18.8H, v19.8H
- st1 {v5.8B}, [x0], x2
- mshrn v7.8B, v18.8H, #2
+ st1 {v5.8b}, [x0], x2
+ mshrn v7.8b, v18.8h, #2
.if \avg
- ld1 {v5.8B}, [x0]
- urhadd v7.8B, v7.8B, v5.8B
+ ld1 {v5.8b}, [x0]
+ urhadd v7.8b, v7.8b, v5.8b
.endif
- ext v6.16B, v1.16B, v6.16B, #1
- uaddl v17.8H, v1.8B, v6.8B
- st1 {v7.8B}, [x0], x2
+ ext v6.16b, v1.16b, v6.16b, #1
+ uaddl v17.8h, v1.8b, v6.8b
+ st1 {v7.8b}, [x0], x2
b.gt 1b
- ld1 {v0.16B}, [x1], x2
- add v18.8H, v16.8H, v17.8H
- ext v4.16B, v0.16B, v4.16B, #1
+ ld1 {v0.16b}, [x1], x2
+ add v18.8h, v16.8h, v17.8h
+ ext v4.16b, v0.16b, v4.16b, #1
NRND add v18.8H, v18.8H, v19.8H
- uaddl v16.8H, v0.8B, v4.8B
- mshrn v5.8B, v18.8H, #2
- add v18.8H, v16.8H, v17.8H
+ uaddl v16.8h, v0.8b, v4.8b
+ mshrn v5.8b, v18.8h, #2
+ add v18.8h, v16.8h, v17.8h
.if \avg
- ld1 {v7.8B}, [x0]
- urhadd v5.8B, v5.8B, v7.8B
+ ld1 {v7.8b}, [x0]
+ urhadd v5.8b, v5.8b, v7.8b
.endif
NRND add v18.8H, v18.8H, v19.8H
- st1 {v5.8B}, [x0], x2
- mshrn v7.8B, v18.8H, #2
+ st1 {v5.8b}, [x0], x2
+ mshrn v7.8b, v18.8h, #2
.if \avg
- ld1 {v5.8B}, [x0]
- urhadd v7.8B, v7.8B, v5.8B
+ ld1 {v5.8b}, [x0]
+ urhadd v7.8b, v7.8b, v5.8b
.endif
- st1 {v7.8B}, [x0], x2
+ st1 {v7.8b}, [x0], x2
ret
.endm
diff --git a/libavcodec/aarch64/neon.S b/libavcodec/aarch64/neon.S
index 0fddbecae3..237890f61e 100644
--- a/libavcodec/aarch64/neon.S
+++ b/libavcodec/aarch64/neon.S
@@ -17,133 +17,133 @@
*/
.macro transpose_8x8B r0, r1, r2, r3, r4, r5, r6, r7, r8, r9
- trn1 \r8\().8B, \r0\().8B, \r1\().8B
- trn2 \r9\().8B, \r0\().8B, \r1\().8B
- trn1 \r1\().8B, \r2\().8B, \r3\().8B
- trn2 \r3\().8B, \r2\().8B, \r3\().8B
- trn1 \r0\().8B, \r4\().8B, \r5\().8B
- trn2 \r5\().8B, \r4\().8B, \r5\().8B
- trn1 \r2\().8B, \r6\().8B, \r7\().8B
- trn2 \r7\().8B, \r6\().8B, \r7\().8B
+ trn1 \r8\().8b, \r0\().8b, \r1\().8b
+ trn2 \r9\().8b, \r0\().8b, \r1\().8b
+ trn1 \r1\().8b, \r2\().8b, \r3\().8b
+ trn2 \r3\().8b, \r2\().8b, \r3\().8b
+ trn1 \r0\().8b, \r4\().8b, \r5\().8b
+ trn2 \r5\().8b, \r4\().8b, \r5\().8b
+ trn1 \r2\().8b, \r6\().8b, \r7\().8b
+ trn2 \r7\().8b, \r6\().8b, \r7\().8b
- trn1 \r4\().4H, \r0\().4H, \r2\().4H
- trn2 \r2\().4H, \r0\().4H, \r2\().4H
- trn1 \r6\().4H, \r5\().4H, \r7\().4H
- trn2 \r7\().4H, \r5\().4H, \r7\().4H
- trn1 \r5\().4H, \r9\().4H, \r3\().4H
- trn2 \r9\().4H, \r9\().4H, \r3\().4H
- trn1 \r3\().4H, \r8\().4H, \r1\().4H
- trn2 \r8\().4H, \r8\().4H, \r1\().4H
+ trn1 \r4\().4h, \r0\().4h, \r2\().4h
+ trn2 \r2\().4h, \r0\().4h, \r2\().4h
+ trn1 \r6\().4h, \r5\().4h, \r7\().4h
+ trn2 \r7\().4h, \r5\().4h, \r7\().4h
+ trn1 \r5\().4h, \r9\().4h, \r3\().4h
+ trn2 \r9\().4h, \r9\().4h, \r3\().4h
+ trn1 \r3\().4h, \r8\().4h, \r1\().4h
+ trn2 \r8\().4h, \r8\().4h, \r1\().4h
- trn1 \r0\().2S, \r3\().2S, \r4\().2S
- trn2 \r4\().2S, \r3\().2S, \r4\().2S
+ trn1 \r0\().2s, \r3\().2s, \r4\().2s
+ trn2 \r4\().2s, \r3\().2s, \r4\().2s
- trn1 \r1\().2S, \r5\().2S, \r6\().2S
- trn2 \r5\().2S, \r5\().2S, \r6\().2S
+ trn1 \r1\().2s, \r5\().2s, \r6\().2s
+ trn2 \r5\().2s, \r5\().2s, \r6\().2s
- trn2 \r6\().2S, \r8\().2S, \r2\().2S
- trn1 \r2\().2S, \r8\().2S, \r2\().2S
+ trn2 \r6\().2s, \r8\().2s, \r2\().2s
+ trn1 \r2\().2s, \r8\().2s, \r2\().2s
- trn1 \r3\().2S, \r9\().2S, \r7\().2S
- trn2 \r7\().2S, \r9\().2S, \r7\().2S
+ trn1 \r3\().2s, \r9\().2s, \r7\().2s
+ trn2 \r7\().2s, \r9\().2s, \r7\().2s
.endm
.macro transpose_8x16B r0, r1, r2, r3, r4, r5, r6, r7, t0, t1
- trn1 \t0\().16B, \r0\().16B, \r1\().16B
- trn2 \t1\().16B, \r0\().16B, \r1\().16B
- trn1 \r1\().16B, \r2\().16B, \r3\().16B
- trn2 \r3\().16B, \r2\().16B, \r3\().16B
- trn1 \r0\().16B, \r4\().16B, \r5\().16B
- trn2 \r5\().16B, \r4\().16B, \r5\().16B
- trn1 \r2\().16B, \r6\().16B, \r7\().16B
- trn2 \r7\().16B, \r6\().16B, \r7\().16B
+ trn1 \t0\().16b, \r0\().16b, \r1\().16b
+ trn2 \t1\().16b, \r0\().16b, \r1\().16b
+ trn1 \r1\().16b, \r2\().16b, \r3\().16b
+ trn2 \r3\().16b, \r2\().16b, \r3\().16b
+ trn1 \r0\().16b, \r4\().16b, \r5\().16b
+ trn2 \r5\().16b, \r4\().16b, \r5\().16b
+ trn1 \r2\().16b, \r6\().16b, \r7\().16b
+ trn2 \r7\().16b, \r6\().16b, \r7\().16b
- trn1 \r4\().8H, \r0\().8H, \r2\().8H
- trn2 \r2\().8H, \r0\().8H, \r2\().8H
- trn1 \r6\().8H, \r5\().8H, \r7\().8H
- trn2 \r7\().8H, \r5\().8H, \r7\().8H
- trn1 \r5\().8H, \t1\().8H, \r3\().8H
- trn2 \t1\().8H, \t1\().8H, \r3\().8H
- trn1 \r3\().8H, \t0\().8H, \r1\().8H
- trn2 \t0\().8H, \t0\().8H, \r1\().8H
+ trn1 \r4\().8h, \r0\().8h, \r2\().8h
+ trn2 \r2\().8h, \r0\().8h, \r2\().8h
+ trn1 \r6\().8h, \r5\().8h, \r7\().8h
+ trn2 \r7\().8h, \r5\().8h, \r7\().8h
+ trn1 \r5\().8h, \t1\().8h, \r3\().8h
+ trn2 \t1\().8h, \t1\().8h, \r3\().8h
+ trn1 \r3\().8h, \t0\().8h, \r1\().8h
+ trn2 \t0\().8h, \t0\().8h, \r1\().8h
- trn1 \r0\().4S, \r3\().4S, \r4\().4S
- trn2 \r4\().4S, \r3\().4S, \r4\().4S
+ trn1 \r0\().4s, \r3\().4s, \r4\().4s
+ trn2 \r4\().4s, \r3\().4s, \r4\().4s
- trn1 \r1\().4S, \r5\().4S, \r6\().4S
- trn2 \r5\().4S, \r5\().4S, \r6\().4S
+ trn1 \r1\().4s, \r5\().4s, \r6\().4s
+ trn2 \r5\().4s, \r5\().4s, \r6\().4s
- trn2 \r6\().4S, \t0\().4S, \r2\().4S
- trn1 \r2\().4S, \t0\().4S, \r2\().4S
+ trn2 \r6\().4s, \t0\().4s, \r2\().4s
+ trn1 \r2\().4s, \t0\().4s, \r2\().4s
- trn1 \r3\().4S, \t1\().4S, \r7\().4S
- trn2 \r7\().4S, \t1\().4S, \r7\().4S
+ trn1 \r3\().4s, \t1\().4s, \r7\().4s
+ trn2 \r7\().4s, \t1\().4s, \r7\().4s
.endm
.macro transpose_4x16B r0, r1, r2, r3, t4, t5, t6, t7
- trn1 \t4\().16B, \r0\().16B, \r1\().16B
- trn2 \t5\().16B, \r0\().16B, \r1\().16B
- trn1 \t6\().16B, \r2\().16B, \r3\().16B
- trn2 \t7\().16B, \r2\().16B, \r3\().16B
+ trn1 \t4\().16b, \r0\().16b, \r1\().16b
+ trn2 \t5\().16b, \r0\().16b, \r1\().16b
+ trn1 \t6\().16b, \r2\().16b, \r3\().16b
+ trn2 \t7\().16b, \r2\().16b, \r3\().16b
- trn1 \r0\().8H, \t4\().8H, \t6\().8H
- trn2 \r2\().8H, \t4\().8H, \t6\().8H
- trn1 \r1\().8H, \t5\().8H, \t7\().8H
- trn2 \r3\().8H, \t5\().8H, \t7\().8H
+ trn1 \r0\().8h, \t4\().8h, \t6\().8h
+ trn2 \r2\().8h, \t4\().8h, \t6\().8h
+ trn1 \r1\().8h, \t5\().8h, \t7\().8h
+ trn2 \r3\().8h, \t5\().8h, \t7\().8h
.endm
.macro transpose_4x8B r0, r1, r2, r3, t4, t5, t6, t7
- trn1 \t4\().8B, \r0\().8B, \r1\().8B
- trn2 \t5\().8B, \r0\().8B, \r1\().8B
- trn1 \t6\().8B, \r2\().8B, \r3\().8B
- trn2 \t7\().8B, \r2\().8B, \r3\().8B
+ trn1 \t4\().8b, \r0\().8b, \r1\().8b
+ trn2 \t5\().8b, \r0\().8b, \r1\().8b
+ trn1 \t6\().8b, \r2\().8b, \r3\().8b
+ trn2 \t7\().8b, \r2\().8b, \r3\().8b
- trn1 \r0\().4H, \t4\().4H, \t6\().4H
- trn2 \r2\().4H, \t4\().4H, \t6\().4H
- trn1 \r1\().4H, \t5\().4H, \t7\().4H
- trn2 \r3\().4H, \t5\().4H, \t7\().4H
+ trn1 \r0\().4h, \t4\().4h, \t6\().4h
+ trn2 \r2\().4h, \t4\().4h, \t6\().4h
+ trn1 \r1\().4h, \t5\().4h, \t7\().4h
+ trn2 \r3\().4h, \t5\().4h, \t7\().4h
.endm
.macro transpose_4x4H r0, r1, r2, r3, r4, r5, r6, r7
- trn1 \r4\().4H, \r0\().4H, \r1\().4H
- trn2 \r5\().4H, \r0\().4H, \r1\().4H
- trn1 \r6\().4H, \r2\().4H, \r3\().4H
- trn2 \r7\().4H, \r2\().4H, \r3\().4H
- trn1 \r0\().2S, \r4\().2S, \r6\().2S
- trn2 \r2\().2S, \r4\().2S, \r6\().2S
- trn1 \r1\().2S, \r5\().2S, \r7\().2S
- trn2 \r3\().2S, \r5\().2S, \r7\().2S
+ trn1 \r4\().4h, \r0\().4h, \r1\().4h
+ trn2 \r5\().4h, \r0\().4h, \r1\().4h
+ trn1 \r6\().4h, \r2\().4h, \r3\().4h
+ trn2 \r7\().4h, \r2\().4h, \r3\().4h
+ trn1 \r0\().2s, \r4\().2s, \r6\().2s
+ trn2 \r2\().2s, \r4\().2s, \r6\().2s
+ trn1 \r1\().2s, \r5\().2s, \r7\().2s
+ trn2 \r3\().2s, \r5\().2s, \r7\().2s
.endm
.macro transpose_8x8H r0, r1, r2, r3, r4, r5, r6, r7, r8, r9
- trn1 \r8\().8H, \r0\().8H, \r1\().8H
- trn2 \r9\().8H, \r0\().8H, \r1\().8H
- trn1 \r1\().8H, \r2\().8H, \r3\().8H
- trn2 \r3\().8H, \r2\().8H, \r3\().8H
- trn1 \r0\().8H, \r4\().8H, \r5\().8H
- trn2 \r5\().8H, \r4\().8H, \r5\().8H
- trn1 \r2\().8H, \r6\().8H, \r7\().8H
- trn2 \r7\().8H, \r6\().8H, \r7\().8H
+ trn1 \r8\().8h, \r0\().8h, \r1\().8h
+ trn2 \r9\().8h, \r0\().8h, \r1\().8h
+ trn1 \r1\().8h, \r2\().8h, \r3\().8h
+ trn2 \r3\().8h, \r2\().8h, \r3\().8h
+ trn1 \r0\().8h, \r4\().8h, \r5\().8h
+ trn2 \r5\().8h, \r4\().8h, \r5\().8h
+ trn1 \r2\().8h, \r6\().8h, \r7\().8h
+ trn2 \r7\().8h, \r6\().8h, \r7\().8h
- trn1 \r4\().4S, \r0\().4S, \r2\().4S
- trn2 \r2\().4S, \r0\().4S, \r2\().4S
- trn1 \r6\().4S, \r5\().4S, \r7\().4S
- trn2 \r7\().4S, \r5\().4S, \r7\().4S
- trn1 \r5\().4S, \r9\().4S, \r3\().4S
- trn2 \r9\().4S, \r9\().4S, \r3\().4S
- trn1 \r3\().4S, \r8\().4S, \r1\().4S
- trn2 \r8\().4S, \r8\().4S, \r1\().4S
+ trn1 \r4\().4s, \r0\().4s, \r2\().4s
+ trn2 \r2\().4s, \r0\().4s, \r2\().4s
+ trn1 \r6\().4s, \r5\().4s, \r7\().4s
+ trn2 \r7\().4s, \r5\().4s, \r7\().4s
+ trn1 \r5\().4s, \r9\().4s, \r3\().4s
+ trn2 \r9\().4s, \r9\().4s, \r3\().4s
+ trn1 \r3\().4s, \r8\().4s, \r1\().4s
+ trn2 \r8\().4s, \r8\().4s, \r1\().4s
- trn1 \r0\().2D, \r3\().2D, \r4\().2D
- trn2 \r4\().2D, \r3\().2D, \r4\().2D
+ trn1 \r0\().2d, \r3\().2d, \r4\().2d
+ trn2 \r4\().2d, \r3\().2d, \r4\().2d
- trn1 \r1\().2D, \r5\().2D, \r6\().2D
- trn2 \r5\().2D, \r5\().2D, \r6\().2D
+ trn1 \r1\().2d, \r5\().2d, \r6\().2d
+ trn2 \r5\().2d, \r5\().2d, \r6\().2d
- trn2 \r6\().2D, \r8\().2D, \r2\().2D
- trn1 \r2\().2D, \r8\().2D, \r2\().2D
+ trn2 \r6\().2d, \r8\().2d, \r2\().2d
+ trn1 \r2\().2d, \r8\().2d, \r2\().2d
- trn1 \r3\().2D, \r9\().2D, \r7\().2D
- trn2 \r7\().2D, \r9\().2D, \r7\().2D
+ trn1 \r3\().2d, \r9\().2d, \r7\().2d
+ trn2 \r7\().2d, \r9\().2d, \r7\().2d
.endm
diff --git a/libavcodec/aarch64/opusdsp_neon.S b/libavcodec/aarch64/opusdsp_neon.S
index 46c2be0874..e933151ab4 100644
--- a/libavcodec/aarch64/opusdsp_neon.S
+++ b/libavcodec/aarch64/opusdsp_neon.S
@@ -33,81 +33,81 @@ const tab_x2, align=4
endconst
function ff_opus_deemphasis_neon, export=1
- movrel x4, tab_st
- ld1 {v4.4s}, [x4]
- movrel x4, tab_x0
- ld1 {v5.4s}, [x4]
- movrel x4, tab_x1
- ld1 {v6.4s}, [x4]
- movrel x4, tab_x2
- ld1 {v7.4s}, [x4]
+ movrel x4, tab_st
+ ld1 {v4.4s}, [x4]
+ movrel x4, tab_x0
+ ld1 {v5.4s}, [x4]
+ movrel x4, tab_x1
+ ld1 {v6.4s}, [x4]
+ movrel x4, tab_x2
+ ld1 {v7.4s}, [x4]
- fmul v0.4s, v4.4s, v0.s[0]
+ fmul v0.4s, v4.4s, v0.s[0]
-1: ld1 {v1.4s, v2.4s}, [x1], #32
+1: ld1 {v1.4s, v2.4s}, [x1], #32
- fmla v0.4s, v5.4s, v1.s[0]
- fmul v3.4s, v7.4s, v2.s[2]
+ fmla v0.4s, v5.4s, v1.s[0]
+ fmul v3.4s, v7.4s, v2.s[2]
- fmla v0.4s, v6.4s, v1.s[1]
- fmla v3.4s, v6.4s, v2.s[1]
+ fmla v0.4s, v6.4s, v1.s[1]
+ fmla v3.4s, v6.4s, v2.s[1]
- fmla v0.4s, v7.4s, v1.s[2]
- fmla v3.4s, v5.4s, v2.s[0]
+ fmla v0.4s, v7.4s, v1.s[2]
+ fmla v3.4s, v5.4s, v2.s[0]
- fadd v1.4s, v1.4s, v0.4s
- fadd v2.4s, v2.4s, v3.4s
+ fadd v1.4s, v1.4s, v0.4s
+ fadd v2.4s, v2.4s, v3.4s
- fmla v2.4s, v4.4s, v1.s[3]
+ fmla v2.4s, v4.4s, v1.s[3]
- st1 {v1.4s, v2.4s}, [x0], #32
- fmul v0.4s, v4.4s, v2.s[3]
+ st1 {v1.4s, v2.4s}, [x0], #32
+ fmul v0.4s, v4.4s, v2.s[3]
- subs w2, w2, #8
- b.gt 1b
+ subs w2, w2, #8
+ b.gt 1b
- mov s0, v2.s[3]
+ mov s0, v2.s[3]
ret
endfunc
function ff_opus_postfilter_neon, export=1
- ld1 {v0.4s}, [x2]
- dup v1.4s, v0.s[1]
- dup v2.4s, v0.s[2]
- dup v0.4s, v0.s[0]
+ ld1 {v0.4s}, [x2]
+ dup v1.4s, v0.s[1]
+ dup v2.4s, v0.s[2]
+ dup v0.4s, v0.s[0]
- add w1, w1, #2
- sub x1, x0, x1, lsl #2
+ add w1, w1, #2
+ sub x1, x0, x1, lsl #2
- ld1 {v3.4s}, [x1]
- fmul v3.4s, v3.4s, v2.4s
+ ld1 {v3.4s}, [x1]
+ fmul v3.4s, v3.4s, v2.4s
-1: add x1, x1, #4
- ld1 {v4.4s}, [x1]
- add x1, x1, #4
- ld1 {v5.4s}, [x1]
- add x1, x1, #4
- ld1 {v6.4s}, [x1]
- add x1, x1, #4
- ld1 {v7.4s}, [x1]
+1: add x1, x1, #4
+ ld1 {v4.4s}, [x1]
+ add x1, x1, #4
+ ld1 {v5.4s}, [x1]
+ add x1, x1, #4
+ ld1 {v6.4s}, [x1]
+ add x1, x1, #4
+ ld1 {v7.4s}, [x1]
- fmla v3.4s, v7.4s, v2.4s
- fadd v6.4s, v6.4s, v4.4s
+ fmla v3.4s, v7.4s, v2.4s
+ fadd v6.4s, v6.4s, v4.4s
- ld1 {v4.4s}, [x0]
- fmla v4.4s, v5.4s, v0.4s
+ ld1 {v4.4s}, [x0]
+ fmla v4.4s, v5.4s, v0.4s
- fmul v6.4s, v6.4s, v1.4s
- fadd v6.4s, v6.4s, v3.4s
+ fmul v6.4s, v6.4s, v1.4s
+ fadd v6.4s, v6.4s, v3.4s
- fadd v4.4s, v4.4s, v6.4s
- fmul v3.4s, v7.4s, v2.4s
+ fadd v4.4s, v4.4s, v6.4s
+ fmul v3.4s, v7.4s, v2.4s
- st1 {v4.4s}, [x0], #16
+ st1 {v4.4s}, [x0], #16
- subs w3, w3, #4
- b.gt 1b
+ subs w3, w3, #4
+ b.gt 1b
ret
endfunc
diff --git a/libavcodec/aarch64/sbrdsp_neon.S b/libavcodec/aarch64/sbrdsp_neon.S
index d23717e760..1fdde6ccb6 100644
--- a/libavcodec/aarch64/sbrdsp_neon.S
+++ b/libavcodec/aarch64/sbrdsp_neon.S
@@ -46,49 +46,49 @@ function ff_sbr_sum64x5_neon, export=1
add x3, x0, #192*4
add x4, x0, #256*4
mov x5, #64
-1: ld1 {v0.4S}, [x0]
- ld1 {v1.4S}, [x1], #16
- fadd v0.4S, v0.4S, v1.4S
- ld1 {v2.4S}, [x2], #16
- fadd v0.4S, v0.4S, v2.4S
- ld1 {v3.4S}, [x3], #16
- fadd v0.4S, v0.4S, v3.4S
- ld1 {v4.4S}, [x4], #16
- fadd v0.4S, v0.4S, v4.4S
- st1 {v0.4S}, [x0], #16
+1: ld1 {v0.4s}, [x0]
+ ld1 {v1.4s}, [x1], #16
+ fadd v0.4s, v0.4s, v1.4s
+ ld1 {v2.4s}, [x2], #16
+ fadd v0.4s, v0.4s, v2.4s
+ ld1 {v3.4s}, [x3], #16
+ fadd v0.4s, v0.4s, v3.4s
+ ld1 {v4.4s}, [x4], #16
+ fadd v0.4s, v0.4s, v4.4s
+ st1 {v0.4s}, [x0], #16
subs x5, x5, #4
b.gt 1b
ret
endfunc
function ff_sbr_sum_square_neon, export=1
- movi v0.4S, #0
-1: ld1 {v1.4S}, [x0], #16
- fmla v0.4S, v1.4S, v1.4S
+ movi v0.4s, #0
+1: ld1 {v1.4s}, [x0], #16
+ fmla v0.4s, v1.4s, v1.4s
subs w1, w1, #2
b.gt 1b
- faddp v0.4S, v0.4S, v0.4S
- faddp v0.4S, v0.4S, v0.4S
+ faddp v0.4s, v0.4s, v0.4s
+ faddp v0.4s, v0.4s, v0.4s
ret
endfunc
function ff_sbr_neg_odd_64_neon, export=1
mov x1, x0
- movi v5.4S, #1<<7, lsl #24
- ld2 {v0.4S, v1.4S}, [x0], #32
- eor v1.16B, v1.16B, v5.16B
- ld2 {v2.4S, v3.4S}, [x0], #32
+ movi v5.4s, #1<<7, lsl #24
+ ld2 {v0.4s, v1.4s}, [x0], #32
+ eor v1.16b, v1.16b, v5.16b
+ ld2 {v2.4s, v3.4s}, [x0], #32
.rept 3
- st2 {v0.4S, v1.4S}, [x1], #32
- eor v3.16B, v3.16B, v5.16B
- ld2 {v0.4S, v1.4S}, [x0], #32
- st2 {v2.4S, v3.4S}, [x1], #32
- eor v1.16B, v1.16B, v5.16B
- ld2 {v2.4S, v3.4S}, [x0], #32
+ st2 {v0.4s, v1.4s}, [x1], #32
+ eor v3.16b, v3.16b, v5.16b
+ ld2 {v0.4s, v1.4s}, [x0], #32
+ st2 {v2.4s, v3.4s}, [x1], #32
+ eor v1.16b, v1.16b, v5.16b
+ ld2 {v2.4s, v3.4s}, [x0], #32
.endr
- eor v3.16B, v3.16B, v5.16B
- st2 {v0.4S, v1.4S}, [x1], #32
- st2 {v2.4S, v3.4S}, [x1], #32
+ eor v3.16b, v3.16b, v5.16b
+ st2 {v0.4s, v1.4s}, [x1], #32
+ st2 {v2.4s, v3.4s}, [x1], #32
ret
endfunc
@@ -97,26 +97,26 @@ function ff_sbr_qmf_pre_shuffle_neon, export=1
add x2, x0, #64*4
mov x3, #-16
mov x4, #-4
- movi v6.4S, #1<<7, lsl #24
- ld1 {v0.2S}, [x0], #8
- st1 {v0.2S}, [x2], #8
+ movi v6.4s, #1<<7, lsl #24
+ ld1 {v0.2s}, [x0], #8
+ st1 {v0.2s}, [x2], #8
.rept 7
- ld1 {v1.4S}, [x1], x3
- ld1 {v2.4S}, [x0], #16
- eor v1.16B, v1.16B, v6.16B
- rev64 v1.4S, v1.4S
- ext v1.16B, v1.16B, v1.16B, #8
- st2 {v1.4S, v2.4S}, [x2], #32
+ ld1 {v1.4s}, [x1], x3
+ ld1 {v2.4s}, [x0], #16
+ eor v1.16b, v1.16b, v6.16b
+ rev64 v1.4s, v1.4s
+ ext v1.16b, v1.16b, v1.16b, #8
+ st2 {v1.4s, v2.4s}, [x2], #32
.endr
add x1, x1, #8
- ld1 {v1.2S}, [x1], x4
- ld1 {v2.2S}, [x0], #8
- ld1 {v1.S}[3], [x1]
- ld1 {v2.S}[2], [x0]
- eor v1.16B, v1.16B, v6.16B
- rev64 v1.4S, v1.4S
- st2 {v1.2S, v2.2S}, [x2], #16
- st2 {v1.S, v2.S}[2], [x2]
+ ld1 {v1.2s}, [x1], x4
+ ld1 {v2.2s}, [x0], #8
+ ld1 {v1.s}[3], [x1]
+ ld1 {v2.s}[2], [x0]
+ eor v1.16b, v1.16b, v6.16b
+ rev64 v1.4s, v1.4s
+ st2 {v1.2s, v2.2s}, [x2], #16
+ st2 {v1.s, v2.s}[2], [x2]
ret
endfunc
@@ -124,13 +124,13 @@ function ff_sbr_qmf_post_shuffle_neon, export=1
add x2, x1, #60*4
mov x3, #-16
mov x4, #32
- movi v6.4S, #1<<7, lsl #24
-1: ld1 {v0.4S}, [x2], x3
- ld1 {v1.4S}, [x1], #16
- eor v0.16B, v0.16B, v6.16B
- rev64 v0.4S, v0.4S
- ext v0.16B, v0.16B, v0.16B, #8
- st2 {v0.4S, v1.4S}, [x0], #32
+ movi v6.4s, #1<<7, lsl #24
+1: ld1 {v0.4s}, [x2], x3
+ ld1 {v1.4s}, [x1], #16
+ eor v0.16b, v0.16b, v6.16b
+ rev64 v0.4s, v0.4s
+ ext v0.16b, v0.16b, v0.16b, #8
+ st2 {v0.4s, v1.4s}, [x0], #32
subs x4, x4, #4
b.gt 1b
ret
@@ -141,13 +141,13 @@ function ff_sbr_qmf_deint_neg_neon, export=1
add x2, x0, #60*4
mov x3, #-32
mov x4, #32
- movi v2.4S, #1<<7, lsl #24
-1: ld2 {v0.4S, v1.4S}, [x1], x3
- eor v0.16B, v0.16B, v2.16B
- rev64 v1.4S, v1.4S
- ext v1.16B, v1.16B, v1.16B, #8
- st1 {v0.4S}, [x2]
- st1 {v1.4S}, [x0], #16
+ movi v2.4s, #1<<7, lsl #24
+1: ld2 {v0.4s, v1.4s}, [x1], x3
+ eor v0.16b, v0.16b, v2.16b
+ rev64 v1.4s, v1.4s
+ ext v1.16b, v1.16b, v1.16b, #8
+ st1 {v0.4s}, [x2]
+ st1 {v1.4s}, [x0], #16
sub x2, x2, #16
subs x4, x4, #4
b.gt 1b
@@ -159,16 +159,16 @@ function ff_sbr_qmf_deint_bfly_neon, export=1
add x3, x0, #124*4
mov x4, #64
mov x5, #-16
-1: ld1 {v0.4S}, [x1], #16
- ld1 {v1.4S}, [x2], x5
- rev64 v2.4S, v0.4S
- ext v2.16B, v2.16B, v2.16B, #8
- rev64 v3.4S, v1.4S
- ext v3.16B, v3.16B, v3.16B, #8
- fadd v1.4S, v1.4S, v2.4S
- fsub v0.4S, v0.4S, v3.4S
- st1 {v0.4S}, [x0], #16
- st1 {v1.4S}, [x3], x5
+1: ld1 {v0.4s}, [x1], #16
+ ld1 {v1.4s}, [x2], x5
+ rev64 v2.4s, v0.4s
+ ext v2.16b, v2.16b, v2.16b, #8
+ rev64 v3.4s, v1.4s
+ ext v3.16b, v3.16b, v3.16b, #8
+ fadd v1.4s, v1.4s, v2.4s
+ fsub v0.4s, v0.4s, v3.4s
+ st1 {v0.4s}, [x0], #16
+ st1 {v1.4s}, [x3], x5
subs x4, x4, #4
b.gt 1b
ret
@@ -178,32 +178,32 @@ function ff_sbr_hf_gen_neon, export=1
sxtw x4, w4
sxtw x5, w5
movrel x6, factors
- ld1 {v7.4S}, [x6]
- dup v1.4S, v0.S[0]
- mov v2.8B, v1.8B
- mov v2.S[2], v7.S[0]
- mov v2.S[3], v7.S[0]
- fmul v1.4S, v1.4S, v2.4S
- ld1 {v0.D}[0], [x3]
- ld1 {v0.D}[1], [x2]
- fmul v0.4S, v0.4S, v1.4S
- fmul v1.4S, v0.4S, v7.4S
- rev64 v0.4S, v0.4S
+ ld1 {v7.4s}, [x6]
+ dup v1.4s, v0.s[0]
+ mov v2.8b, v1.8b
+ mov v2.s[2], v7.s[0]
+ mov v2.s[3], v7.s[0]
+ fmul v1.4s, v1.4s, v2.4s
+ ld1 {v0.d}[0], [x3]
+ ld1 {v0.d}[1], [x2]
+ fmul v0.4s, v0.4s, v1.4s
+ fmul v1.4s, v0.4s, v7.4s
+ rev64 v0.4s, v0.4s
sub x7, x5, x4
add x0, x0, x4, lsl #3
add x1, x1, x4, lsl #3
sub x1, x1, #16
-1: ld1 {v2.4S}, [x1], #16
- ld1 {v3.2S}, [x1]
- fmul v4.4S, v2.4S, v1.4S
- fmul v5.4S, v2.4S, v0.4S
- faddp v4.4S, v4.4S, v4.4S
- faddp v5.4S, v5.4S, v5.4S
- faddp v4.4S, v4.4S, v4.4S
- faddp v5.4S, v5.4S, v5.4S
- mov v4.S[1], v5.S[0]
- fadd v4.2S, v4.2S, v3.2S
- st1 {v4.2S}, [x0], #8
+1: ld1 {v2.4s}, [x1], #16
+ ld1 {v3.2s}, [x1]
+ fmul v4.4s, v2.4s, v1.4s
+ fmul v5.4s, v2.4s, v0.4s
+ faddp v4.4s, v4.4s, v4.4s
+ faddp v5.4s, v5.4s, v5.4s
+ faddp v4.4s, v4.4s, v4.4s
+ faddp v5.4s, v5.4s, v5.4s
+ mov v4.s[1], v5.s[0]
+ fadd v4.2s, v4.2s, v3.2s
+ st1 {v4.2s}, [x0], #8
sub x1, x1, #8
subs x7, x7, #1
b.gt 1b
@@ -215,10 +215,10 @@ function ff_sbr_hf_g_filt_neon, export=1
sxtw x4, w4
mov x5, #40*2*4
add x1, x1, x4, lsl #3
-1: ld1 {v0.2S}, [x1], x5
- ld1 {v1.S}[0], [x2], #4
- fmul v2.4S, v0.4S, v1.S[0]
- st1 {v2.2S}, [x0], #8
+1: ld1 {v0.2s}, [x1], x5
+ ld1 {v1.s}[0], [x2], #4
+ fmul v2.4s, v0.4s, v1.s[0]
+ st1 {v2.2s}, [x0], #8
subs x3, x3, #1
b.gt 1b
ret
@@ -227,46 +227,46 @@ endfunc
function ff_sbr_autocorrelate_neon, export=1
mov x2, #38
movrel x3, factors
- ld1 {v0.4S}, [x3]
- movi v1.4S, #0
- movi v2.4S, #0
- movi v3.4S, #0
- ld1 {v4.2S}, [x0], #8
- ld1 {v5.2S}, [x0], #8
- fmul v16.2S, v4.2S, v4.2S
- fmul v17.2S, v5.2S, v4.S[0]
- fmul v18.2S, v5.2S, v4.S[1]
-1: ld1 {v5.D}[1], [x0], #8
- fmla v1.2S, v4.2S, v4.2S
- fmla v2.4S, v5.4S, v4.S[0]
- fmla v3.4S, v5.4S, v4.S[1]
- mov v4.D[0], v5.D[0]
- mov v5.D[0], v5.D[1]
+ ld1 {v0.4s}, [x3]
+ movi v1.4s, #0
+ movi v2.4s, #0
+ movi v3.4s, #0
+ ld1 {v4.2s}, [x0], #8
+ ld1 {v5.2s}, [x0], #8
+ fmul v16.2s, v4.2s, v4.2s
+ fmul v17.2s, v5.2s, v4.s[0]
+ fmul v18.2s, v5.2s, v4.s[1]
+1: ld1 {v5.d}[1], [x0], #8
+ fmla v1.2s, v4.2s, v4.2s
+ fmla v2.4s, v5.4s, v4.s[0]
+ fmla v3.4s, v5.4s, v4.s[1]
+ mov v4.d[0], v5.d[0]
+ mov v5.d[0], v5.d[1]
subs x2, x2, #1
b.gt 1b
- fmul v19.2S, v4.2S, v4.2S
- fmul v20.2S, v5.2S, v4.S[0]
- fmul v21.2S, v5.2S, v4.S[1]
- fadd v22.4S, v2.4S, v20.4S
- fsub v22.4S, v22.4S, v17.4S
- fadd v23.4S, v3.4S, v21.4S
- fsub v23.4S, v23.4S, v18.4S
- rev64 v23.4S, v23.4S
- fmul v23.4S, v23.4S, v0.4S
- fadd v22.4S, v22.4S, v23.4S
- st1 {v22.4S}, [x1], #16
- fadd v23.2S, v1.2S, v19.2S
- fsub v23.2S, v23.2S, v16.2S
- faddp v23.2S, v23.2S, v23.2S
- st1 {v23.S}[0], [x1]
+ fmul v19.2s, v4.2s, v4.2s
+ fmul v20.2s, v5.2s, v4.s[0]
+ fmul v21.2s, v5.2s, v4.s[1]
+ fadd v22.4s, v2.4s, v20.4s
+ fsub v22.4s, v22.4s, v17.4s
+ fadd v23.4s, v3.4s, v21.4s
+ fsub v23.4s, v23.4s, v18.4s
+ rev64 v23.4s, v23.4s
+ fmul v23.4s, v23.4s, v0.4s
+ fadd v22.4s, v22.4s, v23.4s
+ st1 {v22.4s}, [x1], #16
+ fadd v23.2s, v1.2s, v19.2s
+ fsub v23.2s, v23.2s, v16.2s
+ faddp v23.2s, v23.2s, v23.2s
+ st1 {v23.s}[0], [x1]
add x1, x1, #8
- rev64 v3.2S, v3.2S
- fmul v3.2S, v3.2S, v0.2S
- fadd v2.2S, v2.2S, v3.2S
- st1 {v2.2S}, [x1]
+ rev64 v3.2s, v3.2s
+ fmul v3.2s, v3.2s, v0.2s
+ fadd v2.2s, v2.2s, v3.2s
+ st1 {v2.2s}, [x1]
add x1, x1, #16
- faddp v1.2S, v1.2S, v1.2S
- st1 {v1.S}[0], [x1]
+ faddp v1.2s, v1.2s, v1.2s
+ st1 {v1.s}[0], [x1]
ret
endfunc
@@ -278,25 +278,25 @@ endfunc
1: and x3, x3, #0x1ff
add x8, x7, x3, lsl #3
add x3, x3, #2
- ld1 {v2.4S}, [x0]
- ld1 {v3.2S}, [x1], #8
- ld1 {v4.2S}, [x2], #8
- ld1 {v5.4S}, [x8]
- mov v6.16B, v2.16B
- zip1 v3.4S, v3.4S, v3.4S
- zip1 v4.4S, v4.4S, v4.4S
- fmla v6.4S, v1.4S, v3.4S
- fmla v2.4S, v5.4S, v4.4S
- fcmeq v7.4S, v3.4S, #0
- bif v2.16B, v6.16B, v7.16B
- st1 {v2.4S}, [x0], #16
+ ld1 {v2.4s}, [x0]
+ ld1 {v3.2s}, [x1], #8
+ ld1 {v4.2s}, [x2], #8
+ ld1 {v5.4s}, [x8]
+ mov v6.16b, v2.16b
+ zip1 v3.4s, v3.4s, v3.4s
+ zip1 v4.4s, v4.4s, v4.4s
+ fmla v6.4s, v1.4s, v3.4s
+ fmla v2.4s, v5.4s, v4.4s
+ fcmeq v7.4s, v3.4s, #0
+ bif v2.16b, v6.16b, v7.16b
+ st1 {v2.4s}, [x0], #16
subs x5, x5, #2
b.gt 1b
.endm
function ff_sbr_hf_apply_noise_0_neon, export=1
movrel x9, phi_noise_0
- ld1 {v1.4S}, [x9]
+ ld1 {v1.4s}, [x9]
apply_noise_common
ret
endfunc
@@ -305,14 +305,14 @@ function ff_sbr_hf_apply_noise_1_neon, export=1
movrel x9, phi_noise_1
and x4, x4, #1
add x9, x9, x4, lsl #4
- ld1 {v1.4S}, [x9]
+ ld1 {v1.4s}, [x9]
apply_noise_common
ret
endfunc
function ff_sbr_hf_apply_noise_2_neon, export=1
movrel x9, phi_noise_2
- ld1 {v1.4S}, [x9]
+ ld1 {v1.4s}, [x9]
apply_noise_common
ret
endfunc
@@ -321,7 +321,7 @@ function ff_sbr_hf_apply_noise_3_neon, export=1
movrel x9, phi_noise_3
and x4, x4, #1
add x9, x9, x4, lsl #4
- ld1 {v1.4S}, [x9]
+ ld1 {v1.4s}, [x9]
apply_noise_common
ret
endfunc
diff --git a/libavcodec/aarch64/simple_idct_neon.S b/libavcodec/aarch64/simple_idct_neon.S
index 5e4d021a97..be5eb37138 100644
--- a/libavcodec/aarch64/simple_idct_neon.S
+++ b/libavcodec/aarch64/simple_idct_neon.S
@@ -54,7 +54,7 @@ endconst
prfm pldl1keep, [\data]
mov x10, x30
movrel x3, idct_coeff_neon
- ld1 {v0.2D}, [x3]
+ ld1 {v0.2d}, [x3]
.endm
.macro idct_end
@@ -74,146 +74,146 @@ endconst
.endm
.macro idct_col4_top y1, y2, y3, y4, i, l
- smull\i v7.4S, \y3\l, z2
- smull\i v16.4S, \y3\l, z6
- smull\i v17.4S, \y2\l, z1
- add v19.4S, v23.4S, v7.4S
- smull\i v18.4S, \y2\l, z3
- add v20.4S, v23.4S, v16.4S
- smull\i v5.4S, \y2\l, z5
- sub v21.4S, v23.4S, v16.4S
- smull\i v6.4S, \y2\l, z7
- sub v22.4S, v23.4S, v7.4S
+ smull\i v7.4s, \y3\l, z2
+ smull\i v16.4s, \y3\l, z6
+ smull\i v17.4s, \y2\l, z1
+ add v19.4s, v23.4s, v7.4s
+ smull\i v18.4s, \y2\l, z3
+ add v20.4s, v23.4s, v16.4s
+ smull\i v5.4s, \y2\l, z5
+ sub v21.4s, v23.4s, v16.4s
+ smull\i v6.4s, \y2\l, z7
+ sub v22.4s, v23.4s, v7.4s
- smlal\i v17.4S, \y4\l, z3
- smlsl\i v18.4S, \y4\l, z7
- smlsl\i v5.4S, \y4\l, z1
- smlsl\i v6.4S, \y4\l, z5
+ smlal\i v17.4s, \y4\l, z3
+ smlsl\i v18.4s, \y4\l, z7
+ smlsl\i v5.4s, \y4\l, z1
+ smlsl\i v6.4s, \y4\l, z5
.endm
.macro idct_row4_neon y1, y2, y3, y4, pass
- ld1 {\y1\().2D,\y2\().2D}, [x2], #32
- movi v23.4S, #1<<2, lsl #8
- orr v5.16B, \y1\().16B, \y2\().16B
- ld1 {\y3\().2D,\y4\().2D}, [x2], #32
- orr v6.16B, \y3\().16B, \y4\().16B
- orr v5.16B, v5.16B, v6.16B
- mov x3, v5.D[1]
- smlal v23.4S, \y1\().4H, z4
+ ld1 {\y1\().2d,\y2\().2d}, [x2], #32
+ movi v23.4s, #1<<2, lsl #8
+ orr v5.16b, \y1\().16b, \y2\().16b
+ ld1 {\y3\().2d,\y4\().2d}, [x2], #32
+ orr v6.16b, \y3\().16b, \y4\().16b
+ orr v5.16b, v5.16b, v6.16b
+ mov x3, v5.d[1]
+ smlal v23.4s, \y1\().4h, z4
- idct_col4_top \y1, \y2, \y3, \y4, 1, .4H
+ idct_col4_top \y1, \y2, \y3, \y4, 1, .4h
cmp x3, #0
b.eq \pass\()f
- smull2 v7.4S, \y1\().8H, z4
- smlal2 v17.4S, \y2\().8H, z5
- smlsl2 v18.4S, \y2\().8H, z1
- smull2 v16.4S, \y3\().8H, z2
- smlal2 v5.4S, \y2\().8H, z7
- add v19.4S, v19.4S, v7.4S
- sub v20.4S, v20.4S, v7.4S
- sub v21.4S, v21.4S, v7.4S
- add v22.4S, v22.4S, v7.4S
- smlal2 v6.4S, \y2\().8H, z3
- smull2 v7.4S, \y3\().8H, z6
- smlal2 v17.4S, \y4\().8H, z7
- smlsl2 v18.4S, \y4\().8H, z5
- smlal2 v5.4S, \y4\().8H, z3
- smlsl2 v6.4S, \y4\().8H, z1
- add v19.4S, v19.4S, v7.4S
- sub v20.4S, v20.4S, v16.4S
- add v21.4S, v21.4S, v16.4S
- sub v22.4S, v22.4S, v7.4S
+ smull2 v7.4s, \y1\().8h, z4
+ smlal2 v17.4s, \y2\().8h, z5
+ smlsl2 v18.4s, \y2\().8h, z1
+ smull2 v16.4s, \y3\().8h, z2
+ smlal2 v5.4s, \y2\().8h, z7
+ add v19.4s, v19.4s, v7.4s
+ sub v20.4s, v20.4s, v7.4s
+ sub v21.4s, v21.4s, v7.4s
+ add v22.4s, v22.4s, v7.4s
+ smlal2 v6.4s, \y2\().8h, z3
+ smull2 v7.4s, \y3\().8h, z6
+ smlal2 v17.4s, \y4\().8h, z7
+ smlsl2 v18.4s, \y4\().8h, z5
+ smlal2 v5.4s, \y4\().8h, z3
+ smlsl2 v6.4s, \y4\().8h, z1
+ add v19.4s, v19.4s, v7.4s
+ sub v20.4s, v20.4s, v16.4s
+ add v21.4s, v21.4s, v16.4s
+ sub v22.4s, v22.4s, v7.4s
\pass: add \y3\().4S, v19.4S, v17.4S
- add \y4\().4S, v20.4S, v18.4S
- shrn \y1\().4H, \y3\().4S, #ROW_SHIFT
- shrn \y2\().4H, \y4\().4S, #ROW_SHIFT
- add v7.4S, v21.4S, v5.4S
- add v16.4S, v22.4S, v6.4S
- shrn \y3\().4H, v7.4S, #ROW_SHIFT
- shrn \y4\().4H, v16.4S, #ROW_SHIFT
- sub v22.4S, v22.4S, v6.4S
- sub v19.4S, v19.4S, v17.4S
- sub v21.4S, v21.4S, v5.4S
- shrn2 \y1\().8H, v22.4S, #ROW_SHIFT
- sub v20.4S, v20.4S, v18.4S
- shrn2 \y2\().8H, v21.4S, #ROW_SHIFT
- shrn2 \y3\().8H, v20.4S, #ROW_SHIFT
- shrn2 \y4\().8H, v19.4S, #ROW_SHIFT
+ add \y4\().4s, v20.4s, v18.4s
+ shrn \y1\().4h, \y3\().4s, #ROW_SHIFT
+ shrn \y2\().4h, \y4\().4s, #ROW_SHIFT
+ add v7.4s, v21.4s, v5.4s
+ add v16.4s, v22.4s, v6.4s
+ shrn \y3\().4h, v7.4s, #ROW_SHIFT
+ shrn \y4\().4h, v16.4s, #ROW_SHIFT
+ sub v22.4s, v22.4s, v6.4s
+ sub v19.4s, v19.4s, v17.4s
+ sub v21.4s, v21.4s, v5.4s
+ shrn2 \y1\().8h, v22.4s, #ROW_SHIFT
+ sub v20.4s, v20.4s, v18.4s
+ shrn2 \y2\().8h, v21.4s, #ROW_SHIFT
+ shrn2 \y3\().8h, v20.4s, #ROW_SHIFT
+ shrn2 \y4\().8h, v19.4s, #ROW_SHIFT
- trn1 v16.8H, \y1\().8H, \y2\().8H
- trn2 v17.8H, \y1\().8H, \y2\().8H
- trn1 v18.8H, \y3\().8H, \y4\().8H
- trn2 v19.8H, \y3\().8H, \y4\().8H
- trn1 \y1\().4S, v16.4S, v18.4S
- trn1 \y2\().4S, v17.4S, v19.4S
- trn2 \y3\().4S, v16.4S, v18.4S
- trn2 \y4\().4S, v17.4S, v19.4S
+ trn1 v16.8h, \y1\().8h, \y2\().8h
+ trn2 v17.8h, \y1\().8h, \y2\().8h
+ trn1 v18.8h, \y3\().8h, \y4\().8h
+ trn2 v19.8h, \y3\().8h, \y4\().8h
+ trn1 \y1\().4s, v16.4s, v18.4s
+ trn1 \y2\().4s, v17.4s, v19.4s
+ trn2 \y3\().4s, v16.4s, v18.4s
+ trn2 \y4\().4s, v17.4s, v19.4s
.endm
.macro declare_idct_col4_neon i, l
function idct_col4_neon\i
- dup v23.4H, z4c
+ dup v23.4h, z4c
.if \i == 1
- add v23.4H, v23.4H, v24.4H
+ add v23.4h, v23.4h, v24.4h
.else
- mov v5.D[0], v24.D[1]
- add v23.4H, v23.4H, v5.4H
+ mov v5.d[0], v24.d[1]
+ add v23.4h, v23.4h, v5.4h
.endif
- smull v23.4S, v23.4H, z4
+ smull v23.4s, v23.4h, z4
idct_col4_top v24, v25, v26, v27, \i, \l
- mov x4, v28.D[\i - 1]
- mov x5, v29.D[\i - 1]
+ mov x4, v28.d[\i - 1]
+ mov x5, v29.d[\i - 1]
cmp x4, #0
b.eq 1f
- smull\i v7.4S, v28\l, z4
- add v19.4S, v19.4S, v7.4S
- sub v20.4S, v20.4S, v7.4S
- sub v21.4S, v21.4S, v7.4S
- add v22.4S, v22.4S, v7.4S
+ smull\i v7.4s, v28\l, z4
+ add v19.4s, v19.4s, v7.4s
+ sub v20.4s, v20.4s, v7.4s
+ sub v21.4s, v21.4s, v7.4s
+ add v22.4s, v22.4s, v7.4s
-1: mov x4, v30.D[\i - 1]
+1: mov x4, v30.d[\i - 1]
cmp x5, #0
b.eq 2f
- smlal\i v17.4S, v29\l, z5
- smlsl\i v18.4S, v29\l, z1
- smlal\i v5.4S, v29\l, z7
- smlal\i v6.4S, v29\l, z3
+ smlal\i v17.4s, v29\l, z5
+ smlsl\i v18.4s, v29\l, z1
+ smlal\i v5.4s, v29\l, z7
+ smlal\i v6.4s, v29\l, z3
-2: mov x5, v31.D[\i - 1]
+2: mov x5, v31.d[\i - 1]
cmp x4, #0
b.eq 3f
- smull\i v7.4S, v30\l, z6
- smull\i v16.4S, v30\l, z2
- add v19.4S, v19.4S, v7.4S
- sub v22.4S, v22.4S, v7.4S
- sub v20.4S, v20.4S, v16.4S
- add v21.4S, v21.4S, v16.4S
+ smull\i v7.4s, v30\l, z6
+ smull\i v16.4s, v30\l, z2
+ add v19.4s, v19.4s, v7.4s
+ sub v22.4s, v22.4s, v7.4s
+ sub v20.4s, v20.4s, v16.4s
+ add v21.4s, v21.4s, v16.4s
3: cmp x5, #0
b.eq 4f
- smlal\i v17.4S, v31\l, z7
- smlsl\i v18.4S, v31\l, z5
- smlal\i v5.4S, v31\l, z3
- smlsl\i v6.4S, v31\l, z1
+ smlal\i v17.4s, v31\l, z7
+ smlsl\i v18.4s, v31\l, z5
+ smlal\i v5.4s, v31\l, z3
+ smlsl\i v6.4s, v31\l, z1
-4: addhn v7.4H, v19.4S, v17.4S
- addhn2 v7.8H, v20.4S, v18.4S
- subhn v18.4H, v20.4S, v18.4S
- subhn2 v18.8H, v19.4S, v17.4S
+4: addhn v7.4h, v19.4s, v17.4s
+ addhn2 v7.8h, v20.4s, v18.4s
+ subhn v18.4h, v20.4s, v18.4s
+ subhn2 v18.8h, v19.4s, v17.4s
- addhn v16.4H, v21.4S, v5.4S
- addhn2 v16.8H, v22.4S, v6.4S
- subhn v17.4H, v22.4S, v6.4S
- subhn2 v17.8H, v21.4S, v5.4S
+ addhn v16.4h, v21.4s, v5.4s
+ addhn2 v16.8h, v22.4s, v6.4s
+ subhn v17.4h, v22.4s, v6.4s
+ subhn2 v17.8h, v21.4s, v5.4s
ret
endfunc
@@ -229,33 +229,33 @@ function ff_simple_idct_put_neon, export=1
idct_row4_neon v28, v29, v30, v31, 2
bl idct_col4_neon1
- sqshrun v1.8B, v7.8H, #COL_SHIFT-16
- sqshrun2 v1.16B, v16.8H, #COL_SHIFT-16
- sqshrun v3.8B, v17.8H, #COL_SHIFT-16
- sqshrun2 v3.16B, v18.8H, #COL_SHIFT-16
+ sqshrun v1.8b, v7.8h, #COL_SHIFT-16
+ sqshrun2 v1.16b, v16.8h, #COL_SHIFT-16
+ sqshrun v3.8b, v17.8h, #COL_SHIFT-16
+ sqshrun2 v3.16b, v18.8h, #COL_SHIFT-16
bl idct_col4_neon2
- sqshrun v2.8B, v7.8H, #COL_SHIFT-16
- sqshrun2 v2.16B, v16.8H, #COL_SHIFT-16
- sqshrun v4.8B, v17.8H, #COL_SHIFT-16
- sqshrun2 v4.16B, v18.8H, #COL_SHIFT-16
+ sqshrun v2.8b, v7.8h, #COL_SHIFT-16
+ sqshrun2 v2.16b, v16.8h, #COL_SHIFT-16
+ sqshrun v4.8b, v17.8h, #COL_SHIFT-16
+ sqshrun2 v4.16b, v18.8h, #COL_SHIFT-16
- zip1 v16.4S, v1.4S, v2.4S
- zip2 v17.4S, v1.4S, v2.4S
+ zip1 v16.4s, v1.4s, v2.4s
+ zip2 v17.4s, v1.4s, v2.4s
- st1 {v16.D}[0], [x0], x1
- st1 {v16.D}[1], [x0], x1
+ st1 {v16.d}[0], [x0], x1
+ st1 {v16.d}[1], [x0], x1
- zip1 v18.4S, v3.4S, v4.4S
- zip2 v19.4S, v3.4S, v4.4S
+ zip1 v18.4s, v3.4s, v4.4s
+ zip2 v19.4s, v3.4s, v4.4s
- st1 {v17.D}[0], [x0], x1
- st1 {v17.D}[1], [x0], x1
- st1 {v18.D}[0], [x0], x1
- st1 {v18.D}[1], [x0], x1
- st1 {v19.D}[0], [x0], x1
- st1 {v19.D}[1], [x0], x1
+ st1 {v17.d}[0], [x0], x1
+ st1 {v17.d}[1], [x0], x1
+ st1 {v18.d}[0], [x0], x1
+ st1 {v18.d}[1], [x0], x1
+ st1 {v19.d}[0], [x0], x1
+ st1 {v19.d}[1], [x0], x1
idct_end
endfunc
@@ -267,59 +267,59 @@ function ff_simple_idct_add_neon, export=1
idct_row4_neon v28, v29, v30, v31, 2
bl idct_col4_neon1
- sshr v1.8H, v7.8H, #COL_SHIFT-16
- sshr v2.8H, v16.8H, #COL_SHIFT-16
- sshr v3.8H, v17.8H, #COL_SHIFT-16
- sshr v4.8H, v18.8H, #COL_SHIFT-16
+ sshr v1.8h, v7.8h, #COL_SHIFT-16
+ sshr v2.8h, v16.8h, #COL_SHIFT-16
+ sshr v3.8h, v17.8h, #COL_SHIFT-16
+ sshr v4.8h, v18.8h, #COL_SHIFT-16
bl idct_col4_neon2
- sshr v7.8H, v7.8H, #COL_SHIFT-16
- sshr v16.8H, v16.8H, #COL_SHIFT-16
- sshr v17.8H, v17.8H, #COL_SHIFT-16
- sshr v18.8H, v18.8H, #COL_SHIFT-16
+ sshr v7.8h, v7.8h, #COL_SHIFT-16
+ sshr v16.8h, v16.8h, #COL_SHIFT-16
+ sshr v17.8h, v17.8h, #COL_SHIFT-16
+ sshr v18.8h, v18.8h, #COL_SHIFT-16
mov x9, x0
- ld1 {v19.D}[0], [x0], x1
- zip1 v23.2D, v1.2D, v7.2D
- zip2 v24.2D, v1.2D, v7.2D
- ld1 {v19.D}[1], [x0], x1
- zip1 v25.2D, v2.2D, v16.2D
- zip2 v26.2D, v2.2D, v16.2D
- ld1 {v20.D}[0], [x0], x1
- zip1 v27.2D, v3.2D, v17.2D
- zip2 v28.2D, v3.2D, v17.2D
- ld1 {v20.D}[1], [x0], x1
- zip1 v29.2D, v4.2D, v18.2D
- zip2 v30.2D, v4.2D, v18.2D
- ld1 {v21.D}[0], [x0], x1
- uaddw v23.8H, v23.8H, v19.8B
- uaddw2 v24.8H, v24.8H, v19.16B
- ld1 {v21.D}[1], [x0], x1
- sqxtun v23.8B, v23.8H
- sqxtun2 v23.16B, v24.8H
- ld1 {v22.D}[0], [x0], x1
- uaddw v24.8H, v25.8H, v20.8B
- uaddw2 v25.8H, v26.8H, v20.16B
- ld1 {v22.D}[1], [x0], x1
- sqxtun v24.8B, v24.8H
- sqxtun2 v24.16B, v25.8H
- st1 {v23.D}[0], [x9], x1
- uaddw v25.8H, v27.8H, v21.8B
- uaddw2 v26.8H, v28.8H, v21.16B
- st1 {v23.D}[1], [x9], x1
- sqxtun v25.8B, v25.8H
- sqxtun2 v25.16B, v26.8H
- st1 {v24.D}[0], [x9], x1
- uaddw v26.8H, v29.8H, v22.8B
- uaddw2 v27.8H, v30.8H, v22.16B
- st1 {v24.D}[1], [x9], x1
- sqxtun v26.8B, v26.8H
- sqxtun2 v26.16B, v27.8H
- st1 {v25.D}[0], [x9], x1
- st1 {v25.D}[1], [x9], x1
- st1 {v26.D}[0], [x9], x1
- st1 {v26.D}[1], [x9], x1
+ ld1 {v19.d}[0], [x0], x1
+ zip1 v23.2d, v1.2d, v7.2d
+ zip2 v24.2d, v1.2d, v7.2d
+ ld1 {v19.d}[1], [x0], x1
+ zip1 v25.2d, v2.2d, v16.2d
+ zip2 v26.2d, v2.2d, v16.2d
+ ld1 {v20.d}[0], [x0], x1
+ zip1 v27.2d, v3.2d, v17.2d
+ zip2 v28.2d, v3.2d, v17.2d
+ ld1 {v20.d}[1], [x0], x1
+ zip1 v29.2d, v4.2d, v18.2d
+ zip2 v30.2d, v4.2d, v18.2d
+ ld1 {v21.d}[0], [x0], x1
+ uaddw v23.8h, v23.8h, v19.8b
+ uaddw2 v24.8h, v24.8h, v19.16b
+ ld1 {v21.d}[1], [x0], x1
+ sqxtun v23.8b, v23.8h
+ sqxtun2 v23.16b, v24.8h
+ ld1 {v22.d}[0], [x0], x1
+ uaddw v24.8h, v25.8h, v20.8b
+ uaddw2 v25.8h, v26.8h, v20.16b
+ ld1 {v22.d}[1], [x0], x1
+ sqxtun v24.8b, v24.8h
+ sqxtun2 v24.16b, v25.8h
+ st1 {v23.d}[0], [x9], x1
+ uaddw v25.8h, v27.8h, v21.8b
+ uaddw2 v26.8h, v28.8h, v21.16b
+ st1 {v23.d}[1], [x9], x1
+ sqxtun v25.8b, v25.8h
+ sqxtun2 v25.16b, v26.8h
+ st1 {v24.d}[0], [x9], x1
+ uaddw v26.8h, v29.8h, v22.8b
+ uaddw2 v27.8h, v30.8h, v22.16b
+ st1 {v24.d}[1], [x9], x1
+ sqxtun v26.8b, v26.8h
+ sqxtun2 v26.16b, v27.8h
+ st1 {v25.d}[0], [x9], x1
+ st1 {v25.d}[1], [x9], x1
+ st1 {v26.d}[0], [x9], x1
+ st1 {v26.d}[1], [x9], x1
idct_end
endfunc
@@ -333,30 +333,30 @@ function ff_simple_idct_neon, export=1
sub x2, x2, #128
bl idct_col4_neon1
- sshr v1.8H, v7.8H, #COL_SHIFT-16
- sshr v2.8H, v16.8H, #COL_SHIFT-16
- sshr v3.8H, v17.8H, #COL_SHIFT-16
- sshr v4.8H, v18.8H, #COL_SHIFT-16
+ sshr v1.8h, v7.8h, #COL_SHIFT-16
+ sshr v2.8h, v16.8h, #COL_SHIFT-16
+ sshr v3.8h, v17.8h, #COL_SHIFT-16
+ sshr v4.8h, v18.8h, #COL_SHIFT-16
bl idct_col4_neon2
- sshr v7.8H, v7.8H, #COL_SHIFT-16
- sshr v16.8H, v16.8H, #COL_SHIFT-16
- sshr v17.8H, v17.8H, #COL_SHIFT-16
- sshr v18.8H, v18.8H, #COL_SHIFT-16
+ sshr v7.8h, v7.8h, #COL_SHIFT-16
+ sshr v16.8h, v16.8h, #COL_SHIFT-16
+ sshr v17.8h, v17.8h, #COL_SHIFT-16
+ sshr v18.8h, v18.8h, #COL_SHIFT-16
- zip1 v23.2D, v1.2D, v7.2D
- zip2 v24.2D, v1.2D, v7.2D
- st1 {v23.2D,v24.2D}, [x2], #32
- zip1 v25.2D, v2.2D, v16.2D
- zip2 v26.2D, v2.2D, v16.2D
- st1 {v25.2D,v26.2D}, [x2], #32
- zip1 v27.2D, v3.2D, v17.2D
- zip2 v28.2D, v3.2D, v17.2D
- st1 {v27.2D,v28.2D}, [x2], #32
- zip1 v29.2D, v4.2D, v18.2D
- zip2 v30.2D, v4.2D, v18.2D
- st1 {v29.2D,v30.2D}, [x2], #32
+ zip1 v23.2d, v1.2d, v7.2d
+ zip2 v24.2d, v1.2d, v7.2d
+ st1 {v23.2d,v24.2d}, [x2], #32
+ zip1 v25.2d, v2.2d, v16.2d
+ zip2 v26.2d, v2.2d, v16.2d
+ st1 {v25.2d,v26.2d}, [x2], #32
+ zip1 v27.2d, v3.2d, v17.2d
+ zip2 v28.2d, v3.2d, v17.2d
+ st1 {v27.2d,v28.2d}, [x2], #32
+ zip1 v29.2d, v4.2d, v18.2d
+ zip2 v30.2d, v4.2d, v18.2d
+ st1 {v29.2d,v30.2d}, [x2], #32
idct_end
endfunc
diff --git a/libavcodec/aarch64/vp8dsp_neon.S b/libavcodec/aarch64/vp8dsp_neon.S
index 4bbf16d1a4..e385293ba7 100644
--- a/libavcodec/aarch64/vp8dsp_neon.S
+++ b/libavcodec/aarch64/vp8dsp_neon.S
@@ -330,32 +330,32 @@ endfunc
// v17: hev
// convert to signed value:
- eor v3.16b, v3.16b, v21.16b // PS0 = P0 ^ 0x80
- eor v4.16b, v4.16b, v21.16b // QS0 = Q0 ^ 0x80
+ eor v3.16b, v3.16b, v21.16b // PS0 = P0 ^ 0x80
+ eor v4.16b, v4.16b, v21.16b // QS0 = Q0 ^ 0x80
- movi v20.8h, #3
- ssubl v18.8h, v4.8b, v3.8b // QS0 - PS0
- ssubl2 v19.8h, v4.16b, v3.16b // (widened to 16bit)
- eor v2.16b, v2.16b, v21.16b // PS1 = P1 ^ 0x80
- eor v5.16b, v5.16b, v21.16b // QS1 = Q1 ^ 0x80
- mul v18.8h, v18.8h, v20.8h // w = 3 * (QS0 - PS0)
- mul v19.8h, v19.8h, v20.8h
+ movi v20.8h, #3
+ ssubl v18.8h, v4.8b, v3.8b // QS0 - PS0
+ ssubl2 v19.8h, v4.16b, v3.16b // (widened to 16bit)
+ eor v2.16b, v2.16b, v21.16b // PS1 = P1 ^ 0x80
+ eor v5.16b, v5.16b, v21.16b // QS1 = Q1 ^ 0x80
+ mul v18.8h, v18.8h, v20.8h // w = 3 * (QS0 - PS0)
+ mul v19.8h, v19.8h, v20.8h
- sqsub v20.16b, v2.16b, v5.16b // clamp(PS1-QS1)
- movi v22.16b, #4
- movi v23.16b, #3
+ sqsub v20.16b, v2.16b, v5.16b // clamp(PS1-QS1)
+ movi v22.16b, #4
+ movi v23.16b, #3
.if \inner
- and v20.16b, v20.16b, v17.16b // if(hev) w += clamp(PS1-QS1)
+ and v20.16b, v20.16b, v17.16b // if(hev) w += clamp(PS1-QS1)
.endif
- saddw v18.8h, v18.8h, v20.8b // w += clamp(PS1-QS1)
- saddw2 v19.8h, v19.8h, v20.16b
- sqxtn v18.8b, v18.8h // narrow result back into v18
- sqxtn2 v18.16b, v19.8h
+ saddw v18.8h, v18.8h, v20.8b // w += clamp(PS1-QS1)
+ saddw2 v19.8h, v19.8h, v20.16b
+ sqxtn v18.8b, v18.8h // narrow result back into v18
+ sqxtn2 v18.16b, v19.8h
.if !\inner && !\simple
- eor v1.16b, v1.16b, v21.16b // PS2 = P2 ^ 0x80
- eor v6.16b, v6.16b, v21.16b // QS2 = Q2 ^ 0x80
+ eor v1.16b, v1.16b, v21.16b // PS2 = P2 ^ 0x80
+ eor v6.16b, v6.16b, v21.16b // QS2 = Q2 ^ 0x80
.endif
- and v18.16b, v18.16b, v16.16b // w &= normal_limit
+ and v18.16b, v18.16b, v16.16b // w &= normal_limit
// registers used at this point..
// v0 -> P3 (don't corrupt)
@@ -375,44 +375,44 @@ endfunc
// P0 = s2u(PS0 + c2);
.if \simple
- sqadd v19.16b, v18.16b, v22.16b // c1 = clamp((w&hev)+4)
- sqadd v20.16b, v18.16b, v23.16b // c2 = clamp((w&hev)+3)
- sshr v19.16b, v19.16b, #3 // c1 >>= 3
- sshr v20.16b, v20.16b, #3 // c2 >>= 3
- sqsub v4.16b, v4.16b, v19.16b // QS0 = clamp(QS0-c1)
- sqadd v3.16b, v3.16b, v20.16b // PS0 = clamp(PS0+c2)
- eor v4.16b, v4.16b, v21.16b // Q0 = QS0 ^ 0x80
- eor v3.16b, v3.16b, v21.16b // P0 = PS0 ^ 0x80
- eor v5.16b, v5.16b, v21.16b // Q1 = QS1 ^ 0x80
- eor v2.16b, v2.16b, v21.16b // P1 = PS1 ^ 0x80
+ sqadd v19.16b, v18.16b, v22.16b // c1 = clamp((w&hev)+4)
+ sqadd v20.16b, v18.16b, v23.16b // c2 = clamp((w&hev)+3)
+ sshr v19.16b, v19.16b, #3 // c1 >>= 3
+ sshr v20.16b, v20.16b, #3 // c2 >>= 3
+ sqsub v4.16b, v4.16b, v19.16b // QS0 = clamp(QS0-c1)
+ sqadd v3.16b, v3.16b, v20.16b // PS0 = clamp(PS0+c2)
+ eor v4.16b, v4.16b, v21.16b // Q0 = QS0 ^ 0x80
+ eor v3.16b, v3.16b, v21.16b // P0 = PS0 ^ 0x80
+ eor v5.16b, v5.16b, v21.16b // Q1 = QS1 ^ 0x80
+ eor v2.16b, v2.16b, v21.16b // P1 = PS1 ^ 0x80
.elseif \inner
// the !is4tap case of filter_common, only used for inner blocks
// c3 = ((c1&~hev) + 1) >> 1;
// Q1 = s2u(QS1 - c3);
// P1 = s2u(PS1 + c3);
- sqadd v19.16b, v18.16b, v22.16b // c1 = clamp((w&hev)+4)
- sqadd v20.16b, v18.16b, v23.16b // c2 = clamp((w&hev)+3)
- sshr v19.16b, v19.16b, #3 // c1 >>= 3
- sshr v20.16b, v20.16b, #3 // c2 >>= 3
- sqsub v4.16b, v4.16b, v19.16b // QS0 = clamp(QS0-c1)
- sqadd v3.16b, v3.16b, v20.16b // PS0 = clamp(PS0+c2)
- bic v19.16b, v19.16b, v17.16b // c1 & ~hev
- eor v4.16b, v4.16b, v21.16b // Q0 = QS0 ^ 0x80
- srshr v19.16b, v19.16b, #1 // c3 >>= 1
- eor v3.16b, v3.16b, v21.16b // P0 = PS0 ^ 0x80
- sqsub v5.16b, v5.16b, v19.16b // QS1 = clamp(QS1-c3)
- sqadd v2.16b, v2.16b, v19.16b // PS1 = clamp(PS1+c3)
- eor v5.16b, v5.16b, v21.16b // Q1 = QS1 ^ 0x80
- eor v2.16b, v2.16b, v21.16b // P1 = PS1 ^ 0x80
+ sqadd v19.16b, v18.16b, v22.16b // c1 = clamp((w&hev)+4)
+ sqadd v20.16b, v18.16b, v23.16b // c2 = clamp((w&hev)+3)
+ sshr v19.16b, v19.16b, #3 // c1 >>= 3
+ sshr v20.16b, v20.16b, #3 // c2 >>= 3
+ sqsub v4.16b, v4.16b, v19.16b // QS0 = clamp(QS0-c1)
+ sqadd v3.16b, v3.16b, v20.16b // PS0 = clamp(PS0+c2)
+ bic v19.16b, v19.16b, v17.16b // c1 & ~hev
+ eor v4.16b, v4.16b, v21.16b // Q0 = QS0 ^ 0x80
+ srshr v19.16b, v19.16b, #1 // c3 >>= 1
+ eor v3.16b, v3.16b, v21.16b // P0 = PS0 ^ 0x80
+ sqsub v5.16b, v5.16b, v19.16b // QS1 = clamp(QS1-c3)
+ sqadd v2.16b, v2.16b, v19.16b // PS1 = clamp(PS1+c3)
+ eor v5.16b, v5.16b, v21.16b // Q1 = QS1 ^ 0x80
+ eor v2.16b, v2.16b, v21.16b // P1 = PS1 ^ 0x80
.else
- and v20.16b, v18.16b, v17.16b // w & hev
- sqadd v19.16b, v20.16b, v22.16b // c1 = clamp((w&hev)+4)
- sqadd v20.16b, v20.16b, v23.16b // c2 = clamp((w&hev)+3)
- sshr v19.16b, v19.16b, #3 // c1 >>= 3
- sshr v20.16b, v20.16b, #3 // c2 >>= 3
- bic v18.16b, v18.16b, v17.16b // w &= ~hev
- sqsub v4.16b, v4.16b, v19.16b // QS0 = clamp(QS0-c1)
- sqadd v3.16b, v3.16b, v20.16b // PS0 = clamp(PS0+c2)
+ and v20.16b, v18.16b, v17.16b // w & hev
+ sqadd v19.16b, v20.16b, v22.16b // c1 = clamp((w&hev)+4)
+ sqadd v20.16b, v20.16b, v23.16b // c2 = clamp((w&hev)+3)
+ sshr v19.16b, v19.16b, #3 // c1 >>= 3
+ sshr v20.16b, v20.16b, #3 // c2 >>= 3
+ bic v18.16b, v18.16b, v17.16b // w &= ~hev
+ sqsub v4.16b, v4.16b, v19.16b // QS0 = clamp(QS0-c1)
+ sqadd v3.16b, v3.16b, v20.16b // PS0 = clamp(PS0+c2)
// filter_mbedge:
// a = clamp((27*w + 63) >> 7);
@@ -424,35 +424,35 @@ endfunc
// a = clamp((9*w + 63) >> 7);
// Q2 = s2u(QS2 - a);
// P2 = s2u(PS2 + a);
- movi v17.8h, #63
- sshll v22.8h, v18.8b, #3
- sshll2 v23.8h, v18.16b, #3
- saddw v22.8h, v22.8h, v18.8b
- saddw2 v23.8h, v23.8h, v18.16b
- add v16.8h, v17.8h, v22.8h
- add v17.8h, v17.8h, v23.8h // 9*w + 63
- add v19.8h, v16.8h, v22.8h
- add v20.8h, v17.8h, v23.8h // 18*w + 63
- add v22.8h, v19.8h, v22.8h
- add v23.8h, v20.8h, v23.8h // 27*w + 63
- sqshrn v16.8b, v16.8h, #7
- sqshrn2 v16.16b, v17.8h, #7 // clamp(( 9*w + 63)>>7)
- sqshrn v19.8b, v19.8h, #7
- sqshrn2 v19.16b, v20.8h, #7 // clamp((18*w + 63)>>7)
- sqshrn v22.8b, v22.8h, #7
- sqshrn2 v22.16b, v23.8h, #7 // clamp((27*w + 63)>>7)
- sqadd v1.16b, v1.16b, v16.16b // PS2 = clamp(PS2+a)
- sqsub v6.16b, v6.16b, v16.16b // QS2 = clamp(QS2-a)
- sqadd v2.16b, v2.16b, v19.16b // PS1 = clamp(PS1+a)
- sqsub v5.16b, v5.16b, v19.16b // QS1 = clamp(QS1-a)
- sqadd v3.16b, v3.16b, v22.16b // PS0 = clamp(PS0+a)
- sqsub v4.16b, v4.16b, v22.16b // QS0 = clamp(QS0-a)
- eor v3.16b, v3.16b, v21.16b // P0 = PS0 ^ 0x80
- eor v4.16b, v4.16b, v21.16b // Q0 = QS0 ^ 0x80
- eor v2.16b, v2.16b, v21.16b // P1 = PS1 ^ 0x80
- eor v5.16b, v5.16b, v21.16b // Q1 = QS1 ^ 0x80
- eor v1.16b, v1.16b, v21.16b // P2 = PS2 ^ 0x80
- eor v6.16b, v6.16b, v21.16b // Q2 = QS2 ^ 0x80
+ movi v17.8h, #63
+ sshll v22.8h, v18.8b, #3
+ sshll2 v23.8h, v18.16b, #3
+ saddw v22.8h, v22.8h, v18.8b
+ saddw2 v23.8h, v23.8h, v18.16b
+ add v16.8h, v17.8h, v22.8h
+ add v17.8h, v17.8h, v23.8h // 9*w + 63
+ add v19.8h, v16.8h, v22.8h
+ add v20.8h, v17.8h, v23.8h // 18*w + 63
+ add v22.8h, v19.8h, v22.8h
+ add v23.8h, v20.8h, v23.8h // 27*w + 63
+ sqshrn v16.8b, v16.8h, #7
+ sqshrn2 v16.16b, v17.8h, #7 // clamp(( 9*w + 63)>>7)
+ sqshrn v19.8b, v19.8h, #7
+ sqshrn2 v19.16b, v20.8h, #7 // clamp((18*w + 63)>>7)
+ sqshrn v22.8b, v22.8h, #7
+ sqshrn2 v22.16b, v23.8h, #7 // clamp((27*w + 63)>>7)
+ sqadd v1.16b, v1.16b, v16.16b // PS2 = clamp(PS2+a)
+ sqsub v6.16b, v6.16b, v16.16b // QS2 = clamp(QS2-a)
+ sqadd v2.16b, v2.16b, v19.16b // PS1 = clamp(PS1+a)
+ sqsub v5.16b, v5.16b, v19.16b // QS1 = clamp(QS1-a)
+ sqadd v3.16b, v3.16b, v22.16b // PS0 = clamp(PS0+a)
+ sqsub v4.16b, v4.16b, v22.16b // QS0 = clamp(QS0-a)
+ eor v3.16b, v3.16b, v21.16b // P0 = PS0 ^ 0x80
+ eor v4.16b, v4.16b, v21.16b // Q0 = QS0 ^ 0x80
+ eor v2.16b, v2.16b, v21.16b // P1 = PS1 ^ 0x80
+ eor v5.16b, v5.16b, v21.16b // Q1 = QS1 ^ 0x80
+ eor v1.16b, v1.16b, v21.16b // P2 = PS2 ^ 0x80
+ eor v6.16b, v6.16b, v21.16b // Q2 = QS2 ^ 0x80
.endif
.endm
@@ -507,48 +507,48 @@ function ff_vp8_v_loop_filter8uv\name\()_neon, export=1
sub x0, x0, x2, lsl #2
sub x1, x1, x2, lsl #2
// Load pixels:
- ld1 {v0.d}[0], [x0], x2 // P3
- ld1 {v0.d}[1], [x1], x2 // P3
- ld1 {v1.d}[0], [x0], x2 // P2
- ld1 {v1.d}[1], [x1], x2 // P2
- ld1 {v2.d}[0], [x0], x2 // P1
- ld1 {v2.d}[1], [x1], x2 // P1
- ld1 {v3.d}[0], [x0], x2 // P0
- ld1 {v3.d}[1], [x1], x2 // P0
- ld1 {v4.d}[0], [x0], x2 // Q0
- ld1 {v4.d}[1], [x1], x2 // Q0
- ld1 {v5.d}[0], [x0], x2 // Q1
- ld1 {v5.d}[1], [x1], x2 // Q1
- ld1 {v6.d}[0], [x0], x2 // Q2
- ld1 {v6.d}[1], [x1], x2 // Q2
- ld1 {v7.d}[0], [x0] // Q3
- ld1 {v7.d}[1], [x1] // Q3
+ ld1 {v0.d}[0], [x0], x2 // P3
+ ld1 {v0.d}[1], [x1], x2 // P3
+ ld1 {v1.d}[0], [x0], x2 // P2
+ ld1 {v1.d}[1], [x1], x2 // P2
+ ld1 {v2.d}[0], [x0], x2 // P1
+ ld1 {v2.d}[1], [x1], x2 // P1
+ ld1 {v3.d}[0], [x0], x2 // P0
+ ld1 {v3.d}[1], [x1], x2 // P0
+ ld1 {v4.d}[0], [x0], x2 // Q0
+ ld1 {v4.d}[1], [x1], x2 // Q0
+ ld1 {v5.d}[0], [x0], x2 // Q1
+ ld1 {v5.d}[1], [x1], x2 // Q1
+ ld1 {v6.d}[0], [x0], x2 // Q2
+ ld1 {v6.d}[1], [x1], x2 // Q2
+ ld1 {v7.d}[0], [x0] // Q3
+ ld1 {v7.d}[1], [x1] // Q3
- dup v22.16b, w3 // flim_E
- dup v23.16b, w4 // flim_I
+ dup v22.16b, w3 // flim_E
+ dup v23.16b, w4 // flim_I
vp8_loop_filter inner=\inner, hev_thresh=w5
// back up to P2: u,v -= stride * 6
- sub x0, x0, x2, lsl #2
- sub x1, x1, x2, lsl #2
- sub x0, x0, x2, lsl #1
- sub x1, x1, x2, lsl #1
+ sub x0, x0, x2, lsl #2
+ sub x1, x1, x2, lsl #2
+ sub x0, x0, x2, lsl #1
+ sub x1, x1, x2, lsl #1
// Store pixels:
- st1 {v1.d}[0], [x0], x2 // P2
- st1 {v1.d}[1], [x1], x2 // P2
- st1 {v2.d}[0], [x0], x2 // P1
- st1 {v2.d}[1], [x1], x2 // P1
- st1 {v3.d}[0], [x0], x2 // P0
- st1 {v3.d}[1], [x1], x2 // P0
- st1 {v4.d}[0], [x0], x2 // Q0
- st1 {v4.d}[1], [x1], x2 // Q0
- st1 {v5.d}[0], [x0], x2 // Q1
- st1 {v5.d}[1], [x1], x2 // Q1
- st1 {v6.d}[0], [x0] // Q2
- st1 {v6.d}[1], [x1] // Q2
+ st1 {v1.d}[0], [x0], x2 // P2
+ st1 {v1.d}[1], [x1], x2 // P2
+ st1 {v2.d}[0], [x0], x2 // P1
+ st1 {v2.d}[1], [x1], x2 // P1
+ st1 {v3.d}[0], [x0], x2 // P0
+ st1 {v3.d}[1], [x1], x2 // P0
+ st1 {v4.d}[0], [x0], x2 // Q0
+ st1 {v4.d}[1], [x1], x2 // Q0
+ st1 {v5.d}[0], [x0], x2 // Q1
+ st1 {v5.d}[1], [x1], x2 // Q1
+ st1 {v6.d}[0], [x0] // Q2
+ st1 {v6.d}[1], [x1] // Q2
ret
endfunc
@@ -579,7 +579,7 @@ function ff_vp8_h_loop_filter16\name\()_neon, export=1
ld1 {v6.d}[1], [x0], x1
ld1 {v7.d}[1], [x0], x1
- transpose_8x16B v0, v1, v2, v3, v4, v5, v6, v7, v30, v31
+ transpose_8x16B v0, v1, v2, v3, v4, v5, v6, v7, v30, v31
dup v22.16b, w2 // flim_E
.if !\simple
@@ -590,7 +590,7 @@ function ff_vp8_h_loop_filter16\name\()_neon, export=1
sub x0, x0, x1, lsl #4 // backup 16 rows
- transpose_8x16B v0, v1, v2, v3, v4, v5, v6, v7, v30, v31
+ transpose_8x16B v0, v1, v2, v3, v4, v5, v6, v7, v30, v31
// Store pixels:
st1 {v0.d}[0], [x0], x1
@@ -624,24 +624,24 @@ function ff_vp8_h_loop_filter8uv\name\()_neon, export=1
sub x1, x1, #4
// Load pixels:
- ld1 {v0.d}[0], [x0], x2 // load u
- ld1 {v0.d}[1], [x1], x2 // load v
- ld1 {v1.d}[0], [x0], x2
- ld1 {v1.d}[1], [x1], x2
- ld1 {v2.d}[0], [x0], x2
- ld1 {v2.d}[1], [x1], x2
- ld1 {v3.d}[0], [x0], x2
- ld1 {v3.d}[1], [x1], x2
- ld1 {v4.d}[0], [x0], x2
- ld1 {v4.d}[1], [x1], x2
- ld1 {v5.d}[0], [x0], x2
- ld1 {v5.d}[1], [x1], x2
- ld1 {v6.d}[0], [x0], x2
- ld1 {v6.d}[1], [x1], x2
- ld1 {v7.d}[0], [x0], x2
- ld1 {v7.d}[1], [x1], x2
+ ld1 {v0.d}[0], [x0], x2 // load u
+ ld1 {v0.d}[1], [x1], x2 // load v
+ ld1 {v1.d}[0], [x0], x2
+ ld1 {v1.d}[1], [x1], x2
+ ld1 {v2.d}[0], [x0], x2
+ ld1 {v2.d}[1], [x1], x2
+ ld1 {v3.d}[0], [x0], x2
+ ld1 {v3.d}[1], [x1], x2
+ ld1 {v4.d}[0], [x0], x2
+ ld1 {v4.d}[1], [x1], x2
+ ld1 {v5.d}[0], [x0], x2
+ ld1 {v5.d}[1], [x1], x2
+ ld1 {v6.d}[0], [x0], x2
+ ld1 {v6.d}[1], [x1], x2
+ ld1 {v7.d}[0], [x0], x2
+ ld1 {v7.d}[1], [x1], x2
- transpose_8x16B v0, v1, v2, v3, v4, v5, v6, v7, v30, v31
+ transpose_8x16B v0, v1, v2, v3, v4, v5, v6, v7, v30, v31
dup v22.16b, w3 // flim_E
dup v23.16b, w4 // flim_I
@@ -651,25 +651,25 @@ function ff_vp8_h_loop_filter8uv\name\()_neon, export=1
sub x0, x0, x2, lsl #3 // backup u 8 rows
sub x1, x1, x2, lsl #3 // backup v 8 rows
- transpose_8x16B v0, v1, v2, v3, v4, v5, v6, v7, v30, v31
+ transpose_8x16B v0, v1, v2, v3, v4, v5, v6, v7, v30, v31
// Store pixels:
- st1 {v0.d}[0], [x0], x2 // load u
- st1 {v0.d}[1], [x1], x2 // load v
- st1 {v1.d}[0], [x0], x2
- st1 {v1.d}[1], [x1], x2
- st1 {v2.d}[0], [x0], x2
- st1 {v2.d}[1], [x1], x2
- st1 {v3.d}[0], [x0], x2
- st1 {v3.d}[1], [x1], x2
- st1 {v4.d}[0], [x0], x2
- st1 {v4.d}[1], [x1], x2
- st1 {v5.d}[0], [x0], x2
- st1 {v5.d}[1], [x1], x2
- st1 {v6.d}[0], [x0], x2
- st1 {v6.d}[1], [x1], x2
- st1 {v7.d}[0], [x0]
- st1 {v7.d}[1], [x1]
+ st1 {v0.d}[0], [x0], x2 // load u
+ st1 {v0.d}[1], [x1], x2 // load v
+ st1 {v1.d}[0], [x0], x2
+ st1 {v1.d}[1], [x1], x2
+ st1 {v2.d}[0], [x0], x2
+ st1 {v2.d}[1], [x1], x2
+ st1 {v3.d}[0], [x0], x2
+ st1 {v3.d}[1], [x1], x2
+ st1 {v4.d}[0], [x0], x2
+ st1 {v4.d}[1], [x1], x2
+ st1 {v5.d}[0], [x0], x2
+ st1 {v5.d}[1], [x1], x2
+ st1 {v6.d}[0], [x0], x2
+ st1 {v6.d}[1], [x1], x2
+ st1 {v7.d}[0], [x0]
+ st1 {v7.d}[1], [x1]
ret
diff --git a/libavcodec/arm/int_neon.S b/libavcodec/arm/int_neon.S
index 72c4c77c45..3c58b72842 100644
--- a/libavcodec/arm/int_neon.S
+++ b/libavcodec/arm/int_neon.S
@@ -48,4 +48,3 @@ function ff_scalarproduct_int16_neon, export=1
vmov.32 r0, d3[0]
bx lr
endfunc
-
diff --git a/libavcodec/cljrdec.c b/libavcodec/cljrdec.c
index 4b187f8cf3..50d11b43bb 100644
--- a/libavcodec/cljrdec.c
+++ b/libavcodec/cljrdec.c
@@ -91,4 +91,3 @@ AVCodec ff_cljr_decoder = {
.decode = decode_frame,
.capabilities = AV_CODEC_CAP_DR1,
};
-
diff --git a/libavcodec/dv.c b/libavcodec/dv.c
index 9c75cfd877..a426b157cd 100644
--- a/libavcodec/dv.c
+++ b/libavcodec/dv.c
@@ -259,4 +259,3 @@ av_cold int ff_dvvideo_init(AVCodecContext *avctx)
return 0;
}
-
diff --git a/libavcodec/dv_profile.c b/libavcodec/dv_profile.c
index 66505c886b..7155e077e6 100644
--- a/libavcodec/dv_profile.c
+++ b/libavcodec/dv_profile.c
@@ -338,4 +338,3 @@ const AVDVProfile *av_dv_codec_profile2(int width, int height,
return p;
}
-
diff --git a/libavcodec/ffv1_template.c b/libavcodec/ffv1_template.c
index c5f61b0182..20a1c2f967 100644
--- a/libavcodec/ffv1_template.c
+++ b/libavcodec/ffv1_template.c
@@ -50,4 +50,3 @@ static inline int RENAME(get_context)(PlaneContext *p, TYPE *src,
p->quant_table[1][(LT - T) & 0xFF] +
p->quant_table[2][(T - RT) & 0xFF];
}
-
diff --git a/libavcodec/ffv1enc_template.c b/libavcodec/ffv1enc_template.c
index bc0add5ed7..fe6156401a 100644
--- a/libavcodec/ffv1enc_template.c
+++ b/libavcodec/ffv1enc_template.c
@@ -199,4 +199,3 @@ static int RENAME(encode_rgb_frame)(FFV1Context *s, const uint8_t *src[4],
}
return 0;
}
-
diff --git a/libavcodec/h264_mc_template.c b/libavcodec/h264_mc_template.c
index d02e2bf580..d24b0a0edc 100644
--- a/libavcodec/h264_mc_template.c
+++ b/libavcodec/h264_mc_template.c
@@ -162,4 +162,3 @@ static void MCFUNC(hl_motion)(const H264Context *h, H264SliceContext *sl,
if (USES_LIST(mb_type, 1))
prefetch_motion(h, sl, 1, PIXEL_SHIFT, CHROMA_IDC);
}
-
diff --git a/libavcodec/hevc_cabac.c b/libavcodec/hevc_cabac.c
index 3635b16ca9..486b4c063d 100644
--- a/libavcodec/hevc_cabac.c
+++ b/libavcodec/hevc_cabac.c
@@ -1554,4 +1554,3 @@ void ff_hevc_hls_mvd_coding(HEVCContext *s, int x0, int y0, int log2_cb_size)
case 0: lc->pu.mvd.y = 0; break;
}
}
-
diff --git a/libavcodec/mpegaudiodsp_template.c b/libavcodec/mpegaudiodsp_template.c
index e531f8a904..b4c8955b0e 100644
--- a/libavcodec/mpegaudiodsp_template.c
+++ b/libavcodec/mpegaudiodsp_template.c
@@ -398,4 +398,3 @@ void RENAME(ff_imdct36_blocks)(INTFLOAT *out, INTFLOAT *buf, INTFLOAT *in,
out++;
}
}
-
diff --git a/libavcodec/mpegaudioenc_template.c b/libavcodec/mpegaudioenc_template.c
index 12f7a098e6..ec9582daee 100644
--- a/libavcodec/mpegaudioenc_template.c
+++ b/libavcodec/mpegaudioenc_template.c
@@ -782,4 +782,3 @@ static const AVCodecDefault mp2_defaults[] = {
{ "b", "0" },
{ NULL },
};
-
diff --git a/libavcodec/msmpeg4.c b/libavcodec/msmpeg4.c
index 920f50f32c..2fb5aa76db 100644
--- a/libavcodec/msmpeg4.c
+++ b/libavcodec/msmpeg4.c
@@ -334,4 +334,3 @@ int ff_msmpeg4_pred_dc(MpegEncContext *s, int n,
*dc_val_ptr = &dc_val[0];
return pred;
}
-
diff --git a/libavcodec/snow_dwt.c b/libavcodec/snow_dwt.c
index 25681e7edd..d5fa756ac7 100644
--- a/libavcodec/snow_dwt.c
+++ b/libavcodec/snow_dwt.c
@@ -856,5 +856,3 @@ av_cold void ff_dwt_init(SnowDWTContext *c)
if (HAVE_MMX)
ff_dwt_init_x86(c);
}
-
-
diff --git a/libavcodec/x86/fmtconvert.asm b/libavcodec/x86/fmtconvert.asm
index 8f62a0a093..dfc335e5c4 100644
--- a/libavcodec/x86/fmtconvert.asm
+++ b/libavcodec/x86/fmtconvert.asm
@@ -121,4 +121,3 @@ INIT_XMM sse
INT32_TO_FLOAT_FMUL_ARRAY8
INIT_XMM sse2
INT32_TO_FLOAT_FMUL_ARRAY8
-
diff --git a/libavcodec/x86/mpegvideoencdsp.asm b/libavcodec/x86/mpegvideoencdsp.asm
index aec73f82dc..53ddd802c8 100644
--- a/libavcodec/x86/mpegvideoencdsp.asm
+++ b/libavcodec/x86/mpegvideoencdsp.asm
@@ -151,4 +151,3 @@ INIT_MMX mmx
PIX_NORM1 0, 16
INIT_XMM sse2
PIX_NORM1 6, 8
-
diff --git a/libavfilter/aarch64/vf_nlmeans_neon.S b/libavfilter/aarch64/vf_nlmeans_neon.S
index e69b0dd923..a788cffd85 100644
--- a/libavfilter/aarch64/vf_nlmeans_neon.S
+++ b/libavfilter/aarch64/vf_nlmeans_neon.S
@@ -22,52 +22,52 @@
// acc_sum_store(ABCD) = {X+A, X+A+B, X+A+B+C, X+A+B+C+D}
.macro acc_sum_store x, xb
- dup v24.4S, v24.S[3] // ...X -> XXXX
- ext v25.16B, v26.16B, \xb, #12 // ext(0000,ABCD,12)=0ABC
- add v24.4S, v24.4S, \x // XXXX+ABCD={X+A,X+B,X+C,X+D}
- add v24.4S, v24.4S, v25.4S // {X+A,X+B+A,X+C+B,X+D+C} (+0ABC)
- ext v25.16B, v26.16B, v25.16B, #12 // ext(0000,0ABC,12)=00AB
- add v24.4S, v24.4S, v25.4S // {X+A,X+B+A,X+C+B+A,X+D+C+B} (+00AB)
- ext v25.16B, v26.16B, v25.16B, #12 // ext(0000,00AB,12)=000A
- add v24.4S, v24.4S, v25.4S // {X+A,X+B+A,X+C+B+A,X+D+C+B+A} (+000A)
- st1 {v24.4S}, [x0], #16 // write 4x32-bit final values
+ dup v24.4s, v24.s[3] // ...X -> XXXX
+ ext v25.16b, v26.16b, \xb, #12 // ext(0000,ABCD,12)=0ABC
+ add v24.4s, v24.4s, \x // XXXX+ABCD={X+A,X+B,X+C,X+D}
+ add v24.4s, v24.4s, v25.4s // {X+A,X+B+A,X+C+B,X+D+C} (+0ABC)
+ ext v25.16b, v26.16b, v25.16b, #12 // ext(0000,0ABC,12)=00AB
+ add v24.4s, v24.4s, v25.4s // {X+A,X+B+A,X+C+B+A,X+D+C+B} (+00AB)
+ ext v25.16b, v26.16b, v25.16b, #12 // ext(0000,00AB,12)=000A
+ add v24.4s, v24.4s, v25.4s // {X+A,X+B+A,X+C+B+A,X+D+C+B+A} (+000A)
+ st1 {v24.4s}, [x0], #16 // write 4x32-bit final values
.endm
function ff_compute_safe_ssd_integral_image_neon, export=1
- movi v26.4S, #0 // used as zero for the "rotations" in acc_sum_store
- sub x3, x3, w6, UXTW // s1 padding (s1_linesize - w)
- sub x5, x5, w6, UXTW // s2 padding (s2_linesize - w)
- sub x9, x0, w1, UXTW #2 // dst_top
- sub x1, x1, w6, UXTW // dst padding (dst_linesize_32 - w)
+ movi v26.4s, #0 // used as zero for the "rotations" in acc_sum_store
+ sub x3, x3, w6, uxtw // s1 padding (s1_linesize - w)
+ sub x5, x5, w6, uxtw // s2 padding (s2_linesize - w)
+ sub x9, x0, w1, uxtw #2 // dst_top
+ sub x1, x1, w6, uxtw // dst padding (dst_linesize_32 - w)
lsl x1, x1, #2 // dst padding expressed in bytes
1: mov w10, w6 // width copy for each line
sub x0, x0, #16 // beginning of the dst line minus 4 sums
sub x8, x9, #4 // dst_top-1
- ld1 {v24.4S}, [x0], #16 // load ...X (contextual last sums)
-2: ld1 {v0.16B}, [x2], #16 // s1[x + 0..15]
- ld1 {v1.16B}, [x4], #16 // s2[x + 0..15]
- ld1 {v16.4S,v17.4S}, [x8], #32 // dst_top[x + 0..7 - 1]
- usubl v2.8H, v0.8B, v1.8B // d[x + 0..7] = s1[x + 0..7] - s2[x + 0..7]
- usubl2 v3.8H, v0.16B, v1.16B // d[x + 8..15] = s1[x + 8..15] - s2[x + 8..15]
- ld1 {v18.4S,v19.4S}, [x8], #32 // dst_top[x + 8..15 - 1]
- smull v4.4S, v2.4H, v2.4H // d[x + 0..3]^2
- smull2 v5.4S, v2.8H, v2.8H // d[x + 4..7]^2
- ld1 {v20.4S,v21.4S}, [x9], #32 // dst_top[x + 0..7]
- smull v6.4S, v3.4H, v3.4H // d[x + 8..11]^2
- smull2 v7.4S, v3.8H, v3.8H // d[x + 12..15]^2
- ld1 {v22.4S,v23.4S}, [x9], #32 // dst_top[x + 8..15]
- sub v0.4S, v20.4S, v16.4S // dst_top[x + 0..3] - dst_top[x + 0..3 - 1]
- sub v1.4S, v21.4S, v17.4S // dst_top[x + 4..7] - dst_top[x + 4..7 - 1]
- add v0.4S, v0.4S, v4.4S // + d[x + 0..3]^2
- add v1.4S, v1.4S, v5.4S // + d[x + 4..7]^2
- sub v2.4S, v22.4S, v18.4S // dst_top[x + 8..11] - dst_top[x + 8..11 - 1]
- sub v3.4S, v23.4S, v19.4S // dst_top[x + 12..15] - dst_top[x + 12..15 - 1]
- add v2.4S, v2.4S, v6.4S // + d[x + 8..11]^2
- add v3.4S, v3.4S, v7.4S // + d[x + 12..15]^2
- acc_sum_store v0.4S, v0.16B // accumulate and store dst[ 0..3]
- acc_sum_store v1.4S, v1.16B // accumulate and store dst[ 4..7]
- acc_sum_store v2.4S, v2.16B // accumulate and store dst[ 8..11]
- acc_sum_store v3.4S, v3.16B // accumulate and store dst[12..15]
+ ld1 {v24.4s}, [x0], #16 // load ...X (contextual last sums)
+2: ld1 {v0.16b}, [x2], #16 // s1[x + 0..15]
+ ld1 {v1.16b}, [x4], #16 // s2[x + 0..15]
+ ld1 {v16.4s,v17.4s}, [x8], #32 // dst_top[x + 0..7 - 1]
+ usubl v2.8h, v0.8b, v1.8b // d[x + 0..7] = s1[x + 0..7] - s2[x + 0..7]
+ usubl2 v3.8h, v0.16b, v1.16b // d[x + 8..15] = s1[x + 8..15] - s2[x + 8..15]
+ ld1 {v18.4s,v19.4s}, [x8], #32 // dst_top[x + 8..15 - 1]
+ smull v4.4s, v2.4h, v2.4h // d[x + 0..3]^2
+ smull2 v5.4s, v2.8h, v2.8h // d[x + 4..7]^2
+ ld1 {v20.4s,v21.4s}, [x9], #32 // dst_top[x + 0..7]
+ smull v6.4s, v3.4h, v3.4h // d[x + 8..11]^2
+ smull2 v7.4s, v3.8h, v3.8h // d[x + 12..15]^2
+ ld1 {v22.4s,v23.4s}, [x9], #32 // dst_top[x + 8..15]
+ sub v0.4s, v20.4s, v16.4s // dst_top[x + 0..3] - dst_top[x + 0..3 - 1]
+ sub v1.4s, v21.4s, v17.4s // dst_top[x + 4..7] - dst_top[x + 4..7 - 1]
+ add v0.4s, v0.4s, v4.4s // + d[x + 0..3]^2
+ add v1.4s, v1.4s, v5.4s // + d[x + 4..7]^2
+ sub v2.4s, v22.4s, v18.4s // dst_top[x + 8..11] - dst_top[x + 8..11 - 1]
+ sub v3.4s, v23.4s, v19.4s // dst_top[x + 12..15] - dst_top[x + 12..15 - 1]
+ add v2.4s, v2.4s, v6.4s // + d[x + 8..11]^2
+ add v3.4s, v3.4s, v7.4s // + d[x + 12..15]^2
+ acc_sum_store v0.4s, v0.16b // accumulate and store dst[ 0..3]
+ acc_sum_store v1.4s, v1.16b // accumulate and store dst[ 4..7]
+ acc_sum_store v2.4s, v2.16b // accumulate and store dst[ 8..11]
+ acc_sum_store v3.4s, v3.16b // accumulate and store dst[12..15]
subs w10, w10, #16 // width dec
b.ne 2b // loop til next line
add x2, x2, x3 // skip to next line (s1)
diff --git a/libavfilter/scene_sad.c b/libavfilter/scene_sad.c
index 73d3eacbfa..2d8f5fbaa3 100644
--- a/libavfilter/scene_sad.c
+++ b/libavfilter/scene_sad.c
@@ -69,4 +69,3 @@ ff_scene_sad_fn ff_scene_sad_get_fn(int depth)
}
return sad;
}
-
diff --git a/libavfilter/vf_overlay_cuda.cu b/libavfilter/vf_overlay_cuda.cu
index 43ec36c2ed..9f1db0e9aa 100644
--- a/libavfilter/vf_overlay_cuda.cu
+++ b/libavfilter/vf_overlay_cuda.cu
@@ -51,4 +51,3 @@ __global__ void Overlay_Cuda(
}
}
-
diff --git a/libavformat/caf.c b/libavformat/caf.c
index fe242ff032..b985b26fad 100644
--- a/libavformat/caf.c
+++ b/libavformat/caf.c
@@ -77,4 +77,3 @@ const AVCodecTag ff_codec_caf_tags[] = {
{ AV_CODEC_ID_PCM_F64BE, MKTAG('l','p','c','m') },
{ AV_CODEC_ID_NONE, 0 },
};
-
diff --git a/libavformat/dash.c b/libavformat/dash.c
index d2501f013a..5f8d927ff8 100644
--- a/libavformat/dash.c
+++ b/libavformat/dash.c
@@ -153,5 +153,3 @@ void ff_dash_fill_tmpl_params(char *dst, size_t buffer_size,
t_cur = t_next;
}
}
-
-
diff --git a/libavformat/fifo_test.c b/libavformat/fifo_test.c
index 02ec215cbb..e78783cb04 100644
--- a/libavformat/fifo_test.c
+++ b/libavformat/fifo_test.c
@@ -149,4 +149,3 @@ AVOutputFormat ff_fifo_test_muxer = {
.priv_class = &failing_muxer_class,
.flags = AVFMT_NOFILE | AVFMT_ALLOW_FLUSH,
};
-
diff --git a/libavformat/g726.c b/libavformat/g726.c
index eca9404866..71a007699f 100644
--- a/libavformat/g726.c
+++ b/libavformat/g726.c
@@ -102,4 +102,3 @@ AVInputFormat ff_g726le_demuxer = {
.raw_codec_id = AV_CODEC_ID_ADPCM_G726LE,
};
#endif
-
diff --git a/libavformat/hlsplaylist.c b/libavformat/hlsplaylist.c
index 0e1dcc087f..adfdc70c6d 100644
--- a/libavformat/hlsplaylist.c
+++ b/libavformat/hlsplaylist.c
@@ -192,4 +192,3 @@ void ff_hls_write_end_list(AVIOContext *out)
return;
avio_printf(out, "#EXT-X-ENDLIST\n");
}
-
diff --git a/libavformat/mlpdec.c b/libavformat/mlpdec.c
index 40b1833761..b095de9d10 100644
--- a/libavformat/mlpdec.c
+++ b/libavformat/mlpdec.c
@@ -91,4 +91,3 @@ AVInputFormat ff_truehd_demuxer = {
.priv_class = &truehd_demuxer_class,
};
#endif
-
diff --git a/libavformat/mpjpegdec.c b/libavformat/mpjpegdec.c
index df2880412d..239b011f69 100644
--- a/libavformat/mpjpegdec.c
+++ b/libavformat/mpjpegdec.c
@@ -392,5 +392,3 @@ AVInputFormat ff_mpjpeg_demuxer = {
.priv_class = &mpjpeg_demuxer_class,
.flags = AVFMT_NOTIMESTAMPS,
};
-
-
diff --git a/libavformat/rdt.c b/libavformat/rdt.c
index 1250c9d70a..9308d871e4 100644
--- a/libavformat/rdt.c
+++ b/libavformat/rdt.c
@@ -571,4 +571,3 @@ RDT_HANDLER(live_video, "x-pn-multirate-realvideo-live", AVMEDIA_TYPE_VIDEO);
RDT_HANDLER(live_audio, "x-pn-multirate-realaudio-live", AVMEDIA_TYPE_AUDIO);
RDT_HANDLER(video, "x-pn-realvideo", AVMEDIA_TYPE_VIDEO);
RDT_HANDLER(audio, "x-pn-realaudio", AVMEDIA_TYPE_AUDIO);
-
diff --git a/libavutil/aarch64/float_dsp_neon.S b/libavutil/aarch64/float_dsp_neon.S
index 02d790c0cc..35e2715b87 100644
--- a/libavutil/aarch64/float_dsp_neon.S
+++ b/libavutil/aarch64/float_dsp_neon.S
@@ -25,16 +25,16 @@
function ff_vector_fmul_neon, export=1
1: subs w3, w3, #16
- ld1 {v0.4S, v1.4S}, [x1], #32
- ld1 {v2.4S, v3.4S}, [x1], #32
- ld1 {v4.4S, v5.4S}, [x2], #32
- ld1 {v6.4S, v7.4S}, [x2], #32
- fmul v16.4S, v0.4S, v4.4S
- fmul v17.4S, v1.4S, v5.4S
- fmul v18.4S, v2.4S, v6.4S
- fmul v19.4S, v3.4S, v7.4S
- st1 {v16.4S, v17.4S}, [x0], #32
- st1 {v18.4S, v19.4S}, [x0], #32
+ ld1 {v0.4s, v1.4s}, [x1], #32
+ ld1 {v2.4s, v3.4s}, [x1], #32
+ ld1 {v4.4s, v5.4s}, [x2], #32
+ ld1 {v6.4s, v7.4s}, [x2], #32
+ fmul v16.4s, v0.4s, v4.4s
+ fmul v17.4s, v1.4s, v5.4s
+ fmul v18.4s, v2.4s, v6.4s
+ fmul v19.4s, v3.4s, v7.4s
+ st1 {v16.4s, v17.4s}, [x0], #32
+ st1 {v18.4s, v19.4s}, [x0], #32
b.ne 1b
ret
endfunc
@@ -42,16 +42,16 @@ endfunc
function ff_vector_fmac_scalar_neon, export=1
mov x3, #-32
1: subs w2, w2, #16
- ld1 {v16.4S, v17.4S}, [x0], #32
- ld1 {v18.4S, v19.4S}, [x0], x3
- ld1 {v4.4S, v5.4S}, [x1], #32
- ld1 {v6.4S, v7.4S}, [x1], #32
- fmla v16.4S, v4.4S, v0.S[0]
- fmla v17.4S, v5.4S, v0.S[0]
- fmla v18.4S, v6.4S, v0.S[0]
- fmla v19.4S, v7.4S, v0.S[0]
- st1 {v16.4S, v17.4S}, [x0], #32
- st1 {v18.4S, v19.4S}, [x0], #32
+ ld1 {v16.4s, v17.4s}, [x0], #32
+ ld1 {v18.4s, v19.4s}, [x0], x3
+ ld1 {v4.4s, v5.4s}, [x1], #32
+ ld1 {v6.4s, v7.4s}, [x1], #32
+ fmla v16.4s, v4.4s, v0.s[0]
+ fmla v17.4s, v5.4s, v0.s[0]
+ fmla v18.4s, v6.4s, v0.s[0]
+ fmla v19.4s, v7.4s, v0.s[0]
+ st1 {v16.4s, v17.4s}, [x0], #32
+ st1 {v18.4s, v19.4s}, [x0], #32
b.ne 1b
ret
endfunc
@@ -59,43 +59,43 @@ endfunc
function ff_vector_fmul_scalar_neon, export=1
mov w4, #15
bics w3, w2, w4
- dup v16.4S, v0.S[0]
+ dup v16.4s, v0.s[0]
b.eq 3f
- ld1 {v0.4S, v1.4S}, [x1], #32
+ ld1 {v0.4s, v1.4s}, [x1], #32
1: subs w3, w3, #16
- fmul v0.4S, v0.4S, v16.4S
- ld1 {v2.4S, v3.4S}, [x1], #32
- fmul v1.4S, v1.4S, v16.4S
- fmul v2.4S, v2.4S, v16.4S
- st1 {v0.4S, v1.4S}, [x0], #32
- fmul v3.4S, v3.4S, v16.4S
+ fmul v0.4s, v0.4s, v16.4s
+ ld1 {v2.4s, v3.4s}, [x1], #32
+ fmul v1.4s, v1.4s, v16.4s
+ fmul v2.4s, v2.4s, v16.4s
+ st1 {v0.4s, v1.4s}, [x0], #32
+ fmul v3.4s, v3.4s, v16.4s
b.eq 2f
- ld1 {v0.4S, v1.4S}, [x1], #32
- st1 {v2.4S, v3.4S}, [x0], #32
+ ld1 {v0.4s, v1.4s}, [x1], #32
+ st1 {v2.4s, v3.4s}, [x0], #32
b 1b
2: ands w2, w2, #15
- st1 {v2.4S, v3.4S}, [x0], #32
+ st1 {v2.4s, v3.4s}, [x0], #32
b.eq 4f
-3: ld1 {v0.4S}, [x1], #16
- fmul v0.4S, v0.4S, v16.4S
- st1 {v0.4S}, [x0], #16
+3: ld1 {v0.4s}, [x1], #16
+ fmul v0.4s, v0.4s, v16.4s
+ st1 {v0.4s}, [x0], #16
subs w2, w2, #4
b.gt 3b
4: ret
endfunc
function ff_vector_dmul_scalar_neon, export=1
- dup v16.2D, v0.D[0]
- ld1 {v0.2D, v1.2D}, [x1], #32
+ dup v16.2d, v0.d[0]
+ ld1 {v0.2d, v1.2d}, [x1], #32
1: subs w2, w2, #8
- fmul v0.2D, v0.2D, v16.2D
- ld1 {v2.2D, v3.2D}, [x1], #32
- fmul v1.2D, v1.2D, v16.2D
- fmul v2.2D, v2.2D, v16.2D
- st1 {v0.2D, v1.2D}, [x0], #32
- fmul v3.2D, v3.2D, v16.2D
- ld1 {v0.2D, v1.2D}, [x1], #32
- st1 {v2.2D, v3.2D}, [x0], #32
+ fmul v0.2d, v0.2d, v16.2d
+ ld1 {v2.2d, v3.2d}, [x1], #32
+ fmul v1.2d, v1.2d, v16.2d
+ fmul v2.2d, v2.2d, v16.2d
+ st1 {v0.2d, v1.2d}, [x0], #32
+ fmul v3.2d, v3.2d, v16.2d
+ ld1 {v0.2d, v1.2d}, [x1], #32
+ st1 {v2.2d, v3.2d}, [x0], #32
b.gt 1b
ret
endfunc
@@ -108,49 +108,49 @@ function ff_vector_fmul_window_neon, export=1
add x6, x3, x5, lsl #3 // win + 8 * (len - 2)
add x5, x0, x5, lsl #3 // dst + 8 * (len - 2)
mov x7, #-16
- ld1 {v0.4S}, [x1], #16 // s0
- ld1 {v2.4S}, [x3], #16 // wi
- ld1 {v1.4S}, [x2], x7 // s1
-1: ld1 {v3.4S}, [x6], x7 // wj
+ ld1 {v0.4s}, [x1], #16 // s0
+ ld1 {v2.4s}, [x3], #16 // wi
+ ld1 {v1.4s}, [x2], x7 // s1
+1: ld1 {v3.4s}, [x6], x7 // wj
subs x4, x4, #4
- fmul v17.4S, v0.4S, v2.4S // s0 * wi
- rev64 v4.4S, v1.4S
- rev64 v5.4S, v3.4S
- rev64 v17.4S, v17.4S
- ext v4.16B, v4.16B, v4.16B, #8 // s1_r
- ext v5.16B, v5.16B, v5.16B, #8 // wj_r
- ext v17.16B, v17.16B, v17.16B, #8 // (s0 * wi)_rev
- fmul v16.4S, v0.4S, v5.4S // s0 * wj_r
- fmla v17.4S, v1.4S, v3.4S // (s0 * wi)_rev + s1 * wj
+ fmul v17.4s, v0.4s, v2.4s // s0 * wi
+ rev64 v4.4s, v1.4s
+ rev64 v5.4s, v3.4s
+ rev64 v17.4s, v17.4s
+ ext v4.16b, v4.16b, v4.16b, #8 // s1_r
+ ext v5.16b, v5.16b, v5.16b, #8 // wj_r
+ ext v17.16b, v17.16b, v17.16b, #8 // (s0 * wi)_rev
+ fmul v16.4s, v0.4s, v5.4s // s0 * wj_r
+ fmla v17.4s, v1.4s, v3.4s // (s0 * wi)_rev + s1 * wj
b.eq 2f
- ld1 {v0.4S}, [x1], #16
- fmls v16.4S, v4.4S, v2.4S // s0 * wj_r - s1_r * wi
- st1 {v17.4S}, [x5], x7
- ld1 {v2.4S}, [x3], #16
- ld1 {v1.4S}, [x2], x7
- st1 {v16.4S}, [x0], #16
+ ld1 {v0.4s}, [x1], #16
+ fmls v16.4s, v4.4s, v2.4s // s0 * wj_r - s1_r * wi
+ st1 {v17.4s}, [x5], x7
+ ld1 {v2.4s}, [x3], #16
+ ld1 {v1.4s}, [x2], x7
+ st1 {v16.4s}, [x0], #16
b 1b
2:
- fmls v16.4S, v4.4S, v2.4S // s0 * wj_r - s1_r * wi
- st1 {v17.4S}, [x5], x7
- st1 {v16.4S}, [x0], #16
+ fmls v16.4s, v4.4s, v2.4s // s0 * wj_r - s1_r * wi
+ st1 {v17.4s}, [x5], x7
+ st1 {v16.4s}, [x0], #16
ret
endfunc
function ff_vector_fmul_add_neon, export=1
- ld1 {v0.4S, v1.4S}, [x1], #32
- ld1 {v2.4S, v3.4S}, [x2], #32
- ld1 {v4.4S, v5.4S}, [x3], #32
+ ld1 {v0.4s, v1.4s}, [x1], #32
+ ld1 {v2.4s, v3.4s}, [x2], #32
+ ld1 {v4.4s, v5.4s}, [x3], #32
1: subs w4, w4, #8
- fmla v4.4S, v0.4S, v2.4S
- fmla v5.4S, v1.4S, v3.4S
+ fmla v4.4s, v0.4s, v2.4s
+ fmla v5.4s, v1.4s, v3.4s
b.eq 2f
- ld1 {v0.4S, v1.4S}, [x1], #32
- ld1 {v2.4S, v3.4S}, [x2], #32
- st1 {v4.4S, v5.4S}, [x0], #32
- ld1 {v4.4S, v5.4S}, [x3], #32
+ ld1 {v0.4s, v1.4s}, [x1], #32
+ ld1 {v2.4s, v3.4s}, [x2], #32
+ st1 {v4.4s, v5.4s}, [x0], #32
+ ld1 {v4.4s, v5.4s}, [x3], #32
b 1b
-2: st1 {v4.4S, v5.4S}, [x0], #32
+2: st1 {v4.4s, v5.4s}, [x0], #32
ret
endfunc
@@ -159,44 +159,44 @@ function ff_vector_fmul_reverse_neon, export=1
add x2, x2, x3, lsl #2
sub x2, x2, #32
mov x4, #-32
- ld1 {v2.4S, v3.4S}, [x2], x4
- ld1 {v0.4S, v1.4S}, [x1], #32
+ ld1 {v2.4s, v3.4s}, [x2], x4
+ ld1 {v0.4s, v1.4s}, [x1], #32
1: subs x3, x3, #8
- rev64 v3.4S, v3.4S
- rev64 v2.4S, v2.4S
- ext v3.16B, v3.16B, v3.16B, #8
- ext v2.16B, v2.16B, v2.16B, #8
- fmul v16.4S, v0.4S, v3.4S
- fmul v17.4S, v1.4S, v2.4S
+ rev64 v3.4s, v3.4s
+ rev64 v2.4s, v2.4s
+ ext v3.16b, v3.16b, v3.16b, #8
+ ext v2.16b, v2.16b, v2.16b, #8
+ fmul v16.4s, v0.4s, v3.4s
+ fmul v17.4s, v1.4s, v2.4s
b.eq 2f
- ld1 {v2.4S, v3.4S}, [x2], x4
- ld1 {v0.4S, v1.4S}, [x1], #32
- st1 {v16.4S, v17.4S}, [x0], #32
+ ld1 {v2.4s, v3.4s}, [x2], x4
+ ld1 {v0.4s, v1.4s}, [x1], #32
+ st1 {v16.4s, v17.4s}, [x0], #32
b 1b
-2: st1 {v16.4S, v17.4S}, [x0], #32
+2: st1 {v16.4s, v17.4s}, [x0], #32
ret
endfunc
function ff_butterflies_float_neon, export=1
-1: ld1 {v0.4S}, [x0]
- ld1 {v1.4S}, [x1]
+1: ld1 {v0.4s}, [x0]
+ ld1 {v1.4s}, [x1]
subs w2, w2, #4
- fsub v2.4S, v0.4S, v1.4S
- fadd v3.4S, v0.4S, v1.4S
- st1 {v2.4S}, [x1], #16
- st1 {v3.4S}, [x0], #16
+ fsub v2.4s, v0.4s, v1.4s
+ fadd v3.4s, v0.4s, v1.4s
+ st1 {v2.4s}, [x1], #16
+ st1 {v3.4s}, [x0], #16
b.gt 1b
ret
endfunc
function ff_scalarproduct_float_neon, export=1
- movi v2.4S, #0
-1: ld1 {v0.4S}, [x0], #16
- ld1 {v1.4S}, [x1], #16
+ movi v2.4s, #0
+1: ld1 {v0.4s}, [x0], #16
+ ld1 {v1.4s}, [x1], #16
subs w2, w2, #4
- fmla v2.4S, v0.4S, v1.4S
+ fmla v2.4s, v0.4s, v1.4s
b.gt 1b
- faddp v0.4S, v2.4S, v2.4S
- faddp s0, v0.2S
+ faddp v0.4s, v2.4s, v2.4s
+ faddp s0, v0.2s
ret
endfunc
diff --git a/libavutil/aes.c b/libavutil/aes.c
index 397ea77389..4dd9725b6c 100644
--- a/libavutil/aes.c
+++ b/libavutil/aes.c
@@ -265,4 +265,3 @@ int av_aes_init(AVAES *a, const uint8_t *key, int key_bits, int decrypt)
return 0;
}
-
diff --git a/libavutil/hwcontext_cuda_internal.h b/libavutil/hwcontext_cuda_internal.h
index d5633c58d5..a989c45861 100644
--- a/libavutil/hwcontext_cuda_internal.h
+++ b/libavutil/hwcontext_cuda_internal.h
@@ -36,4 +36,3 @@ struct AVCUDADeviceContextInternal {
};
#endif /* AVUTIL_HWCONTEXT_CUDA_INTERNAL_H */
-
diff --git a/libavutil/hwcontext_qsv.h b/libavutil/hwcontext_qsv.h
index b98d611cfc..901710bf21 100644
--- a/libavutil/hwcontext_qsv.h
+++ b/libavutil/hwcontext_qsv.h
@@ -50,4 +50,3 @@ typedef struct AVQSVFramesContext {
} AVQSVFramesContext;
#endif /* AVUTIL_HWCONTEXT_QSV_H */
-
diff --git a/libavutil/tests/blowfish.c b/libavutil/tests/blowfish.c
index f4c9ced9b4..0dd4d22c9d 100644
--- a/libavutil/tests/blowfish.c
+++ b/libavutil/tests/blowfish.c
@@ -191,4 +191,3 @@ int main(void)
return 0;
}
-
diff --git a/libswresample/aarch64/resample.S b/libswresample/aarch64/resample.S
index bbad619a81..89edd1ce60 100644
--- a/libswresample/aarch64/resample.S
+++ b/libswresample/aarch64/resample.S
@@ -21,57 +21,57 @@
#include "libavutil/aarch64/asm.S"
function ff_resample_common_apply_filter_x4_float_neon, export=1
- movi v0.4S, #0 // accumulator
-1: ld1 {v1.4S}, [x1], #16 // src[0..3]
- ld1 {v2.4S}, [x2], #16 // filter[0..3]
- fmla v0.4S, v1.4S, v2.4S // accumulator += src[0..3] * filter[0..3]
- subs w3, w3, #4 // filter_length -= 4
- b.gt 1b // loop until filter_length
- faddp v0.4S, v0.4S, v0.4S // pair adding of the 4x32-bit accumulated values
- faddp v0.4S, v0.4S, v0.4S // pair adding of the 4x32-bit accumulated values
- st1 {v0.S}[0], [x0], #4 // write accumulator
- ret
+ movi v0.4s, #0 // accumulator
+1: ld1 {v1.4s}, [x1], #16 // src[0..3]
+ ld1 {v2.4s}, [x2], #16 // filter[0..3]
+ fmla v0.4s, v1.4s, v2.4s // accumulator += src[0..3] * filter[0..3]
+ subs w3, w3, #4 // filter_length -= 4
+ b.gt 1b // loop until filter_length
+ faddp v0.4s, v0.4s, v0.4s // pair adding of the 4x32-bit accumulated values
+ faddp v0.4s, v0.4s, v0.4s // pair adding of the 4x32-bit accumulated values
+ st1 {v0.s}[0], [x0], #4 // write accumulator
+ ret
endfunc
function ff_resample_common_apply_filter_x8_float_neon, export=1
- movi v0.4S, #0 // accumulator
-1: ld1 {v1.4S}, [x1], #16 // src[0..3]
- ld1 {v2.4S}, [x2], #16 // filter[0..3]
- ld1 {v3.4S}, [x1], #16 // src[4..7]
- ld1 {v4.4S}, [x2], #16 // filter[4..7]
- fmla v0.4S, v1.4S, v2.4S // accumulator += src[0..3] * filter[0..3]
- fmla v0.4S, v3.4S, v4.4S // accumulator += src[4..7] * filter[4..7]
- subs w3, w3, #8 // filter_length -= 8
- b.gt 1b // loop until filter_length
- faddp v0.4S, v0.4S, v0.4S // pair adding of the 4x32-bit accumulated values
- faddp v0.4S, v0.4S, v0.4S // pair adding of the 4x32-bit accumulated values
- st1 {v0.S}[0], [x0], #4 // write accumulator
- ret
+ movi v0.4s, #0 // accumulator
+1: ld1 {v1.4s}, [x1], #16 // src[0..3]
+ ld1 {v2.4s}, [x2], #16 // filter[0..3]
+ ld1 {v3.4s}, [x1], #16 // src[4..7]
+ ld1 {v4.4s}, [x2], #16 // filter[4..7]
+ fmla v0.4s, v1.4s, v2.4s // accumulator += src[0..3] * filter[0..3]
+ fmla v0.4s, v3.4s, v4.4s // accumulator += src[4..7] * filter[4..7]
+ subs w3, w3, #8 // filter_length -= 8
+ b.gt 1b // loop until filter_length
+ faddp v0.4s, v0.4s, v0.4s // pair adding of the 4x32-bit accumulated values
+ faddp v0.4s, v0.4s, v0.4s // pair adding of the 4x32-bit accumulated values
+ st1 {v0.s}[0], [x0], #4 // write accumulator
+ ret
endfunc
function ff_resample_common_apply_filter_x4_s16_neon, export=1
- movi v0.4S, #0 // accumulator
-1: ld1 {v1.4H}, [x1], #8 // src[0..3]
- ld1 {v2.4H}, [x2], #8 // filter[0..3]
- smlal v0.4S, v1.4H, v2.4H // accumulator += src[0..3] * filter[0..3]
- subs w3, w3, #4 // filter_length -= 4
- b.gt 1b // loop until filter_length
- addp v0.4S, v0.4S, v0.4S // pair adding of the 4x32-bit accumulated values
- addp v0.4S, v0.4S, v0.4S // pair adding of the 4x32-bit accumulated values
- st1 {v0.S}[0], [x0], #4 // write accumulator
- ret
+ movi v0.4s, #0 // accumulator
+1: ld1 {v1.4h}, [x1], #8 // src[0..3]
+ ld1 {v2.4h}, [x2], #8 // filter[0..3]
+ smlal v0.4s, v1.4h, v2.4h // accumulator += src[0..3] * filter[0..3]
+ subs w3, w3, #4 // filter_length -= 4
+ b.gt 1b // loop until filter_length
+ addp v0.4s, v0.4s, v0.4s // pair adding of the 4x32-bit accumulated values
+ addp v0.4s, v0.4s, v0.4s // pair adding of the 4x32-bit accumulated values
+ st1 {v0.s}[0], [x0], #4 // write accumulator
+ ret
endfunc
function ff_resample_common_apply_filter_x8_s16_neon, export=1
- movi v0.4S, #0 // accumulator
-1: ld1 {v1.8H}, [x1], #16 // src[0..7]
- ld1 {v2.8H}, [x2], #16 // filter[0..7]
- smlal v0.4S, v1.4H, v2.4H // accumulator += src[0..3] * filter[0..3]
- smlal2 v0.4S, v1.8H, v2.8H // accumulator += src[4..7] * filter[4..7]
- subs w3, w3, #8 // filter_length -= 8
- b.gt 1b // loop until filter_length
- addp v0.4S, v0.4S, v0.4S // pair adding of the 4x32-bit accumulated values
- addp v0.4S, v0.4S, v0.4S // pair adding of the 4x32-bit accumulated values
- st1 {v0.S}[0], [x0], #4 // write accumulator
- ret
+ movi v0.4s, #0 // accumulator
+1: ld1 {v1.8h}, [x1], #16 // src[0..7]
+ ld1 {v2.8h}, [x2], #16 // filter[0..7]
+ smlal v0.4s, v1.4h, v2.4h // accumulator += src[0..3] * filter[0..3]
+ smlal2 v0.4s, v1.8h, v2.8h // accumulator += src[4..7] * filter[4..7]
+ subs w3, w3, #8 // filter_length -= 8
+ b.gt 1b // loop until filter_length
+ addp v0.4s, v0.4s, v0.4s // pair adding of the 4x32-bit accumulated values
+ addp v0.4s, v0.4s, v0.4s // pair adding of the 4x32-bit accumulated values
+ st1 {v0.s}[0], [x0], #4 // write accumulator
+ ret
endfunc
diff --git a/libswresample/soxr_resample.c b/libswresample/soxr_resample.c
index 8181c749b8..00d79878ca 100644
--- a/libswresample/soxr_resample.c
+++ b/libswresample/soxr_resample.c
@@ -127,4 +127,3 @@ struct Resampler const swri_soxr_resampler={
create, destroy, process, flush, NULL /* set_compensation */, get_delay,
invert_initial_buffer, get_out_samples
};
-
diff --git a/libswresample/swresample_frame.c b/libswresample/swresample_frame.c
index 2853266d6c..0ec9f81512 100644
--- a/libswresample/swresample_frame.c
+++ b/libswresample/swresample_frame.c
@@ -156,4 +156,3 @@ int swr_convert_frame(SwrContext *s,
return convert_frame(s, out, in);
}
-
diff --git a/libswscale/aarch64/hscale.S b/libswscale/aarch64/hscale.S
index af55ffe2b7..f267760b63 100644
--- a/libswscale/aarch64/hscale.S
+++ b/libswscale/aarch64/hscale.S
@@ -21,60 +21,60 @@
#include "libavutil/aarch64/asm.S"
function ff_hscale_8_to_15_neon, export=1
- sbfiz x7, x6, #1, #32 // filterSize*2 (*2 because int16)
-1: ldr w8, [x5], #4 // filterPos[idx]
- ldr w0, [x5], #4 // filterPos[idx + 1]
- ldr w11, [x5], #4 // filterPos[idx + 2]
- ldr w9, [x5], #4 // filterPos[idx + 3]
- mov x16, x4 // filter0 = filter
- add x12, x16, x7 // filter1 = filter0 + filterSize*2
- add x13, x12, x7 // filter2 = filter1 + filterSize*2
- add x4, x13, x7 // filter3 = filter2 + filterSize*2
- movi v0.2D, #0 // val sum part 1 (for dst[0])
- movi v1.2D, #0 // val sum part 2 (for dst[1])
- movi v2.2D, #0 // val sum part 3 (for dst[2])
- movi v3.2D, #0 // val sum part 4 (for dst[3])
- add x17, x3, w8, UXTW // srcp + filterPos[0]
- add x8, x3, w0, UXTW // srcp + filterPos[1]
- add x0, x3, w11, UXTW // srcp + filterPos[2]
- add x11, x3, w9, UXTW // srcp + filterPos[3]
- mov w15, w6 // filterSize counter
-2: ld1 {v4.8B}, [x17], #8 // srcp[filterPos[0] + {0..7}]
- ld1 {v5.8H}, [x16], #16 // load 8x16-bit filter values, part 1
- ld1 {v6.8B}, [x8], #8 // srcp[filterPos[1] + {0..7}]
- ld1 {v7.8H}, [x12], #16 // load 8x16-bit at filter+filterSize
- uxtl v4.8H, v4.8B // unpack part 1 to 16-bit
- smlal v0.4S, v4.4H, v5.4H // v0 accumulates srcp[filterPos[0] + {0..3}] * filter[{0..3}]
- smlal2 v0.4S, v4.8H, v5.8H // v0 accumulates srcp[filterPos[0] + {4..7}] * filter[{4..7}]
- ld1 {v16.8B}, [x0], #8 // srcp[filterPos[2] + {0..7}]
- ld1 {v17.8H}, [x13], #16 // load 8x16-bit at filter+2*filterSize
- uxtl v6.8H, v6.8B // unpack part 2 to 16-bit
- smlal v1.4S, v6.4H, v7.4H // v1 accumulates srcp[filterPos[1] + {0..3}] * filter[{0..3}]
- uxtl v16.8H, v16.8B // unpack part 3 to 16-bit
- smlal v2.4S, v16.4H, v17.4H // v2 accumulates srcp[filterPos[2] + {0..3}] * filter[{0..3}]
- smlal2 v2.4S, v16.8H, v17.8H // v2 accumulates srcp[filterPos[2] + {4..7}] * filter[{4..7}]
- ld1 {v18.8B}, [x11], #8 // srcp[filterPos[3] + {0..7}]
- smlal2 v1.4S, v6.8H, v7.8H // v1 accumulates srcp[filterPos[1] + {4..7}] * filter[{4..7}]
- ld1 {v19.8H}, [x4], #16 // load 8x16-bit at filter+3*filterSize
- subs w15, w15, #8 // j -= 8: processed 8/filterSize
- uxtl v18.8H, v18.8B // unpack part 4 to 16-bit
- smlal v3.4S, v18.4H, v19.4H // v3 accumulates srcp[filterPos[3] + {0..3}] * filter[{0..3}]
- smlal2 v3.4S, v18.8H, v19.8H // v3 accumulates srcp[filterPos[3] + {4..7}] * filter[{4..7}]
- b.gt 2b // inner loop if filterSize not consumed completely
- addp v0.4S, v0.4S, v0.4S // part0 horizontal pair adding
- addp v1.4S, v1.4S, v1.4S // part1 horizontal pair adding
- addp v2.4S, v2.4S, v2.4S // part2 horizontal pair adding
- addp v3.4S, v3.4S, v3.4S // part3 horizontal pair adding
- addp v0.4S, v0.4S, v0.4S // part0 horizontal pair adding
- addp v1.4S, v1.4S, v1.4S // part1 horizontal pair adding
- addp v2.4S, v2.4S, v2.4S // part2 horizontal pair adding
- addp v3.4S, v3.4S, v3.4S // part3 horizontal pair adding
- zip1 v0.4S, v0.4S, v1.4S // part01 = zip values from part0 and part1
- zip1 v2.4S, v2.4S, v3.4S // part23 = zip values from part2 and part3
- mov v0.d[1], v2.d[0] // part0123 = zip values from part01 and part23
- subs w2, w2, #4 // dstW -= 4
- sqshrn v0.4H, v0.4S, #7 // shift and clip the 2x16-bit final values
- st1 {v0.4H}, [x1], #8 // write to destination part0123
- b.gt 1b // loop until end of line
+ sbfiz x7, x6, #1, #32 // filterSize*2 (*2 because int16)
+1: ldr w8, [x5], #4 // filterPos[idx]
+ ldr w0, [x5], #4 // filterPos[idx + 1]
+ ldr w11, [x5], #4 // filterPos[idx + 2]
+ ldr w9, [x5], #4 // filterPos[idx + 3]
+ mov x16, x4 // filter0 = filter
+ add x12, x16, x7 // filter1 = filter0 + filterSize*2
+ add x13, x12, x7 // filter2 = filter1 + filterSize*2
+ add x4, x13, x7 // filter3 = filter2 + filterSize*2
+ movi v0.2d, #0 // val sum part 1 (for dst[0])
+ movi v1.2d, #0 // val sum part 2 (for dst[1])
+ movi v2.2d, #0 // val sum part 3 (for dst[2])
+ movi v3.2d, #0 // val sum part 4 (for dst[3])
+ add x17, x3, w8, uxtw // srcp + filterPos[0]
+ add x8, x3, w0, uxtw // srcp + filterPos[1]
+ add x0, x3, w11, uxtw // srcp + filterPos[2]
+ add x11, x3, w9, uxtw // srcp + filterPos[3]
+ mov w15, w6 // filterSize counter
+2: ld1 {v4.8b}, [x17], #8 // srcp[filterPos[0] + {0..7}]
+ ld1 {v5.8h}, [x16], #16 // load 8x16-bit filter values, part 1
+ ld1 {v6.8b}, [x8], #8 // srcp[filterPos[1] + {0..7}]
+ ld1 {v7.8h}, [x12], #16 // load 8x16-bit at filter+filterSize
+ uxtl v4.8h, v4.8b // unpack part 1 to 16-bit
+ smlal v0.4s, v4.4h, v5.4h // v0 accumulates srcp[filterPos[0] + {0..3}] * filter[{0..3}]
+ smlal2 v0.4s, v4.8h, v5.8h // v0 accumulates srcp[filterPos[0] + {4..7}] * filter[{4..7}]
+ ld1 {v16.8b}, [x0], #8 // srcp[filterPos[2] + {0..7}]
+ ld1 {v17.8h}, [x13], #16 // load 8x16-bit at filter+2*filterSize
+ uxtl v6.8h, v6.8b // unpack part 2 to 16-bit
+ smlal v1.4s, v6.4h, v7.4h // v1 accumulates srcp[filterPos[1] + {0..3}] * filter[{0..3}]
+ uxtl v16.8h, v16.8b // unpack part 3 to 16-bit
+ smlal v2.4s, v16.4h, v17.4h // v2 accumulates srcp[filterPos[2] + {0..3}] * filter[{0..3}]
+ smlal2 v2.4s, v16.8h, v17.8h // v2 accumulates srcp[filterPos[2] + {4..7}] * filter[{4..7}]
+ ld1 {v18.8b}, [x11], #8 // srcp[filterPos[3] + {0..7}]
+ smlal2 v1.4s, v6.8h, v7.8h // v1 accumulates srcp[filterPos[1] + {4..7}] * filter[{4..7}]
+ ld1 {v19.8h}, [x4], #16 // load 8x16-bit at filter+3*filterSize
+ subs w15, w15, #8 // j -= 8: processed 8/filterSize
+ uxtl v18.8h, v18.8b // unpack part 4 to 16-bit
+ smlal v3.4s, v18.4h, v19.4h // v3 accumulates srcp[filterPos[3] + {0..3}] * filter[{0..3}]
+ smlal2 v3.4s, v18.8h, v19.8h // v3 accumulates srcp[filterPos[3] + {4..7}] * filter[{4..7}]
+ b.gt 2b // inner loop if filterSize not consumed completely
+ addp v0.4s, v0.4s, v0.4s // part0 horizontal pair adding
+ addp v1.4s, v1.4s, v1.4s // part1 horizontal pair adding
+ addp v2.4s, v2.4s, v2.4s // part2 horizontal pair adding
+ addp v3.4s, v3.4s, v3.4s // part3 horizontal pair adding
+ addp v0.4s, v0.4s, v0.4s // part0 horizontal pair adding
+ addp v1.4s, v1.4s, v1.4s // part1 horizontal pair adding
+ addp v2.4s, v2.4s, v2.4s // part2 horizontal pair adding
+ addp v3.4s, v3.4s, v3.4s // part3 horizontal pair adding
+ zip1 v0.4s, v0.4s, v1.4s // part01 = zip values from part0 and part1
+ zip1 v2.4s, v2.4s, v3.4s // part23 = zip values from part2 and part3
+ mov v0.d[1], v2.d[0] // part0123 = zip values from part01 and part23
+ subs w2, w2, #4 // dstW -= 4
+ sqshrn v0.4h, v0.4s, #7 // shift and clip the 2x16-bit final values
+ st1 {v0.4h}, [x1], #8 // write to destination part0123
+ b.gt 1b // loop until end of line
ret
endfunc
diff --git a/libswscale/aarch64/output.S b/libswscale/aarch64/output.S
index af71de6050..7c25685494 100644
--- a/libswscale/aarch64/output.S
+++ b/libswscale/aarch64/output.S
@@ -21,38 +21,38 @@
#include "libavutil/aarch64/asm.S"
function ff_yuv2planeX_8_neon, export=1
- ld1 {v0.8B}, [x5] // load 8x8-bit dither
- cbz w6, 1f // check if offsetting present
- ext v0.8B, v0.8B, v0.8B, #3 // honor offsetting which can be 0 or 3 only
-1: uxtl v0.8H, v0.8B // extend dither to 16-bit
- ushll v1.4S, v0.4H, #12 // extend dither to 32-bit with left shift by 12 (part 1)
- ushll2 v2.4S, v0.8H, #12 // extend dither to 32-bit with left shift by 12 (part 2)
- mov x7, #0 // i = 0
-2: mov v3.16B, v1.16B // initialize accumulator part 1 with dithering value
- mov v4.16B, v2.16B // initialize accumulator part 2 with dithering value
- mov w8, w1 // tmpfilterSize = filterSize
- mov x9, x2 // srcp = src
- mov x10, x0 // filterp = filter
-3: ldp x11, x12, [x9], #16 // get 2 pointers: src[j] and src[j+1]
- add x11, x11, x7, lsl #1 // &src[j ][i]
- add x12, x12, x7, lsl #1 // &src[j+1][i]
- ld1 {v5.8H}, [x11] // read 8x16-bit @ src[j ][i + {0..7}]: A,B,C,D,E,F,G,H
- ld1 {v6.8H}, [x12] // read 8x16-bit @ src[j+1][i + {0..7}]: I,J,K,L,M,N,O,P
- ld1r {v7.8H}, [x10], #2 // read 1x16-bit coeff X at filter[j ] and duplicate across lanes
- ld1r {v16.8H}, [x10], #2 // read 1x16-bit coeff Y at filter[j+1] and duplicate across lanes
- smlal v3.4S, v5.4H, v7.4H // val0 += {A,B,C,D} * X
- smlal2 v4.4S, v5.8H, v7.8H // val1 += {E,F,G,H} * X
- smlal v3.4S, v6.4H, v16.4H // val0 += {I,J,K,L} * Y
- smlal2 v4.4S, v6.8H, v16.8H // val1 += {M,N,O,P} * Y
- subs w8, w8, #2 // tmpfilterSize -= 2
- b.gt 3b // loop until filterSize consumed
+ ld1 {v0.8b}, [x5] // load 8x8-bit dither
+ cbz w6, 1f // check if offsetting present
+ ext v0.8b, v0.8b, v0.8b, #3 // honor offsetting which can be 0 or 3 only
+1: uxtl v0.8h, v0.8b // extend dither to 16-bit
+ ushll v1.4s, v0.4h, #12 // extend dither to 32-bit with left shift by 12 (part 1)
+ ushll2 v2.4s, v0.8h, #12 // extend dither to 32-bit with left shift by 12 (part 2)
+ mov x7, #0 // i = 0
+2: mov v3.16b, v1.16b // initialize accumulator part 1 with dithering value
+ mov v4.16b, v2.16b // initialize accumulator part 2 with dithering value
+ mov w8, w1 // tmpfilterSize = filterSize
+ mov x9, x2 // srcp = src
+ mov x10, x0 // filterp = filter
+3: ldp x11, x12, [x9], #16 // get 2 pointers: src[j] and src[j+1]
+ add x11, x11, x7, lsl #1 // &src[j ][i]
+ add x12, x12, x7, lsl #1 // &src[j+1][i]
+ ld1 {v5.8h}, [x11] // read 8x16-bit @ src[j ][i + {0..7}]: A,B,C,D,E,F,G,H
+ ld1 {v6.8h}, [x12] // read 8x16-bit @ src[j+1][i + {0..7}]: I,J,K,L,M,N,O,P
+ ld1r {v7.8h}, [x10], #2 // read 1x16-bit coeff X at filter[j ] and duplicate across lanes
+ ld1r {v16.8h}, [x10], #2 // read 1x16-bit coeff Y at filter[j+1] and duplicate across lanes
+ smlal v3.4s, v5.4h, v7.4h // val0 += {A,B,C,D} * X
+ smlal2 v4.4s, v5.8h, v7.8h // val1 += {E,F,G,H} * X
+ smlal v3.4s, v6.4h, v16.4h // val0 += {I,J,K,L} * Y
+ smlal2 v4.4s, v6.8h, v16.8h // val1 += {M,N,O,P} * Y
+ subs w8, w8, #2 // tmpfilterSize -= 2
+ b.gt 3b // loop until filterSize consumed
- sqshrun v3.4h, v3.4s, #16 // clip16(val0>>16)
- sqshrun2 v3.8h, v4.4s, #16 // clip16(val1>>16)
- uqshrn v3.8b, v3.8h, #3 // clip8(val>>19)
- st1 {v3.8b}, [x3], #8 // write to destination
- subs w4, w4, #8 // dstW -= 8
- add x7, x7, #8 // i += 8
- b.gt 2b // loop until width consumed
+ sqshrun v3.4h, v3.4s, #16 // clip16(val0>>16)
+ sqshrun2 v3.8h, v4.4s, #16 // clip16(val1>>16)
+ uqshrn v3.8b, v3.8h, #3 // clip8(val>>19)
+ st1 {v3.8b}, [x3], #8 // write to destination
+ subs w4, w4, #8 // dstW -= 8
+ add x7, x7, #8 // i += 8
+ b.gt 2b // loop until width consumed
ret
endfunc
diff --git a/libswscale/aarch64/yuv2rgb_neon.S b/libswscale/aarch64/yuv2rgb_neon.S
index 10bd1f7480..09a0a04cee 100644
--- a/libswscale/aarch64/yuv2rgb_neon.S
+++ b/libswscale/aarch64/yuv2rgb_neon.S
@@ -23,185 +23,185 @@
.macro load_yoff_ycoeff yoff ycoeff
#if defined(__APPLE__)
- ldp w9, w10, [sp, #\yoff]
+ ldp w9, w10, [sp, #\yoff]
#else
- ldr w9, [sp, #\yoff]
- ldr w10, [sp, #\ycoeff]
+ ldr w9, [sp, #\yoff]
+ ldr w10, [sp, #\ycoeff]
#endif
.endm
.macro load_args_nv12
- ldr x8, [sp] // table
- load_yoff_ycoeff 8, 16 // y_offset, y_coeff
- ld1 {v1.1D}, [x8]
- dup v0.8H, w10
- dup v3.8H, w9
- sub w3, w3, w0, lsl #2 // w3 = linesize - width * 4 (padding)
- sub w5, w5, w0 // w5 = linesizeY - width (paddingY)
- sub w7, w7, w0 // w7 = linesizeC - width (paddingC)
- neg w11, w0
+ ldr x8, [sp] // table
+ load_yoff_ycoeff 8, 16 // y_offset, y_coeff
+ ld1 {v1.1d}, [x8]
+ dup v0.8h, w10
+ dup v3.8h, w9
+ sub w3, w3, w0, lsl #2 // w3 = linesize - width * 4 (padding)
+ sub w5, w5, w0 // w5 = linesizeY - width (paddingY)
+ sub w7, w7, w0 // w7 = linesizeC - width (paddingC)
+ neg w11, w0
.endm
.macro load_args_nv21
- load_args_nv12
+ load_args_nv12
.endm
.macro load_args_yuv420p
- ldr x13, [sp] // srcV
- ldr w14, [sp, #8] // linesizeV
- ldr x8, [sp, #16] // table
- load_yoff_ycoeff 24, 32 // y_offset, y_coeff
- ld1 {v1.1D}, [x8]
- dup v0.8H, w10
- dup v3.8H, w9
- sub w3, w3, w0, lsl #2 // w3 = linesize - width * 4 (padding)
- sub w5, w5, w0 // w5 = linesizeY - width (paddingY)
- sub w7, w7, w0, lsr #1 // w7 = linesizeU - width / 2 (paddingU)
- sub w14, w14, w0, lsr #1 // w14 = linesizeV - width / 2 (paddingV)
- lsr w11, w0, #1
- neg w11, w11
+ ldr x13, [sp] // srcV
+ ldr w14, [sp, #8] // linesizeV
+ ldr x8, [sp, #16] // table
+ load_yoff_ycoeff 24, 32 // y_offset, y_coeff
+ ld1 {v1.1d}, [x8]
+ dup v0.8h, w10
+ dup v3.8h, w9
+ sub w3, w3, w0, lsl #2 // w3 = linesize - width * 4 (padding)
+ sub w5, w5, w0 // w5 = linesizeY - width (paddingY)
+ sub w7, w7, w0, lsr #1 // w7 = linesizeU - width / 2 (paddingU)
+ sub w14, w14, w0, lsr #1 // w14 = linesizeV - width / 2 (paddingV)
+ lsr w11, w0, #1
+ neg w11, w11
.endm
.macro load_args_yuv422p
- ldr x13, [sp] // srcV
- ldr w14, [sp, #8] // linesizeV
- ldr x8, [sp, #16] // table
- load_yoff_ycoeff 24, 32 // y_offset, y_coeff
- ld1 {v1.1D}, [x8]
- dup v0.8H, w10
- dup v3.8H, w9
- sub w3, w3, w0, lsl #2 // w3 = linesize - width * 4 (padding)
- sub w5, w5, w0 // w5 = linesizeY - width (paddingY)
- sub w7, w7, w0, lsr #1 // w7 = linesizeU - width / 2 (paddingU)
- sub w14, w14, w0, lsr #1 // w14 = linesizeV - width / 2 (paddingV)
+ ldr x13, [sp] // srcV
+ ldr w14, [sp, #8] // linesizeV
+ ldr x8, [sp, #16] // table
+ load_yoff_ycoeff 24, 32 // y_offset, y_coeff
+ ld1 {v1.1d}, [x8]
+ dup v0.8h, w10
+ dup v3.8h, w9
+ sub w3, w3, w0, lsl #2 // w3 = linesize - width * 4 (padding)
+ sub w5, w5, w0 // w5 = linesizeY - width (paddingY)
+ sub w7, w7, w0, lsr #1 // w7 = linesizeU - width / 2 (paddingU)
+ sub w14, w14, w0, lsr #1 // w14 = linesizeV - width / 2 (paddingV)
.endm
.macro load_chroma_nv12
- ld2 {v16.8B, v17.8B}, [x6], #16
- ushll v18.8H, v16.8B, #3
- ushll v19.8H, v17.8B, #3
+ ld2 {v16.8b, v17.8b}, [x6], #16
+ ushll v18.8h, v16.8b, #3
+ ushll v19.8h, v17.8b, #3
.endm
.macro load_chroma_nv21
- ld2 {v16.8B, v17.8B}, [x6], #16
- ushll v19.8H, v16.8B, #3
- ushll v18.8H, v17.8B, #3
+ ld2 {v16.8b, v17.8b}, [x6], #16
+ ushll v19.8h, v16.8b, #3
+ ushll v18.8h, v17.8b, #3
.endm
.macro load_chroma_yuv420p
- ld1 {v16.8B}, [ x6], #8
- ld1 {v17.8B}, [x13], #8
- ushll v18.8H, v16.8B, #3
- ushll v19.8H, v17.8B, #3
+ ld1 {v16.8b}, [ x6], #8
+ ld1 {v17.8b}, [x13], #8
+ ushll v18.8h, v16.8b, #3
+ ushll v19.8h, v17.8b, #3
.endm
.macro load_chroma_yuv422p
- load_chroma_yuv420p
+ load_chroma_yuv420p
.endm
.macro increment_nv12
- ands w15, w1, #1
- csel w16, w7, w11, ne // incC = (h & 1) ? paddincC : -width
- add x6, x6, w16, SXTW // srcC += incC
+ ands w15, w1, #1
+ csel w16, w7, w11, ne // incC = (h & 1) ? paddincC : -width
+ add x6, x6, w16, sxtw // srcC += incC
.endm
.macro increment_nv21
- increment_nv12
+ increment_nv12
.endm
.macro increment_yuv420p
- ands w15, w1, #1
- csel w16, w7, w11, ne // incU = (h & 1) ? paddincU : -width/2
- csel w17, w14, w11, ne // incV = (h & 1) ? paddincV : -width/2
- add x6, x6, w16, SXTW // srcU += incU
- add x13, x13, w17, SXTW // srcV += incV
+ ands w15, w1, #1
+ csel w16, w7, w11, ne // incU = (h & 1) ? paddincU : -width/2
+ csel w17, w14, w11, ne // incV = (h & 1) ? paddincV : -width/2
+ add x6, x6, w16, sxtw // srcU += incU
+ add x13, x13, w17, sxtw // srcV += incV
.endm
.macro increment_yuv422p
- add x6, x6, w7, SXTW // srcU += incU
- add x13, x13, w14, SXTW // srcV += incV
+ add x6, x6, w7, sxtw // srcU += incU
+ add x13, x13, w14, sxtw // srcV += incV
.endm
.macro compute_rgba r1 g1 b1 a1 r2 g2 b2 a2
- add v20.8H, v26.8H, v20.8H // Y1 + R1
- add v21.8H, v27.8H, v21.8H // Y2 + R2
- add v22.8H, v26.8H, v22.8H // Y1 + G1
- add v23.8H, v27.8H, v23.8H // Y2 + G2
- add v24.8H, v26.8H, v24.8H // Y1 + B1
- add v25.8H, v27.8H, v25.8H // Y2 + B2
- sqrshrun \r1, v20.8H, #1 // clip_u8((Y1 + R1) >> 1)
- sqrshrun \r2, v21.8H, #1 // clip_u8((Y2 + R1) >> 1)
- sqrshrun \g1, v22.8H, #1 // clip_u8((Y1 + G1) >> 1)
- sqrshrun \g2, v23.8H, #1 // clip_u8((Y2 + G1) >> 1)
- sqrshrun \b1, v24.8H, #1 // clip_u8((Y1 + B1) >> 1)
- sqrshrun \b2, v25.8H, #1 // clip_u8((Y2 + B1) >> 1)
- movi \a1, #255
- movi \a2, #255
+ add v20.8h, v26.8h, v20.8h // Y1 + R1
+ add v21.8h, v27.8h, v21.8h // Y2 + R2
+ add v22.8h, v26.8h, v22.8h // Y1 + G1
+ add v23.8h, v27.8h, v23.8h // Y2 + G2
+ add v24.8h, v26.8h, v24.8h // Y1 + B1
+ add v25.8h, v27.8h, v25.8h // Y2 + B2
+ sqrshrun \r1, v20.8h, #1 // clip_u8((Y1 + R1) >> 1)
+ sqrshrun \r2, v21.8h, #1 // clip_u8((Y2 + R1) >> 1)
+ sqrshrun \g1, v22.8h, #1 // clip_u8((Y1 + G1) >> 1)
+ sqrshrun \g2, v23.8h, #1 // clip_u8((Y2 + G1) >> 1)
+ sqrshrun \b1, v24.8h, #1 // clip_u8((Y1 + B1) >> 1)
+ sqrshrun \b2, v25.8h, #1 // clip_u8((Y2 + B1) >> 1)
+ movi \a1, #255
+ movi \a2, #255
.endm
.macro declare_func ifmt ofmt
function ff_\ifmt\()_to_\ofmt\()_neon, export=1
- load_args_\ifmt
+ load_args_\ifmt
1:
- mov w8, w0 // w8 = width
+ mov w8, w0 // w8 = width
2:
- movi v5.8H, #4, lsl #8 // 128 * (1<<3)
- load_chroma_\ifmt
- sub v18.8H, v18.8H, v5.8H // U*(1<<3) - 128*(1<<3)
- sub v19.8H, v19.8H, v5.8H // V*(1<<3) - 128*(1<<3)
- sqdmulh v20.8H, v19.8H, v1.H[0] // V * v2r (R)
- sqdmulh v22.8H, v18.8H, v1.H[1] // U * u2g
- sqdmulh v19.8H, v19.8H, v1.H[2] // V * v2g
- add v22.8H, v22.8H, v19.8H // U * u2g + V * v2g (G)
- sqdmulh v24.8H, v18.8H, v1.H[3] // U * u2b (B)
- zip2 v21.8H, v20.8H, v20.8H // R2
- zip1 v20.8H, v20.8H, v20.8H // R1
- zip2 v23.8H, v22.8H, v22.8H // G2
- zip1 v22.8H, v22.8H, v22.8H // G1
- zip2 v25.8H, v24.8H, v24.8H // B2
- zip1 v24.8H, v24.8H, v24.8H // B1
- ld1 {v2.16B}, [x4], #16 // load luma
- ushll v26.8H, v2.8B, #3 // Y1*(1<<3)
- ushll2 v27.8H, v2.16B, #3 // Y2*(1<<3)
- sub v26.8H, v26.8H, v3.8H // Y1*(1<<3) - y_offset
- sub v27.8H, v27.8H, v3.8H // Y2*(1<<3) - y_offset
- sqdmulh v26.8H, v26.8H, v0.8H // ((Y1*(1<<3) - y_offset) * y_coeff) >> 15
- sqdmulh v27.8H, v27.8H, v0.8H // ((Y2*(1<<3) - y_offset) * y_coeff) >> 15
+ movi v5.8h, #4, lsl #8 // 128 * (1<<3)
+ load_chroma_\ifmt
+ sub v18.8h, v18.8h, v5.8h // U*(1<<3) - 128*(1<<3)
+ sub v19.8h, v19.8h, v5.8h // V*(1<<3) - 128*(1<<3)
+ sqdmulh v20.8h, v19.8h, v1.h[0] // V * v2r (R)
+ sqdmulh v22.8h, v18.8h, v1.h[1] // U * u2g
+ sqdmulh v19.8h, v19.8h, v1.h[2] // V * v2g
+ add v22.8h, v22.8h, v19.8h // U * u2g + V * v2g (G)
+ sqdmulh v24.8h, v18.8h, v1.h[3] // U * u2b (B)
+ zip2 v21.8h, v20.8h, v20.8h // R2
+ zip1 v20.8h, v20.8h, v20.8h // R1
+ zip2 v23.8h, v22.8h, v22.8h // G2
+ zip1 v22.8h, v22.8h, v22.8h // G1
+ zip2 v25.8h, v24.8h, v24.8h // B2
+ zip1 v24.8h, v24.8h, v24.8h // B1
+ ld1 {v2.16b}, [x4], #16 // load luma
+ ushll v26.8h, v2.8b, #3 // Y1*(1<<3)
+ ushll2 v27.8h, v2.16b, #3 // Y2*(1<<3)
+ sub v26.8h, v26.8h, v3.8h // Y1*(1<<3) - y_offset
+ sub v27.8h, v27.8h, v3.8h // Y2*(1<<3) - y_offset
+ sqdmulh v26.8h, v26.8h, v0.8h // ((Y1*(1<<3) - y_offset) * y_coeff) >> 15
+ sqdmulh v27.8h, v27.8h, v0.8h // ((Y2*(1<<3) - y_offset) * y_coeff) >> 15
.ifc \ofmt,argb // 1 2 3 0
- compute_rgba v5.8B,v6.8B,v7.8B,v4.8B, v17.8B,v18.8B,v19.8B,v16.8B
+ compute_rgba v5.8b,v6.8b,v7.8b,v4.8b, v17.8b,v18.8b,v19.8b,v16.8b
.endif
.ifc \ofmt,rgba // 0 1 2 3
- compute_rgba v4.8B,v5.8B,v6.8B,v7.8B, v16.8B,v17.8B,v18.8B,v19.8B
+ compute_rgba v4.8b,v5.8b,v6.8b,v7.8b, v16.8b,v17.8b,v18.8b,v19.8b
.endif
.ifc \ofmt,abgr // 3 2 1 0
- compute_rgba v7.8B,v6.8B,v5.8B,v4.8B, v19.8B,v18.8B,v17.8B,v16.8B
+ compute_rgba v7.8b,v6.8b,v5.8b,v4.8b, v19.8b,v18.8b,v17.8b,v16.8b
.endif
.ifc \ofmt,bgra // 2 1 0 3
- compute_rgba v6.8B,v5.8B,v4.8B,v7.8B, v18.8B,v17.8B,v16.8B,v19.8B
+ compute_rgba v6.8b,v5.8b,v4.8b,v7.8b, v18.8b,v17.8b,v16.8b,v19.8b
.endif
- st4 { v4.8B, v5.8B, v6.8B, v7.8B}, [x2], #32
- st4 {v16.8B,v17.8B,v18.8B,v19.8B}, [x2], #32
- subs w8, w8, #16 // width -= 16
- b.gt 2b
- add x2, x2, w3, SXTW // dst += padding
- add x4, x4, w5, SXTW // srcY += paddingY
- increment_\ifmt
- subs w1, w1, #1 // height -= 1
- b.gt 1b
- ret
+ st4 { v4.8b, v5.8b, v6.8b, v7.8b}, [x2], #32
+ st4 {v16.8b,v17.8b,v18.8b,v19.8b}, [x2], #32
+ subs w8, w8, #16 // width -= 16
+ b.gt 2b
+ add x2, x2, w3, sxtw // dst += padding
+ add x4, x4, w5, sxtw // srcY += paddingY
+ increment_\ifmt
+ subs w1, w1, #1 // height -= 1
+ b.gt 1b
+ ret
endfunc
.endm
.macro declare_rgb_funcs ifmt
- declare_func \ifmt, argb
- declare_func \ifmt, rgba
- declare_func \ifmt, abgr
- declare_func \ifmt, bgra
+ declare_func \ifmt, argb
+ declare_func \ifmt, rgba
+ declare_func \ifmt, abgr
+ declare_func \ifmt, bgra
.endm
declare_rgb_funcs nv12
diff --git a/libswscale/gamma.c b/libswscale/gamma.c
index d7470cb1c9..bb1177bce3 100644
--- a/libswscale/gamma.c
+++ b/libswscale/gamma.c
@@ -69,4 +69,3 @@ int ff_init_gamma_convert(SwsFilterDescriptor *desc, SwsSlice * src, uint16_t *t
return 0;
}
-
diff --git a/libswscale/vscale.c b/libswscale/vscale.c
index 9ed227e908..96bb96ecb1 100644
--- a/libswscale/vscale.c
+++ b/libswscale/vscale.c
@@ -318,5 +318,3 @@ void ff_init_vscale_pfn(SwsContext *c,
lumCtx->pfn.yuv2anyX = yuv2anyX;
}
}
-
-
diff --git a/libswscale/x86/yuv2rgb_template.c b/libswscale/x86/yuv2rgb_template.c
index d506f75e15..344e61abe8 100644
--- a/libswscale/x86/yuv2rgb_template.c
+++ b/libswscale/x86/yuv2rgb_template.c
@@ -192,4 +192,3 @@ static inline int RENAME(yuv420_bgr24)(SwsContext *c, const uint8_t *src[],
}
return srcSliceH;
}
-
diff --git a/tests/extended.ffconcat b/tests/extended.ffconcat
index 7359113c23..5ef64320b5 100644
--- a/tests/extended.ffconcat
+++ b/tests/extended.ffconcat
@@ -111,4 +111,3 @@ outpoint 00:00.40
file %SRCFILE%
inpoint 00:00.40
-
diff --git a/tests/fate/ffprobe.mak b/tests/fate/ffprobe.mak
index c867bebf41..8656752109 100644
--- a/tests/fate/ffprobe.mak
+++ b/tests/fate/ffprobe.mak
@@ -32,4 +32,3 @@ fate-ffprobe_xml: CMD = run $(FFPROBE_COMMAND) -of xml
FATE_FFPROBE += $(FATE_FFPROBE-yes)
fate-ffprobe: $(FATE_FFPROBE)
-
diff --git a/tests/fate/flvenc.mak b/tests/fate/flvenc.mak
index 4fdeeff4c1..373e347892 100644
--- a/tests/fate/flvenc.mak
+++ b/tests/fate/flvenc.mak
@@ -7,5 +7,3 @@ tests/data/add_keyframe_index.flv: ffmpeg$(PROGSSUF)$(EXESUF) | tests/data
FATE_AFILTER-$(call ALLYES, FLV_MUXER FLV_DEMUXER AVDEVICE TESTSRC_FILTER LAVFI_INDEV FLV_ENCODER) += fate-flv-add_keyframe_index
fate-flv-add_keyframe_index: tests/data/add_keyframe_index.flv
fate-flv-add_keyframe_index: CMD = ffmetadata -flags +bitexact -i $(TARGET_PATH)/tests/data/add_keyframe_index.flv
-
-
diff --git a/tests/fate/lossless-audio.mak b/tests/fate/lossless-audio.mak
index 66ac6d8972..9e5f09a0f4 100644
--- a/tests/fate/lossless-audio.mak
+++ b/tests/fate/lossless-audio.mak
@@ -30,4 +30,3 @@ FATE_SAMPLES_LOSSLESS_AUDIO += $(FATE_SAMPLES_LOSSLESS_AUDIO-yes)
FATE_SAMPLES_FFMPEG += $(FATE_SAMPLES_LOSSLESS_AUDIO)
fate-lossless-audio: $(FATE_SAMPLES_LOSSLESS_AUDIO)
-
diff --git a/tests/simple1.ffconcat b/tests/simple1.ffconcat
index 0a754af421..ae62126bb2 100644
--- a/tests/simple1.ffconcat
+++ b/tests/simple1.ffconcat
@@ -9,4 +9,3 @@ file %SRCFILE%
inpoint 00:00.20
outpoint 00:00.40
file_packet_metadata dummy=1
-
diff --git a/tests/simple2.ffconcat b/tests/simple2.ffconcat
index 2a0a1b5c9e..910da188e0 100644
--- a/tests/simple2.ffconcat
+++ b/tests/simple2.ffconcat
@@ -16,4 +16,3 @@ inpoint 00:02.20
file %SRCFILE%
inpoint 00:01.80
outpoint 00:02.00
-
--
2.49.1
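A note for anyone skimming the large aarch64 hunks above: aside from dropping trailing blank lines, the assembly changes only lowercase the NEON arrangement and lane suffixes (v0.4S becomes v0.4s, v0.S[0] becomes v0.s[0]), lowercase the UXTW/SXTW extend keywords, and re-align operand columns; no instruction is functionally changed. As a rough, hypothetical sketch of that class of rewrite (it is not the logic of tools/indent_arm_assembly.pl, which is not shown here, and it assumes GNU sed), the case normalization alone could look like:

    #!/bin/sh
    # Hypothetical sketch only (assumes GNU sed); NOT the actual
    # tools/indent_arm_assembly.pl logic, just an illustration of the kind
    # of case normalization visible in the hunks above.
    for f in "$@"; do
        sed -i \
            -e 's/\.\([0-9]*[BHSD]\)\b/.\L\1/g' \
            -e 's/, \(UXTW\|SXTW\)\b/, \L\1/g' \
            "$f"
    done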
>From c24cea065cbfb5a6bd61901c16305d0ef3bfa275 Mon Sep 17 00:00:00 2001
From: Timo Rothenpieler <timo@rothenpieler.org>
Date: Sun, 30 Nov 2025 21:39:04 +0100
Subject: [PATCH 4/4] tools/check_arm_indent: skip empty glob
---
tools/check_arm_indent.sh | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/check_arm_indent.sh b/tools/check_arm_indent.sh
index 5becfe0aec..ff6f4006f1 100755
--- a/tools/check_arm_indent.sh
+++ b/tools/check_arm_indent.sh
@@ -34,6 +34,9 @@ fi
ret=0
for i in */aarch64/*.S */aarch64/*/*.S; do
+ if ! [ -f "$i" ]; then
+ continue
+ fi
case $i in
libavcodec/aarch64/h264idct_neon.S|libavcodec/aarch64/h26x/epel_neon.S|libavcodec/aarch64/h26x/qpel_neon.S|libavcodec/aarch64/vc1dsp_neon.S)
# Skip files with known (and tolerated) deviations from the tool.
--
2.49.1
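Why the guard in patch 4/4 is needed: POSIX sh has no nullglob behaviour by default, so a glob pattern that matches nothing expands to the literal pattern itself, and the loop would then try to process a nonexistent path such as "*/aarch64/*/*.S". The added [ -f "$i" ] test simply skips those literal leftovers (and anything that is not a regular file). A minimal, self-contained demonstration (the /tmp/globdemo directory is purely illustrative and not part of the patch):

    #!/bin/sh
    # Run in an empty directory: neither pattern matches any file, so
    # without the [ -f "$i" ] guard the loop body would receive the
    # literal, unexpanded patterns as $i.
    mkdir -p /tmp/globdemo && cd /tmp/globdemo || exit 1
    for i in */aarch64/*.S */aarch64/*/*.S; do
        if ! [ -f "$i" ]; then
            echo "skipping unmatched pattern: $i"
            continue
        fi
        echo "would check: $i"
    done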
_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org