From: Hubert Mazur <hum@semihalf.com>
To: "Martin Storsjö" <martin@martin.st>
Cc: gjb@semihalf.com, upstream@semihalf.com, jswinney@amazon.com,
ffmpeg-devel@ffmpeg.org, mw@semihalf.com, spop@amazon.com
Subject: Re: [FFmpeg-devel] [PATCH 4/4] sw_scale: Add specializations for hscale 16 to 19
Date: Tue, 25 Oct 2022 09:14:50 +0200
Message-ID: <CAGEip35hgrFEhD=YtAk9AJMDrxEH4HcbxMgTAhn-stpSrPJg0w@mail.gmail.com> (raw)
In-Reply-To: <757bb051-c5c3-91cb-cb7a-f5854bc6d26e@martin.st>
Thanks for the review.
I will fix the failing checkasm first and then take care of the minor
issues. I will try to to resend fixed versions this week.
Regards,
Hubert
On Mon, Oct 24, 2022 at 3:19 PM Martin Storsjö <martin@martin.st> wrote:
> On Mon, 17 Oct 2022, Hubert Mazur wrote:
>
> > Provide arm64 neon optimized implementations for hscale16To19 with
> > filter sizes 4, 8 and X4.
> >
> > The tests and benchmarks run on AWS Graviton 2 instances.
> > The results from a checkasm tool are shown below.
> >
> > hscale_16_to_19__fs_4_dstW_512_c: 6216.0
> > hscale_16_to_19__fs_4_dstW_512_neon: 2257.0
> > hscale_16_to_19__fs_8_dstW_512_c: 10417.7
> > hscale_16_to_19__fs_8_dstW_512_neon: 3112.5
> > hscale_16_to_19__fs_12_dstW_512_c: 14890.5
> > hscale_16_to_19__fs_12_dstW_512_neon: 3899.0
> > hscale_16_to_19__fs_16_dstW_512_c: 19006.5
> > hscale_16_to_19__fs_16_dstW_512_neon: 5341.2
> > hscale_16_to_19__fs_32_dstW_512_c: 36629.5
> > hscale_16_to_19__fs_32_dstW_512_neon: 9502.7
> > hscale_16_to_19__fs_40_dstW_512_c: 45477.5
> > hscale_16_to_19__fs_40_dstW_512_neon: 11552.0
> >
> > Signed-off-by: Hubert Mazur <hum@semihalf.com>
> > ---
> > libswscale/aarch64/hscale.S | 402 +++++++++++++++++++++++++++++++++++
> > libswscale/aarch64/swscale.c | 70 +++++-
> > 2 files changed, 471 insertions(+), 1 deletion(-)
>
> > +void ff_hscale16to19_4_neon_asm(int shift, int16_t *_dst, int dstW,
> > + const uint8_t *_src, const int16_t *filter,
> > + const int32_t *filterPos, int filterSize);
> > +void ff_hscale16to19_X8_neon_asm(int shift, int16_t *_dst, int dstW,
> > + const uint8_t *_src, const int16_t *filter,
> > + const int32_t *filterPos, int filterSize);
> > +void ff_hscale16to19_X4_neon_asm(int shift, int16_t *_dst, int dstW,
> > + const uint8_t *_src, const int16_t *filter,
> > + const int32_t *filterPos, int filterSize);
> > +
> > #define SCALE_FUNC(filter_n, from_bpc, to_bpc, opt) \
> > void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt(
> \
> > SwsContext *c, int16_t
> *data, \
> > @@ -43,7 +53,8 @@ void ff_hscale ## from_bpc ## to ## to_bpc ## _ ##
> filter_n ## _ ## opt( \
> > #define SCALE_FUNCS(filter_n, opt) \
> > SCALE_FUNC(filter_n, 8, 15, opt); \
> > SCALE_FUNC(filter_n, 8, 19, opt); \
> > - SCALE_FUNC(filter_n, 16, 15, opt);
> > + SCALE_FUNC(filter_n, 16, 15, opt); \
> > + SCALE_FUNC(filter_n, 16, 19, opt);
>
> So this declares the functions we're implementing as C wrappers below, and
> the manual declarations further up declare the actual asm functions?
>
> I guess that works, although it makes unnecessary extern functions. In
> such cases, we usually have the C functions be static functions, placed
> above the code that uses them. But it's not a big deal.
>
> Other than that, this patchset mostly seems fine.
>
> However, I tested the patches on x86, and the new checkasm tests do fail
> on x86 (both i386 and x86_64) - so that needs to be fixed anyway. So since
> we'll need to do a new round anyway, please do try to fix up the minor
> cosmetics I mentioned.
>
> // Martin
>
>
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
prev parent reply other threads:[~2022-10-25 7:15 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-17 13:07 [FFmpeg-devel] [PATCH 0/4] Provide neon implementations for hscale functions Hubert Mazur
2022-10-17 13:07 ` [FFmpeg-devel] [PATCH 1/4] sw_scale: Add specializations for hscale 8 to 19 Hubert Mazur
2022-10-24 12:31 ` Martin Storsjö
2022-10-17 13:07 ` [FFmpeg-devel] [PATCH 2/4] tests/sw_scale: Add test cases for input sizes 16 Hubert Mazur
2022-10-17 13:07 ` [FFmpeg-devel] [PATCH 3/4] sw_scale: Add specializations for hscale 16 to 15 Hubert Mazur
2022-10-17 13:07 ` [FFmpeg-devel] [PATCH 4/4] sw_scale: Add specializations for hscale 16 to 19 Hubert Mazur
2022-10-24 13:19 ` Martin Storsjö
2022-10-25 7:14 ` Hubert Mazur [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAGEip35hgrFEhD=YtAk9AJMDrxEH4HcbxMgTAhn-stpSrPJg0w@mail.gmail.com' \
--to=hum@semihalf.com \
--cc=ffmpeg-devel@ffmpeg.org \
--cc=gjb@semihalf.com \
--cc=jswinney@amazon.com \
--cc=martin@martin.st \
--cc=mw@semihalf.com \
--cc=spop@amazon.com \
--cc=upstream@semihalf.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
This inbox may be cloned and mirrored by anyone:
git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git
# If you have public-inbox 1.1+ installed, you may
# initialize and index your mirror using the following commands:
public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
ffmpegdev@gitmailbox.com
public-inbox-index ffmpegdev
Example config snippet for mirrors.
AGPL code for this site: git clone https://public-inbox.org/public-inbox.git