Re: [FFmpeg-devel] [PATCH 4/4] sw_scale: Add specializations for hscale 16 to 19

From: Hubert Mazur <hum@semihalf.com>
To: "Martin Storsjö" <martin@martin.st>
Cc: gjb@semihalf.com, upstream@semihalf.com, jswinney@amazon.com,
	ffmpeg-devel@ffmpeg.org, mw@semihalf.com, spop@amazon.com
Subject: Re: [FFmpeg-devel] [PATCH 4/4] sw_scale: Add specializations for hscale 16 to 19
Date: Tue, 25 Oct 2022 09:14:50 +0200
Message-ID: <CAGEip35hgrFEhD=YtAk9AJMDrxEH4HcbxMgTAhn-stpSrPJg0w@mail.gmail.com> (raw)
In-Reply-To: <757bb051-c5c3-91cb-cb7a-f5854bc6d26e@martin.st>

Thanks for the review.
I will fix the failing checkasm first and then take care of the minor
issues. I will try to to resend fixed versions this week.

Regards,
Hubert

On Mon, Oct 24, 2022 at 3:19 PM Martin Storsjö <martin@martin.st> wrote:

> On Mon, 17 Oct 2022, Hubert Mazur wrote:
>
> > Provide arm64 neon optimized implementations for hscale16To19 with
> > filter sizes 4, 8 and X4.
> >
> > The tests and benchmarks run on AWS Graviton 2 instances.
> > The results from a checkasm tool are shown below.
> >
> > hscale_16_to_19__fs_4_dstW_512_c: 6216.0
> > hscale_16_to_19__fs_4_dstW_512_neon: 2257.0
> > hscale_16_to_19__fs_8_dstW_512_c: 10417.7
> > hscale_16_to_19__fs_8_dstW_512_neon: 3112.5
> > hscale_16_to_19__fs_12_dstW_512_c: 14890.5
> > hscale_16_to_19__fs_12_dstW_512_neon: 3899.0
> > hscale_16_to_19__fs_16_dstW_512_c: 19006.5
> > hscale_16_to_19__fs_16_dstW_512_neon: 5341.2
> > hscale_16_to_19__fs_32_dstW_512_c: 36629.5
> > hscale_16_to_19__fs_32_dstW_512_neon: 9502.7
> > hscale_16_to_19__fs_40_dstW_512_c: 45477.5
> > hscale_16_to_19__fs_40_dstW_512_neon: 11552.0
> >
> > Signed-off-by: Hubert Mazur <hum@semihalf.com>
> > ---
> > libswscale/aarch64/hscale.S  | 402 +++++++++++++++++++++++++++++++++++
> > libswscale/aarch64/swscale.c |  70 +++++-
> > 2 files changed, 471 insertions(+), 1 deletion(-)
>
> > +void ff_hscale16to19_4_neon_asm(int shift, int16_t *_dst, int dstW,
> > +                      const uint8_t *_src, const int16_t *filter,
> > +                      const int32_t *filterPos, int filterSize);
> > +void ff_hscale16to19_X8_neon_asm(int shift, int16_t *_dst, int dstW,
> > +                      const uint8_t *_src, const int16_t *filter,
> > +                      const int32_t *filterPos, int filterSize);
> > +void ff_hscale16to19_X4_neon_asm(int shift, int16_t *_dst, int dstW,
> > +                      const uint8_t *_src, const int16_t *filter,
> > +                      const int32_t *filterPos, int filterSize);
> > +
> > #define SCALE_FUNC(filter_n, from_bpc, to_bpc, opt) \
> > void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt(
> \
> >                                                 SwsContext *c, int16_t
> *data, \
> > @@ -43,7 +53,8 @@ void ff_hscale ## from_bpc ## to ## to_bpc ## _ ##
> filter_n ## _ ## opt( \
> > #define SCALE_FUNCS(filter_n, opt) \
> >     SCALE_FUNC(filter_n,  8, 15, opt); \
> >     SCALE_FUNC(filter_n, 8, 19, opt); \
> > -    SCALE_FUNC(filter_n, 16, 15, opt);
> > +    SCALE_FUNC(filter_n, 16, 15, opt); \
> > +    SCALE_FUNC(filter_n, 16, 19, opt);
>
> So this declares the functions we're implementing as C wrappers below, and
> the manual declarations further up declare the actual asm functions?
>
> I guess that works, although it makes unnecessary extern functions. In
> such cases, we usually have the C functions be static functions, placed
> above the code that uses them. But it's not a big deal.
>
> Other than that, this patchset mostly seems fine.
>
> However, I tested the patches on x86, and the new checkasm tests do fail
> on x86 (both i386 and x86_64) - so that needs to be fixed anyway. So since
> we'll need to do a new round anyway, please do try to fix up the minor
> cosmetics I mentioned.
>
> // Martin
>
>
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".