Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
From: Niklas Haas <ffmpeg@haasn.xyz>
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH] swscale/aarch64: dotprod implementation of rgba32_to_Y
Date: Fri, 28 Feb 2025 12:32:12 +0100
Message-ID: <20250228123212.GB629346@haasn.xyz>
In-Reply-To: <GV1SPRMB0036660B3DEF03336AFA45B08FCC2@GV1SPRMB0036.EURP250.PROD.OUTLOOK.COM>

On Fri, 28 Feb 2025 11:49:53 +0100 Andreas Rheinhardt <andreas.rheinhardt@outlook.com> wrote:
> Niklas Haas:
> > On Fri, 28 Feb 2025 10:31:19 +0800 Zhao Zhili <quinkblack@foxmail.com> wrote:
> >> Cc haasn.
> >>
> >> Libswscale is being refactored. Will the current asm still work after the refactor, or will it need
> >> to be refactored or rewritten afterwards? If it’s the latter, maybe we should hold off on adding more
> >> asm to libswscale until haasn's work is done.
> >
> > No, almost all current asm will be unused after the rewrite. There are some we
> > can in theory reuse, but for the most part, it doesn't seem to be worth it.
> >
> > Especially for the very bespoke functions like this one.
> >
> > For context, in general, the focus in nu-swscale is on smaller, flexible
> > primitives, with the calling code combining them as needed. So instead
> > of a "bgra_to_y" function, you would have a sequence that looks like this:
> >
> > Operation list:
> >   [ u8 XXXX -> dddX] SWS_OP_READ         : 4 elem(s) packed >> 0
> >   [ u8 ...X -> dddX] SWS_OP_SWIZZLE      : 2103
> >   [ u8 ...X -> dddX] SWS_OP_CONVERT      : u8 -> f32
> >   [f32 ...X -> .XXX] SWS_OP_LINEAR       : dot3 [[0.299000 0.587000 0.114000 0 0] [0 1 0 0 0] [0 0 1 0 0] [0 0 0 1 0]]
> >   [f32 .XXX -> .XXX] SWS_OP_DITHER       : 16x16 {255 _ _ _}
> >   [f32 .XXX -> dXXX] SWS_OP_CONVERT      : f32 -> u8
> >   [ u8 .XXX -> XXXX] SWS_OP_WRITE        : 1 elem(s) packed >> 0
> >
> > Where each low-level implementation can combine one, or multiple, such
> > operations together. For example, in the current prototype, SWS_OP_CONVERT and
> > SWS_OP_WRITE can be fused together into a single implementation.
> >
> > Note also the conversion to float. I found that the cost of going through
> > floats seems to be lower on average, across all tested platforms, than the
> > extra cost of dealing with integers (which require extra shifting, extra
> > dithering, and extra width conversions - all of which exceed the cost of just
> > one extra float->int conversion step). This also comes with improved accuracy.
> >
> But what about bitexactness?

Are you worried about bitexactness relative to an integer implementation, or
bitexactness between platforms?

For the former, all coefficients are calculated as AVRational, using only
exact values (e.g. matrix coefficients as taken from the spec), and collapsed
down to a single linear operation in the end. This guarantees no loss of
precision, as long as the floating point type we pick in the end is wide
enough to hold all of the needed bits. (This can even be determined
automatically.)
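
To illustrate the "collapse" step, here is a minimal sketch (not the actual
nu-swscale code) that uses Python's fractions.Fraction as a stand-in for
AVRational. The compose() helper and the extra full-range -> limited-range
step are hypothetical, added only to have two operations to merge:

```python
from fractions import Fraction as F

def compose(outer, inner):
    """Return the affine map x -> outer(inner(x)); each map is (matrix, offset)."""
    (A, a), (B, b) = outer, inner
    n, k, m = len(A), len(B), len(B[0])
    M = [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
         for i in range(n)]
    off = [sum(A[i][t] * b[t] for t in range(k)) + a[i] for i in range(n)]
    return M, off

# Luma weights as exact rationals (0.299 / 0.587 / 0.114 from the op list)
luma = ([[F(299, 1000), F(587, 1000), F(114, 1000)]], [F(0)])
# Hypothetical second step: scale by 219/255 and add 16
to_limited = ([[F(219, 255)]], [F(16)])

# One exact linear operation; rounding to float happens only once, afterwards
matrix, offset = compose(to_limited, luma)
y_white = sum(c * 255 for c in matrix[0]) + offset[0]
```

Because every intermediate value is an exact rational, the merged
coefficients carry no accumulated rounding error from the individual steps.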

For example, if the input is 16 bit or below, a 32 bit float is enough to
guarantee bitexactness. For 32 bit or higher inputs, we would need to bump
up to 64 bit intermediates, although note that swscale currently does not
accept 32 bit integer coefficients in any case.
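
A rough sanity check of the representability side of this argument (a sketch
only; it checks whether input values survive the conversion to float, not the
full pipeline). The to_f32() helper is hypothetical and rounds through
binary32 using the struct module:

```python
import struct

def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Every 16 bit integer fits in the 24 bit significand of binary32
all_u16_exact = all(to_f32(float(v)) == v for v in range(1 << 16))

# The largest 32 bit integer does not survive binary32, but does fit in binary64
u32_max = (1 << 32) - 1
u32_exact_in_f32 = to_f32(float(u32_max)) == u32_max
u32_exact_in_f64 = float(u32_max) == u32_max
```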

Also, in the special cases where all matrix coefficients are integers (which
we can easily check for AVRational), we can even skip the float conversion
(and dithering) steps entirely and collapse it down to e.g. a pure bit shift.
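
A minimal sketch of what such a check could look like, again with
fractions.Fraction standing in for AVRational. Both helper names are
hypothetical and not part of swscale:

```python
from fractions import Fraction

def all_integer(matrix):
    """True if every rational coefficient has denominator 1."""
    return all(c.denominator == 1 for row in matrix for c in row)

def as_left_shift(c):
    """Return n if multiplying by c is the same as `<< n`, else None."""
    if c.denominator != 1 or c.numerator <= 0:
        return None
    n = c.numerator
    # A positive integer is a power of two iff it has exactly one bit set
    return n.bit_length() - 1 if n & (n - 1) == 0 else None

shift_for_256 = as_left_shift(Fraction(256))       # multiply by 256
shift_for_luma = as_left_shift(Fraction(299, 1000))  # not an integer
```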

For the latter, as long as all platforms implement IEEE 754 semantics, I don't
think there is any room for deviation.

>
> - Andreas
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".


Thread overview: 6+ messages
2025-02-27 22:44 Krzysztof Pyrkosz via ffmpeg-devel
2025-02-28  2:31 ` Zhao Zhili
2025-02-28 10:21   ` Niklas Haas
2025-02-28 10:43     ` Martin Storsjö
2025-02-28 10:49     ` Andreas Rheinhardt
2025-02-28 11:32       ` Niklas Haas [this message]
