* [FFmpeg-devel] [PATCH] lavc/pixblockdsp: specialise aligned 16-bit get_pixels
@ 2024-07-25 15:53 Rémi Denis-Courmont
2024-07-25 16:16 ` James Almer
0 siblings, 1 reply; 5+ messages in thread
From: Rémi Denis-Courmont @ 2024-07-25 15:53 UTC (permalink / raw)
To: ffmpeg-devel
The current code assumes that rows are unaligned, which hurts on
platforms with slower unaligned accesses. (Also, this leaves the
unrolling to the compiler, which it seems to do in practice.)
---
libavcodec/pixblockdsp.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/libavcodec/pixblockdsp.c b/libavcodec/pixblockdsp.c
index bbbeca1618..1fff244511 100644
--- a/libavcodec/pixblockdsp.c
+++ b/libavcodec/pixblockdsp.c
@@ -26,6 +26,13 @@
 static void get_pixels_16_c(int16_t *restrict block, const uint8_t *pixels,
                             ptrdiff_t stride)
+{
+    for (int i = 0; i < 8; i++)
+        AV_COPY128(block + i * 8, pixels + i * stride);
+}
+
+static void get_pixels_unaligned_16_c(int16_t *restrict block,
+                                      const uint8_t *pixels, ptrdiff_t stride)
 {
     AV_COPY128U(block + 0 * 8, pixels + 0 * stride);
     AV_COPY128U(block + 1 * 8, pixels + 1 * stride);
@@ -90,7 +97,7 @@ av_cold void ff_pixblockdsp_init(PixblockDSPContext *c, AVCodecContext *avctx)
     case 10:
     case 12:
     case 14:
-        c->get_pixels_unaligned =
+        c->get_pixels_unaligned = get_pixels_unaligned_16_c;
         c->get_pixels = get_pixels_16_c;
         break;
     default:
--
2.45.2
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
* Re: [FFmpeg-devel] [PATCH] lavc/pixblockdsp: specialise aligned 16-bit get_pixels
2024-07-25 15:53 [FFmpeg-devel] [PATCH] lavc/pixblockdsp: specialise aligned 16-bit get_pixels Rémi Denis-Courmont
@ 2024-07-25 16:16 ` James Almer
2024-07-25 16:50 ` Rémi Denis-Courmont
0 siblings, 1 reply; 5+ messages in thread
From: James Almer @ 2024-07-25 16:16 UTC (permalink / raw)
To: ffmpeg-devel
On 7/25/2024 12:53 PM, Rémi Denis-Courmont wrote:
> The current code assumes that we have unaligned rows, which hurts on
> platforms with slower unaligned accesses. (Also, this lets the compiler
> unroll manually, which it seems to do in practice.)
> ---
> libavcodec/pixblockdsp.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/libavcodec/pixblockdsp.c b/libavcodec/pixblockdsp.c
> index bbbeca1618..1fff244511 100644
> --- a/libavcodec/pixblockdsp.c
> +++ b/libavcodec/pixblockdsp.c
> @@ -26,6 +26,13 @@
>
> static void get_pixels_16_c(int16_t *restrict block, const uint8_t *pixels,
> ptrdiff_t stride)
Is there a way to hint the compiler that block is 16 byte aligned? GCC
14 at least emits unaligned loads and stores for these.
> +{
> + for (int i = 0; i < 8; i++)
> + AV_COPY128(block + i * 8, pixels + i * stride);
> +}
> +
> +static void get_pixels_unaligned_16_c(int16_t *restrict block,
> + const uint8_t *pixels, ptrdiff_t stride)
> {
> AV_COPY128U(block + 0 * 8, pixels + 0 * stride);
> AV_COPY128U(block + 1 * 8, pixels + 1 * stride);
> @@ -90,7 +97,7 @@ av_cold void ff_pixblockdsp_init(PixblockDSPContext *c, AVCodecContext *avctx)
> case 10:
> case 12:
> case 14:
> - c->get_pixels_unaligned =
> + c->get_pixels_unaligned = get_pixels_unaligned_16_c;
> c->get_pixels = get_pixels_16_c;
> break;
> default:
* Re: [FFmpeg-devel] [PATCH] lavc/pixblockdsp: specialise aligned 16-bit get_pixels
2024-07-25 16:16 ` James Almer
@ 2024-07-25 16:50 ` Rémi Denis-Courmont
2024-07-25 18:25 ` James Almer
0 siblings, 1 reply; 5+ messages in thread
From: Rémi Denis-Courmont @ 2024-07-25 16:50 UTC (permalink / raw)
To: ffmpeg-devel
On Thursday, 25 July 2024, 19:16:21 EEST, James Almer wrote:
> On 7/25/2024 12:53 PM, Rémi Denis-Courmont wrote:
> > The current code assumes that we have unaligned rows, which hurts on
> > platforms with slower unaligned accesses. (Also, this lets the compiler
> > unroll manually, which it seems to do in practice.)
> > ---
> >
> > libavcodec/pixblockdsp.c | 9 ++++++++-
> > 1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/libavcodec/pixblockdsp.c b/libavcodec/pixblockdsp.c
> > index bbbeca1618..1fff244511 100644
> > --- a/libavcodec/pixblockdsp.c
> > +++ b/libavcodec/pixblockdsp.c
> > @@ -26,6 +26,13 @@
> >
> > static void get_pixels_16_c(int16_t *restrict block, const uint8_t
> > *pixels,
> >
> > ptrdiff_t stride)
>
> Is there a way to hint the compiler that block is 16 byte aligned? GCC
> 14 at least emits unaligned loads and stores for these.
We don't have uint128_t, so the best we could do is cast to uint64_t *. Though
GCC 13 emits 64-bit loads and stores on RV64 here with the given code. Is this
maybe a problem with the COPY128 macro definition on x86?
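
For illustration only (this is not part of the patch): GCC and Clang
provide the __builtin_assume_aligned builtin, which is one way to pass an
alignment hint to the compiler. The sketch below assumes that both the
block and every source row are 16-byte aligned, which the thread does not
confirm for the source rows. The function name get_pixels_16_hinted is
hypothetical, and whether the hint changes the generated code depends on
the compiler and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; a sketch, not the code from the patch. */
static void get_pixels_16_hinted(int16_t *restrict block,
                                 const uint8_t *pixels, ptrdiff_t stride)
{
    /* Assumption of this sketch: pointers and stride are multiples of 16. */
    assert(((uintptr_t)block % 16) == 0);
    assert(((uintptr_t)pixels % 16) == 0);
    assert(stride % 16 == 0);

    /* GCC/Clang builtin: promises the compiler that the pointer is aligned. */
    int16_t *dst = __builtin_assume_aligned(block, 16);

    for (int i = 0; i < 8; i++) {
        const uint8_t *src = __builtin_assume_aligned(pixels + i * stride, 16);

        /* Fixed-size memcpy; compilers typically lower this to plain
         * loads and stores and may use the alignment promise above. */
        memcpy(dst + i * 8, src, 16);
    }
}
```

Using memcpy rather than a pointer cast keeps the sketch free of strict
aliasing concerns; the asserts only document the precondition and vanish
with NDEBUG.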
--
Rémi Denis-Courmont
http://www.remlab.net/
* Re: [FFmpeg-devel] [PATCH] lavc/pixblockdsp: specialise aligned 16-bit get_pixels
2024-07-25 16:50 ` Rémi Denis-Courmont
@ 2024-07-25 18:25 ` James Almer
2024-07-25 20:28 ` Rémi Denis-Courmont
0 siblings, 1 reply; 5+ messages in thread
From: James Almer @ 2024-07-25 18:25 UTC (permalink / raw)
To: ffmpeg-devel
On 7/25/2024 1:50 PM, Rémi Denis-Courmont wrote:
> On Thursday, 25 July 2024, 19:16:21 EEST, James Almer wrote:
>> On 7/25/2024 12:53 PM, Rémi Denis-Courmont wrote:
>>> The current code assumes that we have unaligned rows, which hurts on
>>> platforms with slower unaligned accesses. (Also, this lets the compiler
>>> unroll manually, which it seems to do in practice.)
>>> ---
>>>
>>> libavcodec/pixblockdsp.c | 9 ++++++++-
>>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/libavcodec/pixblockdsp.c b/libavcodec/pixblockdsp.c
>>> index bbbeca1618..1fff244511 100644
>>> --- a/libavcodec/pixblockdsp.c
>>> +++ b/libavcodec/pixblockdsp.c
>>> @@ -26,6 +26,13 @@
>>>
>>> static void get_pixels_16_c(int16_t *restrict block, const uint8_t
>>> *pixels,
>>>
>>> ptrdiff_t stride)
>>
>> Is there a way to hint the compiler that block is 16 byte aligned? GCC
>> 14 at least emits unaligned loads and stores for these.
>
> We don't have uint128_t, so the best we could do is cast to uint64_t *. Though
> GCC 13 emits 64-bit loads and stores on RV64 here with the given code. Is this
> maybe a problem with the COPY128 macro definition on x86?
AV_COPY128 with GCC x86 uses aligned load intrinsics, but at least GCC
14 emits movdqu instructions here for some reason.
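
For reference, a minimal sketch of an aligned 128-bit copy written with
SSE2 intrinsics. This is NOT the actual AV_COPY128 definition; the helper
name copy128_aligned_sketch is hypothetical, and the memcpy fallback is
only there so the snippet builds on non-x86 targets. Compiling something
like this in isolation and inspecting the assembly is one way to check
which move instructions a given compiler picks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper; not the libavutil macro. */
static inline void copy128_aligned_sketch(void *dst, const void *src)
{
    /* Both pointers must be 16-byte aligned for the aligned intrinsics. */
    assert(((uintptr_t)dst % 16) == 0);
    assert(((uintptr_t)src % 16) == 0);
#if defined(__SSE2__)
    /* Aligned load followed by aligned store of one 128-bit vector. */
    _mm_store_si128((__m128i *)dst, _mm_load_si128((const __m128i *)src));
#else
    /* Portable fallback for targets without SSE2. */
    memcpy(dst, src, 16);
#endif
}
```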
* Re: [FFmpeg-devel] [PATCH] lavc/pixblockdsp: specialise aligned 16-bit get_pixels
2024-07-25 18:25 ` James Almer
@ 2024-07-25 20:28 ` Rémi Denis-Courmont
0 siblings, 0 replies; 5+ messages in thread
From: Rémi Denis-Courmont @ 2024-07-25 20:28 UTC (permalink / raw)
To: ffmpeg-devel
On Thursday, 25 July 2024, 21:25:11 EEST, James Almer wrote:
> On 7/25/2024 1:50 PM, Rémi Denis-Courmont wrote:
> > On Thursday, 25 July 2024, 19:16:21 EEST, James Almer wrote:
> >> On 7/25/2024 12:53 PM, Rémi Denis-Courmont wrote:
> >>> The current code assumes that we have unaligned rows, which hurts on
> >>> platforms with slower unaligned accesses. (Also, this lets the compiler
> >>> unroll manually, which it seems to do in practice.)
> >>> ---
> >>>
> >>> libavcodec/pixblockdsp.c | 9 ++++++++-
> >>> 1 file changed, 8 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/libavcodec/pixblockdsp.c b/libavcodec/pixblockdsp.c
> >>> index bbbeca1618..1fff244511 100644
> >>> --- a/libavcodec/pixblockdsp.c
> >>> +++ b/libavcodec/pixblockdsp.c
> >>> @@ -26,6 +26,13 @@
> >>>
> >>> static void get_pixels_16_c(int16_t *restrict block, const uint8_t
> >>> *pixels,
> >>>
> >>> ptrdiff_t stride)
> >>
> >> Is there a way to hint the compiler that block is 16 byte aligned? GCC
> >> 14 at least emits unaligned loads and stores for these.
> >
> > We don't have uint128_t, so the best we could do is cast to uint64_t *.
> > Though GCC 13 emits 64-bit loads and stores on RV64 here with the given
> > code. Is this maybe a problem with the COPY128 macro definition on x86?
>
> AV_COPY128 with GCC x86 uses aligned load intrinsics, but at least GCC
> 14 emits movdqu instructions here for some reason.
Another approach would be to define a structure with size and alignment of 16
bytes, but I am none too sure that compilers will like it all that much. TBH,
I am merely aiming for 64-bit aligned loads and stores here, which is a big
improvement over 8-bit ones.
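
A rough sketch of the structure idea, for illustration only. The type
name Vec128 and the function name get_pixels_16_struct are hypothetical.
The aligned and may_alias attributes are GCC/Clang extensions, and the
sketch assumes 16-byte-aligned block and source rows. How well a given
compiler handles the resulting struct copy is exactly the open question
above, so this would need checking against the generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte, 16-byte-aligned carrier type.
 * may_alias lets it overlay int16_t/uint8_t storage without
 * violating strict aliasing (GCC/Clang extension). */
typedef struct Vec128 {
    uint64_t q[2];
} __attribute__((aligned(16), may_alias)) Vec128;

_Static_assert(sizeof(Vec128) == 16, "Vec128 must be 16 bytes");
_Static_assert(_Alignof(Vec128) == 16, "Vec128 must be 16-byte aligned");

/* Hypothetical name; a sketch, not the code from the patch. */
static void get_pixels_16_struct(int16_t *restrict block,
                                 const uint8_t *pixels, ptrdiff_t stride)
{
    /* Assumption of this sketch: pointers and stride are multiples of 16. */
    assert(((uintptr_t)block % 16) == 0);
    assert(((uintptr_t)pixels % 16) == 0);
    assert(stride % 16 == 0);

    /* One struct assignment per row copies 16 bytes (8 samples). */
    for (int i = 0; i < 8; i++)
        *(Vec128 *)(block + i * 8) = *(const Vec128 *)(pixels + i * stride);
}
```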
--
レミ・デニ-クールモン
http://www.remlab.net/