* [FFmpeg-devel] [PATCH] checkasm/blockdsp: don't randomize the buffers for fill_block_tab
@ 2024-05-07 0:27 James Almer
2024-05-07 5:44 ` Martin Storsjö
2024-05-07 15:02 ` [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests James Almer
0 siblings, 2 replies; 11+ messages in thread
From: James Almer @ 2024-05-07 0:27 UTC (permalink / raw)
To: ffmpeg-devel
It ignores and overwrites the previous values.
Fixes running the test under ubsan.
Signed-off-by: James Almer <jamrial@gmail.com>
---
tests/checkasm/blockdsp.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tests/checkasm/blockdsp.c b/tests/checkasm/blockdsp.c
index 19d69b8687..ab87fc8fa4 100644
--- a/tests/checkasm/blockdsp.c
+++ b/tests/checkasm/blockdsp.c
@@ -71,7 +71,8 @@ static void check_fill(BlockDSPContext *h){
ptrdiff_t line_size, int h);
if (check_func(h->fill_block_tab[t], "blockdsp.%s", tests[t].name)) {
uint8_t value = rnd();
- randomize_buffers(tests[t].size);
+ memset(buf0, 0, sizeof(*buf0) * n * n);
+ memset(buf1, 0, sizeof(*buf0) * n * n);
call_ref(buf0, value, n, n);
call_new(buf1, value, n, n);
if (memcmp(buf0, buf1, sizeof(*buf0) * n * n))
--
2.45.0
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [FFmpeg-devel] [PATCH] checkasm/blockdsp: don't randomize the buffers for fill_block_tab
2024-05-07 0:27 [FFmpeg-devel] [PATCH] checkasm/blockdsp: don't randomize the buffers for fill_block_tab James Almer
@ 2024-05-07 5:44 ` Martin Storsjö
2024-05-07 10:49 ` Andreas Rheinhardt
2024-05-07 15:02 ` [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests James Almer
1 sibling, 1 reply; 11+ messages in thread
From: Martin Storsjö @ 2024-05-07 5:44 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Mon, 6 May 2024, James Almer wrote:
> It ignores and overwrites the previous values.
> Fixes running the test under ubsan.
>
> Signed-off-by: James Almer <jamrial@gmail.com>
> ---
> tests/checkasm/blockdsp.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
The change is probably correct, but what issue is ubsan complaining about?
If this would just be a dead store of unused random values, that shouldn't
be an ubsan issue in general, right?
// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [FFmpeg-devel] [PATCH] checkasm/blockdsp: don't randomize the buffers for fill_block_tab
2024-05-07 5:44 ` Martin Storsjö
@ 2024-05-07 10:49 ` Andreas Rheinhardt
2024-05-07 10:52 ` Martin Storsjö
2024-05-07 11:11 ` James Almer
0 siblings, 2 replies; 11+ messages in thread
From: Andreas Rheinhardt @ 2024-05-07 10:49 UTC (permalink / raw)
To: ffmpeg-devel
Martin Storsjö:
> On Mon, 6 May 2024, James Almer wrote:
>
>> It ignores and overwrites the previous values.
>> Fixes running the test under ubsan.
>>
>> Signed-off-by: James Almer <jamrial@gmail.com>
>> ---
>> tests/checkasm/blockdsp.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> The change is probably correct, but what issue is ubsan complaining
> about? If this would just be a dead store of unused random values, that
> shouldn't be an ubsan issue in general, right?
>
UBSan complains about unaligned stores in randomize_buffers; which is
obvious given that i is incremented by 1, not by 2. I sent a patch that
fixes this without removing randomization:
https://ffmpeg.org/pipermail/ffmpeg-devel/2024-May/326945.html
(Ideally, this test should be extended to cover cases of stride != width.)
- Andreas
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [FFmpeg-devel] [PATCH] checkasm/blockdsp: don't randomize the buffers for fill_block_tab
2024-05-07 10:49 ` Andreas Rheinhardt
@ 2024-05-07 10:52 ` Martin Storsjö
2024-05-07 11:11 ` James Almer
1 sibling, 0 replies; 11+ messages in thread
From: Martin Storsjö @ 2024-05-07 10:52 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Tue, 7 May 2024, Andreas Rheinhardt wrote:
> Martin Storsjö:
>> On Mon, 6 May 2024, James Almer wrote:
>>
>>> It ignores and overwrites the previous values.
>>> Fixes running the test under ubsan.
>>>
>>> Signed-off-by: James Almer <jamrial@gmail.com>
>>> ---
>>> tests/checkasm/blockdsp.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> The change is probably correct, but what issue is ubsan complaining
>> about? If this would just be a dead store of unused random values, that
>> shouldn't be an ubsan issue in general, right?
>>
>
> UBSan complains about unaligned stores in randomize_buffers; which is
> obvious given that i is incremented by 1, not by 2. I sent a patch that
> fixes this without removing randomization:
> https://ffmpeg.org/pipermail/ffmpeg-devel/2024-May/326945.html
Thanks, that explains it. Those two patches LGTM.
// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [FFmpeg-devel] [PATCH] checkasm/blockdsp: don't randomize the buffers for fill_block_tab
2024-05-07 10:49 ` Andreas Rheinhardt
2024-05-07 10:52 ` Martin Storsjö
@ 2024-05-07 11:11 ` James Almer
1 sibling, 0 replies; 11+ messages in thread
From: James Almer @ 2024-05-07 11:11 UTC (permalink / raw)
To: ffmpeg-devel
On 5/7/2024 7:49 AM, Andreas Rheinhardt wrote:
> Martin Storsjö:
>> On Mon, 6 May 2024, James Almer wrote:
>>
>>> It ignores and overwrites the previous values.
>>> Fixes running the test under ubsan.
>>>
>>> Signed-off-by: James Almer <jamrial@gmail.com>
>>> ---
>>> tests/checkasm/blockdsp.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> The change is probably correct, but what issue is ubsan complaining
>> about? If this would just be a dead store of unused random values, that
>> shouldn't be an ubsan issue in general, right?
>>
>
> UBSan complains about unaligned stores in randomize_buffers; which is
> obvious given that i is incremented by 1, not by 2. I sent a patch that
> fixes this without removing randomization:
There's no reason to keep the randomization for this test. That's why i
replaced it with a memset instead of fixing the unaligned writes.
> https://ffmpeg.org/pipermail/ffmpeg-devel/2024-May/326945.html
>
> (Ideally, this test should be extended to cover cases of stride != width.)
>
> - Andreas
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 11+ messages in thread
* [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests
2024-05-07 0:27 [FFmpeg-devel] [PATCH] checkasm/blockdsp: don't randomize the buffers for fill_block_tab James Almer
2024-05-07 5:44 ` Martin Storsjö
@ 2024-05-07 15:02 ` James Almer
2024-05-07 15:02 ` [FFmpeg-devel] [PATCH 3/3] x86/blockdsp: add sse2 and avx2 versions of fill_block_tab James Almer
2024-05-07 15:14 ` [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests Andreas Rheinhardt
1 sibling, 2 replies; 11+ messages in thread
From: James Almer @ 2024-05-07 15:02 UTC (permalink / raw)
To: ffmpeg-devel
The requirement is either 8 or 16 bytes alignment, not 32.
This should help finding bugs in asm implementations.
Signed-off-by: James Almer <jamrial@gmail.com>
---
tests/checkasm/blockdsp.c | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)
diff --git a/tests/checkasm/blockdsp.c b/tests/checkasm/blockdsp.c
index ab87fc8fa4..f67a38d302 100644
--- a/tests/checkasm/blockdsp.c
+++ b/tests/checkasm/blockdsp.c
@@ -29,11 +29,6 @@
#include "libavutil/intreadwrite.h"
#include "libavutil/mem_internal.h"
-typedef struct {
- const char *name;
- int size;
-} test;
-
#define randomize_buffers(size) \
do { \
int i; \
@@ -58,18 +53,18 @@ do { \
} while (0)
static void check_fill(BlockDSPContext *h){
- const test tests[] = {
- {"fill_block_tab[0]", 16},
- {"fill_block_tab[1]", 8},
- };
- LOCAL_ALIGNED_32(uint8_t, buf0, [16 * 16]);
- LOCAL_ALIGNED_32(uint8_t, buf1, [16 * 16]);
+ LOCAL_ALIGNED_16(uint8_t, buf0_16, [16 * 16]);
+ LOCAL_ALIGNED_16(uint8_t, buf1_16, [16 * 16]);
+ LOCAL_ALIGNED_8(uint8_t, buf0_8, [8 * 8]);
+ LOCAL_ALIGNED_8(uint8_t, buf1_8, [8 * 8]);
- for (size_t t = 0; t < FF_ARRAY_ELEMS(tests); ++t) {
- int n = tests[t].size;
+ for (int t = 0; t < 2; ++t) {
+ uint8_t *buf0 = t ? buf0_8 : buf0_16;
+ uint8_t *buf1 = t ? buf1_8 : buf1_16;
+ int n = 16 - 8 * t;
declare_func(void, uint8_t *block, uint8_t value,
ptrdiff_t line_size, int h);
- if (check_func(h->fill_block_tab[t], "blockdsp.%s", tests[t].name)) {
+ if (check_func(h->fill_block_tab[t], "blockdsp.fill_block_tab[%d]", t)) {
uint8_t value = rnd();
memset(buf0, 0, sizeof(*buf0) * n * n);
memset(buf1, 0, sizeof(*buf0) * n * n);
--
2.45.0
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 11+ messages in thread
* [FFmpeg-devel] [PATCH 3/3] x86/blockdsp: add sse2 and avx2 versions of fill_block_tab
2024-05-07 15:02 ` [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests James Almer
@ 2024-05-07 15:02 ` James Almer
2024-05-07 15:10 ` Andreas Rheinhardt
2024-05-07 15:14 ` [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests Andreas Rheinhardt
1 sibling, 1 reply; 11+ messages in thread
From: James Almer @ 2024-05-07 15:02 UTC (permalink / raw)
To: ffmpeg-devel
Signed-off-by: James Almer <jamrial@gmail.com>
---
libavcodec/x86/blockdsp.asm | 33 +++++++++++++++++++++++++++++++++
libavcodec/x86/blockdsp_init.c | 13 +++++++++++++
2 files changed, 46 insertions(+)
diff --git a/libavcodec/x86/blockdsp.asm b/libavcodec/x86/blockdsp.asm
index e380308d4a..cccc9a801a 100644
--- a/libavcodec/x86/blockdsp.asm
+++ b/libavcodec/x86/blockdsp.asm
@@ -80,3 +80,36 @@ INIT_XMM sse
CLEAR_BLOCKS 1
INIT_YMM avx
CLEAR_BLOCKS 1
+
+;-----------------------------------------
+; void ff_fill_block_tab_%1(uint8_t *block, uint8_t value,
+; ptrdiff_t line_size, int h);
+;-----------------------------------------
+%macro FILL_BLOCK_TAB 2
+cglobal fill_block_tab_%1, 4, 5, 1, block, value, stride, h, stride3
+ lea stride3q, [strideq + strideq * 2]
+%if cpuflag(avx2)
+ movd m0, valued
+ vpbroadcastb m0, m0
+%else
+ SPLATB_REG m0, value, x
+%endif
+.loop:
+ mov%2 [blockq], m0
+ mov%2 [blockq + strideq], m0
+ mov%2 [blockq + strideq * 2], m0
+ mov%2 [blockq + stride3q], m0
+ lea blockq, [blockq + strideq * 4]
+ sub hd, 4
+ jg .loop
+ RET
+%endmacro
+
+INIT_XMM sse2
+FILL_BLOCK_TAB 8, q
+FILL_BLOCK_TAB 16, a
+%if HAVE_AVX2_EXTERNAL
+INIT_XMM avx2
+FILL_BLOCK_TAB 8, q
+FILL_BLOCK_TAB 16, a
+%endif
diff --git a/libavcodec/x86/blockdsp_init.c b/libavcodec/x86/blockdsp_init.c
index 996124114f..37f3bb6a84 100644
--- a/libavcodec/x86/blockdsp_init.c
+++ b/libavcodec/x86/blockdsp_init.c
@@ -29,6 +29,11 @@ void ff_clear_block_avx(int16_t *block);
void ff_clear_blocks_sse(int16_t *blocks);
void ff_clear_blocks_avx(int16_t *blocks);
+void ff_fill_block_tab_16_sse2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h);
+void ff_fill_block_tab_8_sse2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h);
+void ff_fill_block_tab_16_avx2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h);
+void ff_fill_block_tab_8_avx2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h);
+
av_cold void ff_blockdsp_init_x86(BlockDSPContext *c)
{
#if HAVE_X86ASM
@@ -38,9 +43,17 @@ av_cold void ff_blockdsp_init_x86(BlockDSPContext *c)
c->clear_block = ff_clear_block_sse;
c->clear_blocks = ff_clear_blocks_sse;
}
+ if (EXTERNAL_SSE2(cpu_flags)) {
+ c->fill_block_tab[0] = ff_fill_block_tab_16_sse2;
+ c->fill_block_tab[1] = ff_fill_block_tab_8_sse2;
+ }
if (EXTERNAL_AVX_FAST(cpu_flags)) {
c->clear_block = ff_clear_block_avx;
c->clear_blocks = ff_clear_blocks_avx;
}
+ if (EXTERNAL_AVX2(cpu_flags)) {
+ c->fill_block_tab[0] = ff_fill_block_tab_16_avx2;
+ c->fill_block_tab[1] = ff_fill_block_tab_8_avx2;
+ }
#endif /* HAVE_X86ASM */
}
--
2.45.0
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [FFmpeg-devel] [PATCH 3/3] x86/blockdsp: add sse2 and avx2 versions of fill_block_tab
2024-05-07 15:02 ` [FFmpeg-devel] [PATCH 3/3] x86/blockdsp: add sse2 and avx2 versions of fill_block_tab James Almer
@ 2024-05-07 15:10 ` Andreas Rheinhardt
2024-05-07 15:15 ` James Almer
0 siblings, 1 reply; 11+ messages in thread
From: Andreas Rheinhardt @ 2024-05-07 15:10 UTC (permalink / raw)
To: ffmpeg-devel
James Almer:
> Signed-off-by: James Almer <jamrial@gmail.com>
> ---
> libavcodec/x86/blockdsp.asm | 33 +++++++++++++++++++++++++++++++++
> libavcodec/x86/blockdsp_init.c | 13 +++++++++++++
> 2 files changed, 46 insertions(+)
>
> diff --git a/libavcodec/x86/blockdsp.asm b/libavcodec/x86/blockdsp.asm
> index e380308d4a..cccc9a801a 100644
> --- a/libavcodec/x86/blockdsp.asm
> +++ b/libavcodec/x86/blockdsp.asm
> @@ -80,3 +80,36 @@ INIT_XMM sse
> CLEAR_BLOCKS 1
> INIT_YMM avx
> CLEAR_BLOCKS 1
> +
> +;-----------------------------------------
> +; void ff_fill_block_tab_%1(uint8_t *block, uint8_t value,
> +; ptrdiff_t line_size, int h);
> +;-----------------------------------------
> +%macro FILL_BLOCK_TAB 2
> +cglobal fill_block_tab_%1, 4, 5, 1, block, value, stride, h, stride3
> + lea stride3q, [strideq + strideq * 2]
> +%if cpuflag(avx2)
> + movd m0, valued
> + vpbroadcastb m0, m0
> +%else
> + SPLATB_REG m0, value, x
> +%endif
> +.loop:
> + mov%2 [blockq], m0
> + mov%2 [blockq + strideq], m0
> + mov%2 [blockq + strideq * 2], m0
> + mov%2 [blockq + stride3q], m0
> + lea blockq, [blockq + strideq * 4]
> + sub hd, 4
> + jg .loop
> + RET
> +%endmacro
> +
> +INIT_XMM sse2
> +FILL_BLOCK_TAB 8, q
> +FILL_BLOCK_TAB 16, a
> +%if HAVE_AVX2_EXTERNAL
> +INIT_XMM avx2
> +FILL_BLOCK_TAB 8, q
> +FILL_BLOCK_TAB 16, a
> +%endif
> diff --git a/libavcodec/x86/blockdsp_init.c b/libavcodec/x86/blockdsp_init.c
> index 996124114f..37f3bb6a84 100644
> --- a/libavcodec/x86/blockdsp_init.c
> +++ b/libavcodec/x86/blockdsp_init.c
> @@ -29,6 +29,11 @@ void ff_clear_block_avx(int16_t *block);
> void ff_clear_blocks_sse(int16_t *blocks);
> void ff_clear_blocks_avx(int16_t *blocks);
>
> +void ff_fill_block_tab_16_sse2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h);
> +void ff_fill_block_tab_8_sse2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h);
> +void ff_fill_block_tab_16_avx2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h);
> +void ff_fill_block_tab_8_avx2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h);
> +
> av_cold void ff_blockdsp_init_x86(BlockDSPContext *c)
> {
> #if HAVE_X86ASM
> @@ -38,9 +43,17 @@ av_cold void ff_blockdsp_init_x86(BlockDSPContext *c)
> c->clear_block = ff_clear_block_sse;
> c->clear_blocks = ff_clear_blocks_sse;
> }
> + if (EXTERNAL_SSE2(cpu_flags)) {
> + c->fill_block_tab[0] = ff_fill_block_tab_16_sse2;
> + c->fill_block_tab[1] = ff_fill_block_tab_8_sse2;
> + }
> if (EXTERNAL_AVX_FAST(cpu_flags)) {
> c->clear_block = ff_clear_block_avx;
> c->clear_blocks = ff_clear_blocks_avx;
> }
> + if (EXTERNAL_AVX2(cpu_flags)) {
> + c->fill_block_tab[0] = ff_fill_block_tab_16_avx2;
> + c->fill_block_tab[1] = ff_fill_block_tab_8_avx2;
> + }
> #endif /* HAVE_X86ASM */
> }
Benchmarks?
- Andreas
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests
2024-05-07 15:02 ` [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests James Almer
2024-05-07 15:02 ` [FFmpeg-devel] [PATCH 3/3] x86/blockdsp: add sse2 and avx2 versions of fill_block_tab James Almer
@ 2024-05-07 15:14 ` Andreas Rheinhardt
2024-05-07 15:26 ` James Almer
1 sibling, 1 reply; 11+ messages in thread
From: Andreas Rheinhardt @ 2024-05-07 15:14 UTC (permalink / raw)
To: ffmpeg-devel
James Almer:
> The requirement is either 8 or 16 bytes alignment, not 32.
> This should help finding bugs in asm implementations.
>
> Signed-off-by: James Almer <jamrial@gmail.com>
> ---
> tests/checkasm/blockdsp.c | 23 +++++++++--------------
> 1 file changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/tests/checkasm/blockdsp.c b/tests/checkasm/blockdsp.c
> index ab87fc8fa4..f67a38d302 100644
> --- a/tests/checkasm/blockdsp.c
> +++ b/tests/checkasm/blockdsp.c
> @@ -29,11 +29,6 @@
> #include "libavutil/intreadwrite.h"
> #include "libavutil/mem_internal.h"
>
> -typedef struct {
> - const char *name;
> - int size;
> -} test;
> -
> #define randomize_buffers(size) \
> do { \
> int i; \
> @@ -58,18 +53,18 @@ do { \
> } while (0)
>
> static void check_fill(BlockDSPContext *h){
> - const test tests[] = {
> - {"fill_block_tab[0]", 16},
> - {"fill_block_tab[1]", 8},
> - };
> - LOCAL_ALIGNED_32(uint8_t, buf0, [16 * 16]);
> - LOCAL_ALIGNED_32(uint8_t, buf1, [16 * 16]);
> + LOCAL_ALIGNED_16(uint8_t, buf0_16, [16 * 16]);
> + LOCAL_ALIGNED_16(uint8_t, buf1_16, [16 * 16]);
> + LOCAL_ALIGNED_8(uint8_t, buf0_8, [8 * 8]);
> + LOCAL_ALIGNED_8(uint8_t, buf1_8, [8 * 8]);
>
> - for (size_t t = 0; t < FF_ARRAY_ELEMS(tests); ++t) {
> - int n = tests[t].size;
> + for (int t = 0; t < 2; ++t) {
> + uint8_t *buf0 = t ? buf0_8 : buf0_16;
> + uint8_t *buf1 = t ? buf1_8 : buf1_16;
> + int n = 16 - 8 * t;
> declare_func(void, uint8_t *block, uint8_t value,
> ptrdiff_t line_size, int h);
> - if (check_func(h->fill_block_tab[t], "blockdsp.%s", tests[t].name)) {
> + if (check_func(h->fill_block_tab[t], "blockdsp.fill_block_tab[%d]", t)) {
> uint8_t value = rnd();
> memset(buf0, 0, sizeof(*buf0) * n * n);
> memset(buf1, 0, sizeof(*buf0) * n * n);
1. I wouldn't be surprised if the *_8 buffers were still 16 byte
aligned. You should probably force 8 byte alignment by using a 16
byte-aligned buffer with an offset of eight.
2. Can you also extend this test to actually test the case of stride !=
width? (And negative strides.)
- Andreas
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [FFmpeg-devel] [PATCH 3/3] x86/blockdsp: add sse2 and avx2 versions of fill_block_tab
2024-05-07 15:10 ` Andreas Rheinhardt
@ 2024-05-07 15:15 ` James Almer
0 siblings, 0 replies; 11+ messages in thread
From: James Almer @ 2024-05-07 15:15 UTC (permalink / raw)
To: ffmpeg-devel
On 5/7/2024 12:10 PM, Andreas Rheinhardt wrote:
> James Almer:
>> Signed-off-by: James Almer <jamrial@gmail.com>
>> ---
>> libavcodec/x86/blockdsp.asm | 33 +++++++++++++++++++++++++++++++++
>> libavcodec/x86/blockdsp_init.c | 13 +++++++++++++
>> 2 files changed, 46 insertions(+)
>>
>> diff --git a/libavcodec/x86/blockdsp.asm b/libavcodec/x86/blockdsp.asm
>> index e380308d4a..cccc9a801a 100644
>> --- a/libavcodec/x86/blockdsp.asm
>> +++ b/libavcodec/x86/blockdsp.asm
>> @@ -80,3 +80,36 @@ INIT_XMM sse
>> CLEAR_BLOCKS 1
>> INIT_YMM avx
>> CLEAR_BLOCKS 1
>> +
>> +;-----------------------------------------
>> +; void ff_fill_block_tab_%1(uint8_t *block, uint8_t value,
>> +; ptrdiff_t line_size, int h);
>> +;-----------------------------------------
>> +%macro FILL_BLOCK_TAB 2
>> +cglobal fill_block_tab_%1, 4, 5, 1, block, value, stride, h, stride3
>> + lea stride3q, [strideq + strideq * 2]
>> +%if cpuflag(avx2)
>> + movd m0, valued
>> + vpbroadcastb m0, m0
>> +%else
>> + SPLATB_REG m0, value, x
>> +%endif
>> +.loop:
>> + mov%2 [blockq], m0
>> + mov%2 [blockq + strideq], m0
>> + mov%2 [blockq + strideq * 2], m0
>> + mov%2 [blockq + stride3q], m0
>> + lea blockq, [blockq + strideq * 4]
>> + sub hd, 4
>> + jg .loop
>> + RET
>> +%endmacro
>> +
>> +INIT_XMM sse2
>> +FILL_BLOCK_TAB 8, q
>> +FILL_BLOCK_TAB 16, a
>> +%if HAVE_AVX2_EXTERNAL
>> +INIT_XMM avx2
>> +FILL_BLOCK_TAB 8, q
>> +FILL_BLOCK_TAB 16, a
>> +%endif
>> diff --git a/libavcodec/x86/blockdsp_init.c b/libavcodec/x86/blockdsp_init.c
>> index 996124114f..37f3bb6a84 100644
>> --- a/libavcodec/x86/blockdsp_init.c
>> +++ b/libavcodec/x86/blockdsp_init.c
>> @@ -29,6 +29,11 @@ void ff_clear_block_avx(int16_t *block);
>> void ff_clear_blocks_sse(int16_t *blocks);
>> void ff_clear_blocks_avx(int16_t *blocks);
>>
>> +void ff_fill_block_tab_16_sse2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h);
>> +void ff_fill_block_tab_8_sse2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h);
>> +void ff_fill_block_tab_16_avx2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h);
>> +void ff_fill_block_tab_8_avx2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h);
>> +
>> av_cold void ff_blockdsp_init_x86(BlockDSPContext *c)
>> {
>> #if HAVE_X86ASM
>> @@ -38,9 +43,17 @@ av_cold void ff_blockdsp_init_x86(BlockDSPContext *c)
>> c->clear_block = ff_clear_block_sse;
>> c->clear_blocks = ff_clear_blocks_sse;
>> }
>> + if (EXTERNAL_SSE2(cpu_flags)) {
>> + c->fill_block_tab[0] = ff_fill_block_tab_16_sse2;
>> + c->fill_block_tab[1] = ff_fill_block_tab_8_sse2;
>> + }
>> if (EXTERNAL_AVX_FAST(cpu_flags)) {
>> c->clear_block = ff_clear_block_avx;
>> c->clear_blocks = ff_clear_blocks_avx;
>> }
>> + if (EXTERNAL_AVX2(cpu_flags)) {
>> + c->fill_block_tab[0] = ff_fill_block_tab_16_avx2;
>> + c->fill_block_tab[1] = ff_fill_block_tab_8_avx2;
>> + }
>> #endif /* HAVE_X86ASM */
>> }
>
> Benchmarks?
blockdsp.fill_block_tab[0]_c: 13.4
blockdsp.fill_block_tab[0]_sse2: 8.4
blockdsp.fill_block_tab[0]_avx2: 7.4
blockdsp.fill_block_tab[1]_c: 7.9
blockdsp.fill_block_tab[1]_sse2: 4.5
blockdsp.fill_block_tab[1]_avx2: 4.0
On an Alder Lake using --cpu-alderlake. If i let gcc compile without cpu
specific optimizations, the C version is even slower.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests
2024-05-07 15:14 ` [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests Andreas Rheinhardt
@ 2024-05-07 15:26 ` James Almer
0 siblings, 0 replies; 11+ messages in thread
From: James Almer @ 2024-05-07 15:26 UTC (permalink / raw)
To: ffmpeg-devel
On 5/7/2024 12:14 PM, Andreas Rheinhardt wrote:
> James Almer:
>> The requirement is either 8 or 16 bytes alignment, not 32.
>> This should help finding bugs in asm implementations.
>>
>> Signed-off-by: James Almer <jamrial@gmail.com>
>> ---
>> tests/checkasm/blockdsp.c | 23 +++++++++--------------
>> 1 file changed, 9 insertions(+), 14 deletions(-)
>>
>> diff --git a/tests/checkasm/blockdsp.c b/tests/checkasm/blockdsp.c
>> index ab87fc8fa4..f67a38d302 100644
>> --- a/tests/checkasm/blockdsp.c
>> +++ b/tests/checkasm/blockdsp.c
>> @@ -29,11 +29,6 @@
>> #include "libavutil/intreadwrite.h"
>> #include "libavutil/mem_internal.h"
>>
>> -typedef struct {
>> - const char *name;
>> - int size;
>> -} test;
>> -
>> #define randomize_buffers(size) \
>> do { \
>> int i; \
>> @@ -58,18 +53,18 @@ do { \
>> } while (0)
>>
>> static void check_fill(BlockDSPContext *h){
>> - const test tests[] = {
>> - {"fill_block_tab[0]", 16},
>> - {"fill_block_tab[1]", 8},
>> - };
>> - LOCAL_ALIGNED_32(uint8_t, buf0, [16 * 16]);
>> - LOCAL_ALIGNED_32(uint8_t, buf1, [16 * 16]);
>> + LOCAL_ALIGNED_16(uint8_t, buf0_16, [16 * 16]);
>> + LOCAL_ALIGNED_16(uint8_t, buf1_16, [16 * 16]);
>> + LOCAL_ALIGNED_8(uint8_t, buf0_8, [8 * 8]);
>> + LOCAL_ALIGNED_8(uint8_t, buf1_8, [8 * 8]);
>>
>> - for (size_t t = 0; t < FF_ARRAY_ELEMS(tests); ++t) {
>> - int n = tests[t].size;
>> + for (int t = 0; t < 2; ++t) {
>> + uint8_t *buf0 = t ? buf0_8 : buf0_16;
>> + uint8_t *buf1 = t ? buf1_8 : buf1_16;
>> + int n = 16 - 8 * t;
>> declare_func(void, uint8_t *block, uint8_t value,
>> ptrdiff_t line_size, int h);
>> - if (check_func(h->fill_block_tab[t], "blockdsp.%s", tests[t].name)) {
>> + if (check_func(h->fill_block_tab[t], "blockdsp.fill_block_tab[%d]", t)) {
>> uint8_t value = rnd();
>> memset(buf0, 0, sizeof(*buf0) * n * n);
>> memset(buf1, 0, sizeof(*buf0) * n * n);
>
> 1. I wouldn't be surprised if the *_8 buffers were still 16 byte
> aligned. You should probably force 8 byte alignment by using a 16
> byte-aligned buffer with an offset of eight.
Amended the following locally:
> diff --git a/tests/checkasm/blockdsp.c b/tests/checkasm/blockdsp.c
> index 8c1f8281d2..5f4d46b8fa 100644
> --- a/tests/checkasm/blockdsp.c
> +++ b/tests/checkasm/blockdsp.c
> @@ -55,12 +55,10 @@ do { \
> static void check_fill(BlockDSPContext *h){
> LOCAL_ALIGNED_16(uint8_t, buf0_16, [16 * 16]);
> LOCAL_ALIGNED_16(uint8_t, buf1_16, [16 * 16]);
> - LOCAL_ALIGNED_8(uint8_t, buf0_8, [8 * 8]);
> - LOCAL_ALIGNED_8(uint8_t, buf1_8, [8 * 8]);
>
> for (int t = 0; t < 2; ++t) {
> - uint8_t *buf0 = t ? buf0_8 : buf0_16;
> - uint8_t *buf1 = t ? buf1_8 : buf1_16;
> + uint8_t *buf0 = buf0_16 + t * /* force 8 byte alignment */ 8;
> + uint8_t *buf1 = buf1_16 + t * /* force 8 byte alignment */ 8;
> int n = 16 - 8 * t;
> declare_func(void, uint8_t *block, uint8_t value,
> ptrdiff_t line_size, int h);
> 2. Can you also extend this test to actually test the case of stride !=
> width? (And negative strides.)
Maybe later.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-05-07 15:27 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-07 0:27 [FFmpeg-devel] [PATCH] checkasm/blockdsp: don't randomize the buffers for fill_block_tab James Almer
2024-05-07 5:44 ` Martin Storsjö
2024-05-07 10:49 ` Andreas Rheinhardt
2024-05-07 10:52 ` Martin Storsjö
2024-05-07 11:11 ` James Almer
2024-05-07 15:02 ` [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests James Almer
2024-05-07 15:02 ` [FFmpeg-devel] [PATCH 3/3] x86/blockdsp: add sse2 and avx2 versions of fill_block_tab James Almer
2024-05-07 15:10 ` Andreas Rheinhardt
2024-05-07 15:15 ` James Almer
2024-05-07 15:14 ` [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests Andreas Rheinhardt
2024-05-07 15:26 ` James Almer
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
This inbox may be cloned and mirrored by anyone:
git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git
# If you have public-inbox 1.1+ installed, you may
# initialize and index your mirror using the following commands:
public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
ffmpegdev@gitmailbox.com
public-inbox-index ffmpegdev
Example config snippet for mirrors.
AGPL code for this site: git clone https://public-inbox.org/public-inbox.git