* [FFmpeg-devel] [PATCH] checkasm/blockdsp: don't randomize the buffers for fill_block_tab @ 2024-05-07 0:27 James Almer 2024-05-07 5:44 ` Martin Storsjö 2024-05-07 15:02 ` [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests James Almer 0 siblings, 2 replies; 11+ messages in thread From: James Almer @ 2024-05-07 0:27 UTC (permalink / raw) To: ffmpeg-devel It ignores and overwrites the previous values. Fixes running the test under ubsan. Signed-off-by: James Almer <jamrial@gmail.com> --- tests/checkasm/blockdsp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tests/checkasm/blockdsp.c b/tests/checkasm/blockdsp.c index 19d69b8687..ab87fc8fa4 100644 --- a/tests/checkasm/blockdsp.c +++ b/tests/checkasm/blockdsp.c @@ -71,7 +71,8 @@ static void check_fill(BlockDSPContext *h){ ptrdiff_t line_size, int h); if (check_func(h->fill_block_tab[t], "blockdsp.%s", tests[t].name)) { uint8_t value = rnd(); - randomize_buffers(tests[t].size); + memset(buf0, 0, sizeof(*buf0) * n * n); + memset(buf1, 0, sizeof(*buf0) * n * n); call_ref(buf0, value, n, n); call_new(buf1, value, n, n); if (memcmp(buf0, buf1, sizeof(*buf0) * n * n)) -- 2.45.0 _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [FFmpeg-devel] [PATCH] checkasm/blockdsp: don't randomize the buffers for fill_block_tab 2024-05-07 0:27 [FFmpeg-devel] [PATCH] checkasm/blockdsp: don't randomize the buffers for fill_block_tab James Almer @ 2024-05-07 5:44 ` Martin Storsjö 2024-05-07 10:49 ` Andreas Rheinhardt 2024-05-07 15:02 ` [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests James Almer 1 sibling, 1 reply; 11+ messages in thread From: Martin Storsjö @ 2024-05-07 5:44 UTC (permalink / raw) To: FFmpeg development discussions and patches On Mon, 6 May 2024, James Almer wrote: > It ignores and overwrites the previous values. > Fixes running the test under ubsan. > > Signed-off-by: James Almer <jamrial@gmail.com> > --- > tests/checkasm/blockdsp.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) The change is probably correct, but what issue is ubsan complaining about? If this would just be a dead store of unused random values, that shouldn't be an ubsan issue in general, right? // Martin _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [FFmpeg-devel] [PATCH] checkasm/blockdsp: don't randomize the buffers for fill_block_tab 2024-05-07 5:44 ` Martin Storsjö @ 2024-05-07 10:49 ` Andreas Rheinhardt 2024-05-07 10:52 ` Martin Storsjö 2024-05-07 11:11 ` James Almer 0 siblings, 2 replies; 11+ messages in thread From: Andreas Rheinhardt @ 2024-05-07 10:49 UTC (permalink / raw) To: ffmpeg-devel Martin Storsjö: > On Mon, 6 May 2024, James Almer wrote: > >> It ignores and overwrites the previous values. >> Fixes running the test under ubsan. >> >> Signed-off-by: James Almer <jamrial@gmail.com> >> --- >> tests/checkasm/blockdsp.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) > > The change is probably correct, but what issue is ubsan complaining > about? If this would just be a dead store of unused random values, that > shouldn't be an ubsan issue in general, right? > UBSan complains about unaligned stores in randomize_buffers; which is obvious given that i is incremented by 1, not by 2. I sent a patch that fixes this without removing randomization: https://ffmpeg.org/pipermail/ffmpeg-devel/2024-May/326945.html (Ideally, this test should be extended to cover cases of stride != width.) - Andreas _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [FFmpeg-devel] [PATCH] checkasm/blockdsp: don't randomize the buffers for fill_block_tab 2024-05-07 10:49 ` Andreas Rheinhardt @ 2024-05-07 10:52 ` Martin Storsjö 2024-05-07 11:11 ` James Almer 1 sibling, 0 replies; 11+ messages in thread From: Martin Storsjö @ 2024-05-07 10:52 UTC (permalink / raw) To: FFmpeg development discussions and patches On Tue, 7 May 2024, Andreas Rheinhardt wrote: > Martin Storsjö: >> On Mon, 6 May 2024, James Almer wrote: >> >>> It ignores and overwrites the previous values. >>> Fixes running the test under ubsan. >>> >>> Signed-off-by: James Almer <jamrial@gmail.com> >>> --- >>> tests/checkasm/blockdsp.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> The change is probably correct, but what issue is ubsan complaining >> about? If this would just be a dead store of unused random values, that >> shouldn't be an ubsan issue in general, right? >> > > UBSan complains about unaligned stores in randomize_buffers; which is > obvious given that i is incremented by 1, not by 2. I sent a patch that > fixes this without removing randomization: > https://ffmpeg.org/pipermail/ffmpeg-devel/2024-May/326945.html Thanks, that explains it. Those two patches LGTM. // Martin _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [FFmpeg-devel] [PATCH] checkasm/blockdsp: don't randomize the buffers for fill_block_tab 2024-05-07 10:49 ` Andreas Rheinhardt 2024-05-07 10:52 ` Martin Storsjö @ 2024-05-07 11:11 ` James Almer 1 sibling, 0 replies; 11+ messages in thread From: James Almer @ 2024-05-07 11:11 UTC (permalink / raw) To: ffmpeg-devel On 5/7/2024 7:49 AM, Andreas Rheinhardt wrote: > Martin Storsjö: >> On Mon, 6 May 2024, James Almer wrote: >> >>> It ignores and overwrites the previous values. >>> Fixes running the test under ubsan. >>> >>> Signed-off-by: James Almer <jamrial@gmail.com> >>> --- >>> tests/checkasm/blockdsp.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> The change is probably correct, but what issue is ubsan complaining >> about? If this would just be a dead store of unused random values, that >> shouldn't be an ubsan issue in general, right? >> > > UBSan complains about unaligned stores in randomize_buffers; which is > obvious given that i is incremented by 1, not by 2. I sent a patch that > fixes this without removing randomization: There's no reason to keep the randomization for this test. That's why i replaced it with a memset instead of fixing the unaligned writes. > https://ffmpeg.org/pipermail/ffmpeg-devel/2024-May/326945.html > > (Ideally, this test should be extended to cover cases of stride != width.) > > - Andreas > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 11+ messages in thread
* [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests 2024-05-07 0:27 [FFmpeg-devel] [PATCH] checkasm/blockdsp: don't randomize the buffers for fill_block_tab James Almer 2024-05-07 5:44 ` Martin Storsjö @ 2024-05-07 15:02 ` James Almer 2024-05-07 15:02 ` [FFmpeg-devel] [PATCH 3/3] x86/blockdsp: add sse2 and avx2 versions of fill_block_tab James Almer 2024-05-07 15:14 ` [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests Andreas Rheinhardt 1 sibling, 2 replies; 11+ messages in thread From: James Almer @ 2024-05-07 15:02 UTC (permalink / raw) To: ffmpeg-devel The requirement is either 8 or 16 bytes alignment, not 32. This should help finding bugs in asm implementations. Signed-off-by: James Almer <jamrial@gmail.com> --- tests/checkasm/blockdsp.c | 23 +++++++++-------------- 1 file changed, 9 insertions(+), 14 deletions(-) diff --git a/tests/checkasm/blockdsp.c b/tests/checkasm/blockdsp.c index ab87fc8fa4..f67a38d302 100644 --- a/tests/checkasm/blockdsp.c +++ b/tests/checkasm/blockdsp.c @@ -29,11 +29,6 @@ #include "libavutil/intreadwrite.h" #include "libavutil/mem_internal.h" -typedef struct { - const char *name; - int size; -} test; - #define randomize_buffers(size) \ do { \ int i; \ @@ -58,18 +53,18 @@ do { \ } while (0) static void check_fill(BlockDSPContext *h){ - const test tests[] = { - {"fill_block_tab[0]", 16}, - {"fill_block_tab[1]", 8}, - }; - LOCAL_ALIGNED_32(uint8_t, buf0, [16 * 16]); - LOCAL_ALIGNED_32(uint8_t, buf1, [16 * 16]); + LOCAL_ALIGNED_16(uint8_t, buf0_16, [16 * 16]); + LOCAL_ALIGNED_16(uint8_t, buf1_16, [16 * 16]); + LOCAL_ALIGNED_8(uint8_t, buf0_8, [8 * 8]); + LOCAL_ALIGNED_8(uint8_t, buf1_8, [8 * 8]); - for (size_t t = 0; t < FF_ARRAY_ELEMS(tests); ++t) { - int n = tests[t].size; + for (int t = 0; t < 2; ++t) { + uint8_t *buf0 = t ? buf0_8 : buf0_16; + uint8_t *buf1 = t ? buf1_8 : buf1_16; + int n = 16 - 8 * t; declare_func(void, uint8_t *block, uint8_t value, ptrdiff_t line_size, int h); - if (check_func(h->fill_block_tab[t], "blockdsp.%s", tests[t].name)) { + if (check_func(h->fill_block_tab[t], "blockdsp.fill_block_tab[%d]", t)) { uint8_t value = rnd(); memset(buf0, 0, sizeof(*buf0) * n * n); memset(buf1, 0, sizeof(*buf0) * n * n); -- 2.45.0 _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 11+ messages in thread
* [FFmpeg-devel] [PATCH 3/3] x86/blockdsp: add sse2 and avx2 versions of fill_block_tab 2024-05-07 15:02 ` [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests James Almer @ 2024-05-07 15:02 ` James Almer 2024-05-07 15:10 ` Andreas Rheinhardt 2024-05-07 15:14 ` [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests Andreas Rheinhardt 1 sibling, 1 reply; 11+ messages in thread From: James Almer @ 2024-05-07 15:02 UTC (permalink / raw) To: ffmpeg-devel Signed-off-by: James Almer <jamrial@gmail.com> --- libavcodec/x86/blockdsp.asm | 33 +++++++++++++++++++++++++++++++++ libavcodec/x86/blockdsp_init.c | 13 +++++++++++++ 2 files changed, 46 insertions(+) diff --git a/libavcodec/x86/blockdsp.asm b/libavcodec/x86/blockdsp.asm index e380308d4a..cccc9a801a 100644 --- a/libavcodec/x86/blockdsp.asm +++ b/libavcodec/x86/blockdsp.asm @@ -80,3 +80,36 @@ INIT_XMM sse CLEAR_BLOCKS 1 INIT_YMM avx CLEAR_BLOCKS 1 + +;----------------------------------------- +; void ff_fill_block_tab_%1(uint8_t *block, uint8_t value, +; ptrdiff_t line_size, int h); +;----------------------------------------- +%macro FILL_BLOCK_TAB 2 +cglobal fill_block_tab_%1, 4, 5, 1, block, value, stride, h, stride3 + lea stride3q, [strideq + strideq * 2] +%if cpuflag(avx2) + movd m0, valued + vpbroadcastb m0, m0 +%else + SPLATB_REG m0, value, x +%endif +.loop: + mov%2 [blockq], m0 + mov%2 [blockq + strideq], m0 + mov%2 [blockq + strideq * 2], m0 + mov%2 [blockq + stride3q], m0 + lea blockq, [blockq + strideq * 4] + sub hd, 4 + jg .loop + RET +%endmacro + +INIT_XMM sse2 +FILL_BLOCK_TAB 8, q +FILL_BLOCK_TAB 16, a +%if HAVE_AVX2_EXTERNAL +INIT_XMM avx2 +FILL_BLOCK_TAB 8, q +FILL_BLOCK_TAB 16, a +%endif diff --git a/libavcodec/x86/blockdsp_init.c b/libavcodec/x86/blockdsp_init.c index 996124114f..37f3bb6a84 100644 --- a/libavcodec/x86/blockdsp_init.c +++ b/libavcodec/x86/blockdsp_init.c @@ -29,6 +29,11 @@ void ff_clear_block_avx(int16_t *block); void ff_clear_blocks_sse(int16_t *blocks); void ff_clear_blocks_avx(int16_t *blocks); +void ff_fill_block_tab_16_sse2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h); +void ff_fill_block_tab_8_sse2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h); +void ff_fill_block_tab_16_avx2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h); +void ff_fill_block_tab_8_avx2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h); + av_cold void ff_blockdsp_init_x86(BlockDSPContext *c) { #if HAVE_X86ASM @@ -38,9 +43,17 @@ av_cold void ff_blockdsp_init_x86(BlockDSPContext *c) c->clear_block = ff_clear_block_sse; c->clear_blocks = ff_clear_blocks_sse; } + if (EXTERNAL_SSE2(cpu_flags)) { + c->fill_block_tab[0] = ff_fill_block_tab_16_sse2; + c->fill_block_tab[1] = ff_fill_block_tab_8_sse2; + } if (EXTERNAL_AVX_FAST(cpu_flags)) { c->clear_block = ff_clear_block_avx; c->clear_blocks = ff_clear_blocks_avx; } + if (EXTERNAL_AVX2(cpu_flags)) { + c->fill_block_tab[0] = ff_fill_block_tab_16_avx2; + c->fill_block_tab[1] = ff_fill_block_tab_8_avx2; + } #endif /* HAVE_X86ASM */ } -- 2.45.0 _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [FFmpeg-devel] [PATCH 3/3] x86/blockdsp: add sse2 and avx2 versions of fill_block_tab 2024-05-07 15:02 ` [FFmpeg-devel] [PATCH 3/3] x86/blockdsp: add sse2 and avx2 versions of fill_block_tab James Almer @ 2024-05-07 15:10 ` Andreas Rheinhardt 2024-05-07 15:15 ` James Almer 0 siblings, 1 reply; 11+ messages in thread From: Andreas Rheinhardt @ 2024-05-07 15:10 UTC (permalink / raw) To: ffmpeg-devel James Almer: > Signed-off-by: James Almer <jamrial@gmail.com> > --- > libavcodec/x86/blockdsp.asm | 33 +++++++++++++++++++++++++++++++++ > libavcodec/x86/blockdsp_init.c | 13 +++++++++++++ > 2 files changed, 46 insertions(+) > > diff --git a/libavcodec/x86/blockdsp.asm b/libavcodec/x86/blockdsp.asm > index e380308d4a..cccc9a801a 100644 > --- a/libavcodec/x86/blockdsp.asm > +++ b/libavcodec/x86/blockdsp.asm > @@ -80,3 +80,36 @@ INIT_XMM sse > CLEAR_BLOCKS 1 > INIT_YMM avx > CLEAR_BLOCKS 1 > + > +;----------------------------------------- > +; void ff_fill_block_tab_%1(uint8_t *block, uint8_t value, > +; ptrdiff_t line_size, int h); > +;----------------------------------------- > +%macro FILL_BLOCK_TAB 2 > +cglobal fill_block_tab_%1, 4, 5, 1, block, value, stride, h, stride3 > + lea stride3q, [strideq + strideq * 2] > +%if cpuflag(avx2) > + movd m0, valued > + vpbroadcastb m0, m0 > +%else > + SPLATB_REG m0, value, x > +%endif > +.loop: > + mov%2 [blockq], m0 > + mov%2 [blockq + strideq], m0 > + mov%2 [blockq + strideq * 2], m0 > + mov%2 [blockq + stride3q], m0 > + lea blockq, [blockq + strideq * 4] > + sub hd, 4 > + jg .loop > + RET > +%endmacro > + > +INIT_XMM sse2 > +FILL_BLOCK_TAB 8, q > +FILL_BLOCK_TAB 16, a > +%if HAVE_AVX2_EXTERNAL > +INIT_XMM avx2 > +FILL_BLOCK_TAB 8, q > +FILL_BLOCK_TAB 16, a > +%endif > diff --git a/libavcodec/x86/blockdsp_init.c b/libavcodec/x86/blockdsp_init.c > index 996124114f..37f3bb6a84 100644 > --- a/libavcodec/x86/blockdsp_init.c > +++ b/libavcodec/x86/blockdsp_init.c > @@ -29,6 +29,11 @@ void ff_clear_block_avx(int16_t *block); > void ff_clear_blocks_sse(int16_t *blocks); > void ff_clear_blocks_avx(int16_t *blocks); > > +void ff_fill_block_tab_16_sse2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h); > +void ff_fill_block_tab_8_sse2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h); > +void ff_fill_block_tab_16_avx2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h); > +void ff_fill_block_tab_8_avx2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h); > + > av_cold void ff_blockdsp_init_x86(BlockDSPContext *c) > { > #if HAVE_X86ASM > @@ -38,9 +43,17 @@ av_cold void ff_blockdsp_init_x86(BlockDSPContext *c) > c->clear_block = ff_clear_block_sse; > c->clear_blocks = ff_clear_blocks_sse; > } > + if (EXTERNAL_SSE2(cpu_flags)) { > + c->fill_block_tab[0] = ff_fill_block_tab_16_sse2; > + c->fill_block_tab[1] = ff_fill_block_tab_8_sse2; > + } > if (EXTERNAL_AVX_FAST(cpu_flags)) { > c->clear_block = ff_clear_block_avx; > c->clear_blocks = ff_clear_blocks_avx; > } > + if (EXTERNAL_AVX2(cpu_flags)) { > + c->fill_block_tab[0] = ff_fill_block_tab_16_avx2; > + c->fill_block_tab[1] = ff_fill_block_tab_8_avx2; > + } > #endif /* HAVE_X86ASM */ > } Benchmarks? - Andreas _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [FFmpeg-devel] [PATCH 3/3] x86/blockdsp: add sse2 and avx2 versions of fill_block_tab 2024-05-07 15:10 ` Andreas Rheinhardt @ 2024-05-07 15:15 ` James Almer 0 siblings, 0 replies; 11+ messages in thread From: James Almer @ 2024-05-07 15:15 UTC (permalink / raw) To: ffmpeg-devel On 5/7/2024 12:10 PM, Andreas Rheinhardt wrote: > James Almer: >> Signed-off-by: James Almer <jamrial@gmail.com> >> --- >> libavcodec/x86/blockdsp.asm | 33 +++++++++++++++++++++++++++++++++ >> libavcodec/x86/blockdsp_init.c | 13 +++++++++++++ >> 2 files changed, 46 insertions(+) >> >> diff --git a/libavcodec/x86/blockdsp.asm b/libavcodec/x86/blockdsp.asm >> index e380308d4a..cccc9a801a 100644 >> --- a/libavcodec/x86/blockdsp.asm >> +++ b/libavcodec/x86/blockdsp.asm >> @@ -80,3 +80,36 @@ INIT_XMM sse >> CLEAR_BLOCKS 1 >> INIT_YMM avx >> CLEAR_BLOCKS 1 >> + >> +;----------------------------------------- >> +; void ff_fill_block_tab_%1(uint8_t *block, uint8_t value, >> +; ptrdiff_t line_size, int h); >> +;----------------------------------------- >> +%macro FILL_BLOCK_TAB 2 >> +cglobal fill_block_tab_%1, 4, 5, 1, block, value, stride, h, stride3 >> + lea stride3q, [strideq + strideq * 2] >> +%if cpuflag(avx2) >> + movd m0, valued >> + vpbroadcastb m0, m0 >> +%else >> + SPLATB_REG m0, value, x >> +%endif >> +.loop: >> + mov%2 [blockq], m0 >> + mov%2 [blockq + strideq], m0 >> + mov%2 [blockq + strideq * 2], m0 >> + mov%2 [blockq + stride3q], m0 >> + lea blockq, [blockq + strideq * 4] >> + sub hd, 4 >> + jg .loop >> + RET >> +%endmacro >> + >> +INIT_XMM sse2 >> +FILL_BLOCK_TAB 8, q >> +FILL_BLOCK_TAB 16, a >> +%if HAVE_AVX2_EXTERNAL >> +INIT_XMM avx2 >> +FILL_BLOCK_TAB 8, q >> +FILL_BLOCK_TAB 16, a >> +%endif >> diff --git a/libavcodec/x86/blockdsp_init.c b/libavcodec/x86/blockdsp_init.c >> index 996124114f..37f3bb6a84 100644 >> --- a/libavcodec/x86/blockdsp_init.c >> +++ b/libavcodec/x86/blockdsp_init.c >> @@ -29,6 +29,11 @@ void ff_clear_block_avx(int16_t *block); >> void ff_clear_blocks_sse(int16_t *blocks); >> void ff_clear_blocks_avx(int16_t *blocks); >> >> +void ff_fill_block_tab_16_sse2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h); >> +void ff_fill_block_tab_8_sse2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h); >> +void ff_fill_block_tab_16_avx2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h); >> +void ff_fill_block_tab_8_avx2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h); >> + >> av_cold void ff_blockdsp_init_x86(BlockDSPContext *c) >> { >> #if HAVE_X86ASM >> @@ -38,9 +43,17 @@ av_cold void ff_blockdsp_init_x86(BlockDSPContext *c) >> c->clear_block = ff_clear_block_sse; >> c->clear_blocks = ff_clear_blocks_sse; >> } >> + if (EXTERNAL_SSE2(cpu_flags)) { >> + c->fill_block_tab[0] = ff_fill_block_tab_16_sse2; >> + c->fill_block_tab[1] = ff_fill_block_tab_8_sse2; >> + } >> if (EXTERNAL_AVX_FAST(cpu_flags)) { >> c->clear_block = ff_clear_block_avx; >> c->clear_blocks = ff_clear_blocks_avx; >> } >> + if (EXTERNAL_AVX2(cpu_flags)) { >> + c->fill_block_tab[0] = ff_fill_block_tab_16_avx2; >> + c->fill_block_tab[1] = ff_fill_block_tab_8_avx2; >> + } >> #endif /* HAVE_X86ASM */ >> } > > Benchmarks? blockdsp.fill_block_tab[0]_c: 13.4 blockdsp.fill_block_tab[0]_sse2: 8.4 blockdsp.fill_block_tab[0]_avx2: 7.4 blockdsp.fill_block_tab[1]_c: 7.9 blockdsp.fill_block_tab[1]_sse2: 4.5 blockdsp.fill_block_tab[1]_avx2: 4.0 On an Alder Lake using --cpu-alderlake. If i let gcc compile without cpu specific optimizations, the C version is even slower. _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests 2024-05-07 15:02 ` [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests James Almer 2024-05-07 15:02 ` [FFmpeg-devel] [PATCH 3/3] x86/blockdsp: add sse2 and avx2 versions of fill_block_tab James Almer @ 2024-05-07 15:14 ` Andreas Rheinhardt 2024-05-07 15:26 ` James Almer 1 sibling, 1 reply; 11+ messages in thread From: Andreas Rheinhardt @ 2024-05-07 15:14 UTC (permalink / raw) To: ffmpeg-devel James Almer: > The requirement is either 8 or 16 bytes alignment, not 32. > This should help finding bugs in asm implementations. > > Signed-off-by: James Almer <jamrial@gmail.com> > --- > tests/checkasm/blockdsp.c | 23 +++++++++-------------- > 1 file changed, 9 insertions(+), 14 deletions(-) > > diff --git a/tests/checkasm/blockdsp.c b/tests/checkasm/blockdsp.c > index ab87fc8fa4..f67a38d302 100644 > --- a/tests/checkasm/blockdsp.c > +++ b/tests/checkasm/blockdsp.c > @@ -29,11 +29,6 @@ > #include "libavutil/intreadwrite.h" > #include "libavutil/mem_internal.h" > > -typedef struct { > - const char *name; > - int size; > -} test; > - > #define randomize_buffers(size) \ > do { \ > int i; \ > @@ -58,18 +53,18 @@ do { \ > } while (0) > > static void check_fill(BlockDSPContext *h){ > - const test tests[] = { > - {"fill_block_tab[0]", 16}, > - {"fill_block_tab[1]", 8}, > - }; > - LOCAL_ALIGNED_32(uint8_t, buf0, [16 * 16]); > - LOCAL_ALIGNED_32(uint8_t, buf1, [16 * 16]); > + LOCAL_ALIGNED_16(uint8_t, buf0_16, [16 * 16]); > + LOCAL_ALIGNED_16(uint8_t, buf1_16, [16 * 16]); > + LOCAL_ALIGNED_8(uint8_t, buf0_8, [8 * 8]); > + LOCAL_ALIGNED_8(uint8_t, buf1_8, [8 * 8]); > > - for (size_t t = 0; t < FF_ARRAY_ELEMS(tests); ++t) { > - int n = tests[t].size; > + for (int t = 0; t < 2; ++t) { > + uint8_t *buf0 = t ? buf0_8 : buf0_16; > + uint8_t *buf1 = t ? buf1_8 : buf1_16; > + int n = 16 - 8 * t; > declare_func(void, uint8_t *block, uint8_t value, > ptrdiff_t line_size, int h); > - if (check_func(h->fill_block_tab[t], "blockdsp.%s", tests[t].name)) { > + if (check_func(h->fill_block_tab[t], "blockdsp.fill_block_tab[%d]", t)) { > uint8_t value = rnd(); > memset(buf0, 0, sizeof(*buf0) * n * n); > memset(buf1, 0, sizeof(*buf0) * n * n); 1. I wouldn't be surprised if the *_8 buffers were still 16 byte aligned. You should probably force 8 byte alignment by using a 16 byte-aligned buffer with an offset of eight. 2. Can you also extend this test to actually test the case of stride != width? (And negative strides.) - Andreas _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests 2024-05-07 15:14 ` [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests Andreas Rheinhardt @ 2024-05-07 15:26 ` James Almer 0 siblings, 0 replies; 11+ messages in thread From: James Almer @ 2024-05-07 15:26 UTC (permalink / raw) To: ffmpeg-devel On 5/7/2024 12:14 PM, Andreas Rheinhardt wrote: > James Almer: >> The requirement is either 8 or 16 bytes alignment, not 32. >> This should help finding bugs in asm implementations. >> >> Signed-off-by: James Almer <jamrial@gmail.com> >> --- >> tests/checkasm/blockdsp.c | 23 +++++++++-------------- >> 1 file changed, 9 insertions(+), 14 deletions(-) >> >> diff --git a/tests/checkasm/blockdsp.c b/tests/checkasm/blockdsp.c >> index ab87fc8fa4..f67a38d302 100644 >> --- a/tests/checkasm/blockdsp.c >> +++ b/tests/checkasm/blockdsp.c >> @@ -29,11 +29,6 @@ >> #include "libavutil/intreadwrite.h" >> #include "libavutil/mem_internal.h" >> >> -typedef struct { >> - const char *name; >> - int size; >> -} test; >> - >> #define randomize_buffers(size) \ >> do { \ >> int i; \ >> @@ -58,18 +53,18 @@ do { \ >> } while (0) >> >> static void check_fill(BlockDSPContext *h){ >> - const test tests[] = { >> - {"fill_block_tab[0]", 16}, >> - {"fill_block_tab[1]", 8}, >> - }; >> - LOCAL_ALIGNED_32(uint8_t, buf0, [16 * 16]); >> - LOCAL_ALIGNED_32(uint8_t, buf1, [16 * 16]); >> + LOCAL_ALIGNED_16(uint8_t, buf0_16, [16 * 16]); >> + LOCAL_ALIGNED_16(uint8_t, buf1_16, [16 * 16]); >> + LOCAL_ALIGNED_8(uint8_t, buf0_8, [8 * 8]); >> + LOCAL_ALIGNED_8(uint8_t, buf1_8, [8 * 8]); >> >> - for (size_t t = 0; t < FF_ARRAY_ELEMS(tests); ++t) { >> - int n = tests[t].size; >> + for (int t = 0; t < 2; ++t) { >> + uint8_t *buf0 = t ? buf0_8 : buf0_16; >> + uint8_t *buf1 = t ? buf1_8 : buf1_16; >> + int n = 16 - 8 * t; >> declare_func(void, uint8_t *block, uint8_t value, >> ptrdiff_t line_size, int h); >> - if (check_func(h->fill_block_tab[t], "blockdsp.%s", tests[t].name)) { >> + if (check_func(h->fill_block_tab[t], "blockdsp.fill_block_tab[%d]", t)) { >> uint8_t value = rnd(); >> memset(buf0, 0, sizeof(*buf0) * n * n); >> memset(buf1, 0, sizeof(*buf0) * n * n); > > 1. I wouldn't be surprised if the *_8 buffers were still 16 byte > aligned. You should probably force 8 byte alignment by using a 16 > byte-aligned buffer with an offset of eight. Amended the following locally: > diff --git a/tests/checkasm/blockdsp.c b/tests/checkasm/blockdsp.c > index 8c1f8281d2..5f4d46b8fa 100644 > --- a/tests/checkasm/blockdsp.c > +++ b/tests/checkasm/blockdsp.c > @@ -55,12 +55,10 @@ do { \ > static void check_fill(BlockDSPContext *h){ > LOCAL_ALIGNED_16(uint8_t, buf0_16, [16 * 16]); > LOCAL_ALIGNED_16(uint8_t, buf1_16, [16 * 16]); > - LOCAL_ALIGNED_8(uint8_t, buf0_8, [8 * 8]); > - LOCAL_ALIGNED_8(uint8_t, buf1_8, [8 * 8]); > > for (int t = 0; t < 2; ++t) { > - uint8_t *buf0 = t ? buf0_8 : buf0_16; > - uint8_t *buf1 = t ? buf1_8 : buf1_16; > + uint8_t *buf0 = buf0_16 + t * /* force 8 byte alignment */ 8; > + uint8_t *buf1 = buf1_16 + t * /* force 8 byte alignment */ 8; > int n = 16 - 8 * t; > declare_func(void, uint8_t *block, uint8_t value, > ptrdiff_t line_size, int h); > 2. Can you also extend this test to actually test the case of stride != > width? (And negative strides.) Maybe later. _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-05-07 15:27 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-05-07 0:27 [FFmpeg-devel] [PATCH] checkasm/blockdsp: don't randomize the buffers for fill_block_tab James Almer 2024-05-07 5:44 ` Martin Storsjö 2024-05-07 10:49 ` Andreas Rheinhardt 2024-05-07 10:52 ` Martin Storsjö 2024-05-07 11:11 ` James Almer 2024-05-07 15:02 ` [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests James Almer 2024-05-07 15:02 ` [FFmpeg-devel] [PATCH 3/3] x86/blockdsp: add sse2 and avx2 versions of fill_block_tab James Almer 2024-05-07 15:10 ` Andreas Rheinhardt 2024-05-07 15:15 ` James Almer 2024-05-07 15:14 ` [FFmpeg-devel] [PATCH 2/3] checkasm/blockdsp: use smallest allowed aligned buffers for fill_block_tab tests Andreas Rheinhardt 2024-05-07 15:26 ` James Almer
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel This inbox may be cloned and mirrored by anyone: git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \ ffmpegdev@gitmailbox.com public-inbox-index ffmpegdev Example config snippet for mirrors. AGPL code for this site: git clone https://public-inbox.org/public-inbox.git