* [FFmpeg-devel] [PATCH 1/4] aarch64: Fix ff_hevc_put_hevc_epel_h48_8_neon_i8mm
@ 2024-03-12 13:12 Martin Storsjö
2024-03-12 13:12 ` [FFmpeg-devel] [PATCH 2/4] checkasm: hevc_pel: Check the full output in hevc_epel/hevc_qpel Martin Storsjö
` (3 more replies)
0 siblings, 4 replies; 6+ messages in thread
From: Martin Storsjö @ 2024-03-12 13:12 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Logan Lyu, J . Dekker
The first 32 elements of each row were correct, while the
last 16 were scrambled.
This hasn't been noticed, because the checkasm test erroneously
only checked half of the output (for 8 bit functions), and
apparently none of the samples in "fate-hevc" trigger this
specific function.
---
libavcodec/aarch64/hevcdsp_epel_neon.S | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/libavcodec/aarch64/hevcdsp_epel_neon.S b/libavcodec/aarch64/hevcdsp_epel_neon.S
index 2dafa09337..d3f0a26f79 100644
--- a/libavcodec/aarch64/hevcdsp_epel_neon.S
+++ b/libavcodec/aarch64/hevcdsp_epel_neon.S
@@ -1572,6 +1572,7 @@ function ff_hevc_put_hevc_epel_h48_8_neon_i8mm, export=1
xtn2 v22.8h, v26.4s
xtn v23.4h, v23.4s
xtn2 v23.8h, v27.4s
+ add x7, x0, #64
st4 {v20.8h, v21.8h, v22.8h, v23.8h}, [x0], x10
ext v4.16b, v2.16b, v3.16b, #1
ext v5.16b, v2.16b, v3.16b, #2
@@ -1584,11 +1585,14 @@ function ff_hevc_put_hevc_epel_h48_8_neon_i8mm, export=1
usdot v21.4s, v4.16b, v30.16b
usdot v22.4s, v5.16b, v30.16b
usdot v23.4s, v6.16b, v30.16b
- xtn v20.4h, v20.4s
- xtn2 v20.8h, v22.4s
- xtn v21.4h, v21.4s
- xtn2 v21.8h, v23.4s
- add x7, x0, #64
+ zip1 v24.4s, v20.4s, v22.4s
+ zip2 v25.4s, v20.4s, v22.4s
+ zip1 v26.4s, v21.4s, v23.4s
+ zip2 v27.4s, v21.4s, v23.4s
+ xtn v20.4h, v24.4s
+ xtn2 v20.8h, v25.4s
+ xtn v21.4h, v26.4s
+ xtn2 v21.8h, v27.4s
st2 {v20.8h, v21.8h}, [x7]
b.ne 1b
ret
--
2.39.3 (Apple Git-146)
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
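The lane reordering fixed by PATCH 1/4 can be modelled in plain C. The sketch below is not FFmpeg code. It assumes (the hunk does not show every usdot operand) that after the four usdot instructions, lane i of accumulator r holds output pixel 4*i + r, which is what the ext #1/#2/#3 shifts suggest. The helper names zip1_4s, zip2_4s, xtn_pair, st2_8h, store_old and store_new are hypothetical; each value stored is simply the index of the pixel it represents.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* zip1 on .4s: interleave the low halves of a and b -> a0 b0 a1 b1 */
static void zip1_4s(int32_t d[4], const int32_t a[4], const int32_t b[4])
{
    int32_t t[4] = { a[0], b[0], a[1], b[1] };
    memcpy(d, t, sizeof(t));
}

/* zip2 on .4s: interleave the high halves of a and b -> a2 b2 a3 b3 */
static void zip2_4s(int32_t d[4], const int32_t a[4], const int32_t b[4])
{
    int32_t t[4] = { a[2], b[2], a[3], b[3] };
    memcpy(d, t, sizeof(t));
}

/* xtn + xtn2: narrow lo into lanes 0..3 and hi into lanes 4..7 */
static void xtn_pair(int16_t d[8], const int32_t lo[4], const int32_t hi[4])
{
    for (int i = 0; i < 4; i++) {
        d[i]     = (int16_t)lo[i];
        d[i + 4] = (int16_t)hi[i];
    }
}

/* st2 {a.8h, b.8h}: interleaved store -> a0 b0 a1 b1 ... a7 b7 */
static void st2_8h(int16_t out[16], const int16_t a[8], const int16_t b[8])
{
    for (int i = 0; i < 8; i++) {
        out[2 * i]     = a[i];
        out[2 * i + 1] = b[i];
    }
}

/* Assumed accumulator layout: register r, lane i holds pixel 4*i + r. */
static void fill_acc(int32_t v[4][4])
{
    for (int r = 0; r < 4; r++)
        for (int i = 0; i < 4; i++)
            v[r][i] = 4 * i + r;
}

/* Sequence before the patch: narrow directly, then st2. */
void store_old(int16_t out[16])
{
    int32_t v[4][4];
    int16_t a[8], b[8];
    assert(out != NULL);
    fill_acc(v);
    xtn_pair(a, v[0], v[2]);
    xtn_pair(b, v[1], v[3]);
    st2_8h(out, a, b);
}

/* Sequence after the patch: zip first, then narrow, then st2. */
void store_new(int16_t out[16])
{
    int32_t v[4][4], z[4][4];
    int16_t a[8], b[8];
    assert(out != NULL);
    fill_acc(v);
    zip1_4s(z[0], v[0], v[2]);
    zip2_4s(z[1], v[0], v[2]);
    zip1_4s(z[2], v[1], v[3]);
    zip2_4s(z[3], v[1], v[3]);
    xtn_pair(a, z[0], z[1]);
    xtn_pair(b, z[2], z[3]);
    st2_8h(out, a, b);
}
```

Under that assumption, store_new() yields the pixels in ascending order, while store_old() yields 0, 1, 4, 5, 8, 9, 12, 13, 2, 3, 6, 7, 10, 11, 14, 15, which would match the "scrambled" last 16 elements described in the commit message.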
* [FFmpeg-devel] [PATCH 2/4] checkasm: hevc_pel: Check the full output in hevc_epel/hevc_qpel
2024-03-12 13:12 [FFmpeg-devel] [PATCH 1/4] aarch64: Fix ff_hevc_put_hevc_epel_h48_8_neon_i8mm Martin Storsjö
@ 2024-03-12 13:12 ` Martin Storsjö
2024-03-12 13:12 ` [FFmpeg-devel] [PATCH 3/4] checkasm: hevc_pel: Split a couple excessively long lines Martin Storsjö
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Martin Storsjö @ 2024-03-12 13:12 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Logan Lyu, J . Dekker
Previously, only half of the output was checked in 8 bits per pixel mode,
because the output here consists of 16-bit elements regardless of pixel size.
---
tests/checkasm/hevc_pel.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tests/checkasm/hevc_pel.c b/tests/checkasm/hevc_pel.c
index f9a7a7717c..065da87622 100644
--- a/tests/checkasm/hevc_pel.c
+++ b/tests/checkasm/hevc_pel.c
@@ -102,7 +102,7 @@ static void checkasm_check_hevc_qpel(void)
call_ref(dstw0, src0, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
call_new(dstw1, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
for (row = 0; row < size[sizes]; row++) {
- if (memcmp(dstw0 + row * MAX_PB_SIZE, dstw1 + row * MAX_PB_SIZE, sizes[size] * SIZEOF_PIXEL))
+ if (memcmp(dstw0 + row * MAX_PB_SIZE, dstw1 + row * MAX_PB_SIZE, sizes[size] * sizeof(int16_t)))
fail();
}
bench_new(dstw1, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
@@ -334,7 +334,7 @@ static void checkasm_check_hevc_epel(void)
call_ref(dstw0, src0, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
call_new(dstw1, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
for (row = 0; row < size[sizes]; row++) {
- if (memcmp(dstw0 + row * MAX_PB_SIZE, dstw1 + row * MAX_PB_SIZE, sizes[size] * SIZEOF_PIXEL))
+ if (memcmp(dstw0 + row * MAX_PB_SIZE, dstw1 + row * MAX_PB_SIZE, sizes[size] * sizeof(int16_t)))
fail();
}
bench_new(dstw1, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
--
2.39.3 (Apple Git-146)
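The effect of the memcmp() length in PATCH 2/4 can be reproduced with a small stand-alone sketch. It is not the checkasm code: count_bad_rows and ROW_STRIDE are hypothetical names (ROW_STRIDE stands in for MAX_PB_SIZE), and bytes_per_elem plays the role of either SIZEOF_PIXEL (1 at 8 bit) or sizeof(int16_t).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROW_STRIDE 64 /* hypothetical stand-in for MAX_PB_SIZE, in elements */

/* Compare `rows` rows of `width` int16_t elements and return how many rows
 * differ. The memcmp length is width * bytes_per_elem, so passing 1 instead
 * of sizeof(int16_t) only covers the first half of each row. */
int count_bad_rows(const int16_t *ref, const int16_t *out,
                   int width, int rows, size_t bytes_per_elem)
{
    int bad = 0;

    assert(width <= ROW_STRIDE);
    for (int row = 0; row < rows; row++) {
        if (memcmp(ref + row * ROW_STRIDE,
                   out + row * ROW_STRIDE,
                   (size_t)width * bytes_per_elem))
            bad++;
    }
    return bad;
}
```

For a 48-element row, a length of 48 * 1 bytes covers only elements 0 to 23, so a difference at element 40 goes unnoticed, whereas 48 * sizeof(int16_t) bytes covers the full row.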
* [FFmpeg-devel] [PATCH 3/4] checkasm: hevc_pel: Split a couple excessively long lines
2024-03-12 13:12 [FFmpeg-devel] [PATCH 1/4] aarch64: Fix ff_hevc_put_hevc_epel_h48_8_neon_i8mm Martin Storsjö
2024-03-12 13:12 ` [FFmpeg-devel] [PATCH 2/4] checkasm: hevc_pel: Check the full output in hevc_epel/hevc_qpel Martin Storsjö
@ 2024-03-12 13:12 ` Martin Storsjö
2024-03-12 13:12 ` [FFmpeg-devel] [PATCH 4/4] checkasm: hevc_pel: Use checkasm_check for printing failing output Martin Storsjö
2024-03-14 12:47 ` [FFmpeg-devel] [PATCH 1/4] aarch64: Fix ff_hevc_put_hevc_epel_h48_8_neon_i8mm J. Dekker
3 siblings, 0 replies; 6+ messages in thread
From: Martin Storsjö @ 2024-03-12 13:12 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Logan Lyu, J . Dekker
---
tests/checkasm/hevc_pel.c | 134 ++++++++++++++++++++++++++++----------
1 file changed, 98 insertions(+), 36 deletions(-)
diff --git a/tests/checkasm/hevc_pel.c b/tests/checkasm/hevc_pel.c
index 065da87622..73a4619978 100644
--- a/tests/checkasm/hevc_pel.c
+++ b/tests/checkasm/hevc_pel.c
@@ -96,13 +96,16 @@ static void checkasm_check_hevc_qpel(void)
case 3: type = "qpel_hv"; break; // 1 1
}
- if (check_func(h.put_hevc_qpel[size][j][i], "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) {
+ if (check_func(h.put_hevc_qpel[size][j][i],
+ "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) {
int16_t *dstw0 = (int16_t *) dst0, *dstw1 = (int16_t *) dst1;
randomize_buffers();
call_ref(dstw0, src0, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
call_new(dstw1, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
for (row = 0; row < size[sizes]; row++) {
- if (memcmp(dstw0 + row * MAX_PB_SIZE, dstw1 + row * MAX_PB_SIZE, sizes[size] * sizeof(int16_t)))
+ if (memcmp(dstw0 + row * MAX_PB_SIZE,
+ dstw1 + row * MAX_PB_SIZE,
+ sizes[size] * sizeof(int16_t)))
fail();
}
bench_new(dstw1, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
@@ -140,13 +143,20 @@ static void checkasm_check_hevc_qpel_uni(void)
case 3: type = "qpel_uni_hv"; break; // 1 1
}
- if (check_func(h.put_hevc_qpel_uni[size][j][i], "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) {
+ if (check_func(h.put_hevc_qpel_uni[size][j][i],
+ "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) {
randomize_buffers();
- call_ref(dst0, sizes[size] * SIZEOF_PIXEL, src0, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
- call_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
+ call_ref(dst0, sizes[size] * SIZEOF_PIXEL,
+ src0, sizes[size] * SIZEOF_PIXEL,
+ sizes[size], i, j, sizes[size]);
+ call_new(dst1, sizes[size] * SIZEOF_PIXEL,
+ src1, sizes[size] * SIZEOF_PIXEL,
+ sizes[size], i, j, sizes[size]);
if (memcmp(dst0, dst1, sizes[size] * sizes[size] * SIZEOF_PIXEL))
fail();
- bench_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
+ bench_new(dst1, sizes[size] * SIZEOF_PIXEL,
+ src1, sizes[size] * SIZEOF_PIXEL,
+ sizes[size], i, j, sizes[size]);
}
}
}
@@ -182,16 +192,23 @@ static void checkasm_check_hevc_qpel_uni_w(void)
case 3: type = "qpel_uni_w_hv"; break; // 1 1
}
- if (check_func(h.put_hevc_qpel_uni_w[size][j][i], "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) {
+ if (check_func(h.put_hevc_qpel_uni_w[size][j][i],
+ "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) {
for (denom = denoms; *denom >= 0; denom++) {
for (wx = weights; *wx >= 0; wx++) {
for (ox = offsets; *ox >= 0; ox++) {
randomize_buffers();
- call_ref(dst0, sizes[size] * SIZEOF_PIXEL, src0, sizes[size] * SIZEOF_PIXEL, sizes[size], *denom, *wx, *ox, i, j, sizes[size]);
- call_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], *denom, *wx, *ox, i, j, sizes[size]);
+ call_ref(dst0, sizes[size] * SIZEOF_PIXEL,
+ src0, sizes[size] * SIZEOF_PIXEL,
+ sizes[size], *denom, *wx, *ox, i, j, sizes[size]);
+ call_new(dst1, sizes[size] * SIZEOF_PIXEL,
+ src1, sizes[size] * SIZEOF_PIXEL,
+ sizes[size], *denom, *wx, *ox, i, j, sizes[size]);
if (memcmp(dst0, dst1, sizes[size] * sizes[size] * SIZEOF_PIXEL))
fail();
- bench_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], *denom, *wx, *ox, i, j, sizes[size]);
+ bench_new(dst1, sizes[size] * SIZEOF_PIXEL,
+ src1, sizes[size] * SIZEOF_PIXEL,
+ sizes[size], *denom, *wx, *ox, i, j, sizes[size]);
}
}
}
@@ -232,13 +249,20 @@ static void checkasm_check_hevc_qpel_bi(void)
case 3: type = "qpel_bi_hv"; break; // 1 1
}
- if (check_func(h.put_hevc_qpel_bi[size][j][i], "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) {
+ if (check_func(h.put_hevc_qpel_bi[size][j][i],
+ "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) {
randomize_buffers_ref();
- call_ref(dst0, sizes[size] * SIZEOF_PIXEL, src0, sizes[size] * SIZEOF_PIXEL, ref0, sizes[size], i, j, sizes[size]);
- call_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, ref1, sizes[size], i, j, sizes[size]);
+ call_ref(dst0, sizes[size] * SIZEOF_PIXEL,
+ src0, sizes[size] * SIZEOF_PIXEL,
+ ref0, sizes[size], i, j, sizes[size]);
+ call_new(dst1, sizes[size] * SIZEOF_PIXEL,
+ src1, sizes[size] * SIZEOF_PIXEL,
+ ref1, sizes[size], i, j, sizes[size]);
if (memcmp(dst0, dst1, sizes[size] * sizes[size] * SIZEOF_PIXEL))
fail();
- bench_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, ref1, sizes[size], i, j, sizes[size]);
+ bench_new(dst1, sizes[size] * SIZEOF_PIXEL,
+ src1, sizes[size] * SIZEOF_PIXEL,
+ ref1, sizes[size], i, j, sizes[size]);
}
}
}
@@ -278,16 +302,23 @@ static void checkasm_check_hevc_qpel_bi_w(void)
case 3: type = "qpel_bi_w_hv"; break; // 1 1
}
- if (check_func(h.put_hevc_qpel_bi_w[size][j][i], "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) {
+ if (check_func(h.put_hevc_qpel_bi_w[size][j][i],
+ "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) {
for (denom = denoms; *denom >= 0; denom++) {
for (wx = weights; *wx >= 0; wx++) {
for (ox = offsets; *ox >= 0; ox++) {
randomize_buffers_ref();
- call_ref(dst0, sizes[size] * SIZEOF_PIXEL, src0, sizes[size] * SIZEOF_PIXEL, ref0, sizes[size], *denom, *wx, *wx, *ox, *ox, i, j, sizes[size]);
- call_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, ref1, sizes[size], *denom, *wx, *wx, *ox, *ox, i, j, sizes[size]);
+ call_ref(dst0, sizes[size] * SIZEOF_PIXEL,
+ src0, sizes[size] * SIZEOF_PIXEL,
+ ref0, sizes[size], *denom, *wx, *wx, *ox, *ox, i, j, sizes[size]);
+ call_new(dst1, sizes[size] * SIZEOF_PIXEL,
+ src1, sizes[size] * SIZEOF_PIXEL,
+ ref1, sizes[size], *denom, *wx, *wx, *ox, *ox, i, j, sizes[size]);
if (memcmp(dst0, dst1, sizes[size] * sizes[size] * SIZEOF_PIXEL))
fail();
- bench_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, ref1, sizes[size], *denom, *wx, *wx, *ox, *ox, i, j, sizes[size]);
+ bench_new(dst1, sizes[size] * SIZEOF_PIXEL,
+ src1, sizes[size] * SIZEOF_PIXEL,
+ ref1, sizes[size], *denom, *wx, *wx, *ox, *ox, i, j, sizes[size]);
}
}
}
@@ -328,13 +359,16 @@ static void checkasm_check_hevc_epel(void)
case 3: type = "epel_hv"; break; // 1 1
}
- if (check_func(h.put_hevc_epel[size][j][i], "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) {
+ if (check_func(h.put_hevc_epel[size][j][i],
+ "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) {
int16_t *dstw0 = (int16_t *) dst0, *dstw1 = (int16_t *) dst1;
randomize_buffers();
call_ref(dstw0, src0, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
call_new(dstw1, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
for (row = 0; row < size[sizes]; row++) {
- if (memcmp(dstw0 + row * MAX_PB_SIZE, dstw1 + row * MAX_PB_SIZE, sizes[size] * sizeof(int16_t)))
+ if (memcmp(dstw0 + row * MAX_PB_SIZE,
+ dstw1 + row * MAX_PB_SIZE,
+ sizes[size] * sizeof(int16_t)))
fail();
}
bench_new(dstw1, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
@@ -372,13 +406,20 @@ static void checkasm_check_hevc_epel_uni(void)
case 3: type = "epel_uni_hv"; break; // 1 1
}
- if (check_func(h.put_hevc_epel_uni[size][j][i], "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) {
+ if (check_func(h.put_hevc_epel_uni[size][j][i],
+ "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) {
randomize_buffers();
- call_ref(dst0, sizes[size] * SIZEOF_PIXEL, src0, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
- call_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
+ call_ref(dst0, sizes[size] * SIZEOF_PIXEL,
+ src0, sizes[size] * SIZEOF_PIXEL,
+ sizes[size], i, j, sizes[size]);
+ call_new(dst1, sizes[size] * SIZEOF_PIXEL,
+ src1, sizes[size] * SIZEOF_PIXEL,
+ sizes[size], i, j, sizes[size]);
if (memcmp(dst0, dst1, sizes[size] * sizes[size] * SIZEOF_PIXEL))
fail();
- bench_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
+ bench_new(dst1, sizes[size] * SIZEOF_PIXEL,
+ src1, sizes[size] * SIZEOF_PIXEL,
+ sizes[size], i, j, sizes[size]);
}
}
}
@@ -414,16 +455,23 @@ static void checkasm_check_hevc_epel_uni_w(void)
case 3: type = "epel_uni_w_hv"; break; // 1 1
}
- if (check_func(h.put_hevc_epel_uni_w[size][j][i], "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) {
+ if (check_func(h.put_hevc_epel_uni_w[size][j][i],
+ "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) {
for (denom = denoms; *denom >= 0; denom++) {
for (wx = weights; *wx >= 0; wx++) {
for (ox = offsets; *ox >= 0; ox++) {
randomize_buffers();
- call_ref(dst0, sizes[size] * SIZEOF_PIXEL, src0, sizes[size] * SIZEOF_PIXEL, sizes[size], *denom, *wx, *ox, i, j, sizes[size]);
- call_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], *denom, *wx, *ox, i, j, sizes[size]);
+ call_ref(dst0, sizes[size] * SIZEOF_PIXEL,
+ src0, sizes[size] * SIZEOF_PIXEL,
+ sizes[size], *denom, *wx, *ox, i, j, sizes[size]);
+ call_new(dst1, sizes[size] * SIZEOF_PIXEL,
+ src1, sizes[size] * SIZEOF_PIXEL,
+ sizes[size], *denom, *wx, *ox, i, j, sizes[size]);
if (memcmp(dst0, dst1, sizes[size] * sizes[size] * SIZEOF_PIXEL))
fail();
- bench_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], *denom, *wx, *ox, i, j, sizes[size]);
+ bench_new(dst1, sizes[size] * SIZEOF_PIXEL,
+ src1, sizes[size] * SIZEOF_PIXEL,
+ sizes[size], *denom, *wx, *ox, i, j, sizes[size]);
}
}
}
@@ -464,13 +512,20 @@ static void checkasm_check_hevc_epel_bi(void)
case 3: type = "epel_bi_hv"; break; // 1 1
}
- if (check_func(h.put_hevc_epel_bi[size][j][i], "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) {
+ if (check_func(h.put_hevc_epel_bi[size][j][i],
+ "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) {
randomize_buffers_ref();
- call_ref(dst0, sizes[size] * SIZEOF_PIXEL, src0, sizes[size] * SIZEOF_PIXEL, ref0, sizes[size], i, j, sizes[size]);
- call_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, ref1, sizes[size], i, j, sizes[size]);
+ call_ref(dst0, sizes[size] * SIZEOF_PIXEL,
+ src0, sizes[size] * SIZEOF_PIXEL,
+ ref0, sizes[size], i, j, sizes[size]);
+ call_new(dst1, sizes[size] * SIZEOF_PIXEL,
+ src1, sizes[size] * SIZEOF_PIXEL,
+ ref1, sizes[size], i, j, sizes[size]);
if (memcmp(dst0, dst1, sizes[size] * sizes[size] * SIZEOF_PIXEL))
fail();
- bench_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, ref1, sizes[size], i, j, sizes[size]);
+ bench_new(dst1, sizes[size] * SIZEOF_PIXEL,
+ src1, sizes[size] * SIZEOF_PIXEL,
+ ref1, sizes[size], i, j, sizes[size]);
}
}
}
@@ -510,16 +565,23 @@ static void checkasm_check_hevc_epel_bi_w(void)
case 3: type = "epel_bi_w_hv"; break; // 1 1
}
- if (check_func(h.put_hevc_epel_bi_w[size][j][i], "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) {
+ if (check_func(h.put_hevc_epel_bi_w[size][j][i],
+ "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) {
for (denom = denoms; *denom >= 0; denom++) {
for (wx = weights; *wx >= 0; wx++) {
for (ox = offsets; *ox >= 0; ox++) {
randomize_buffers_ref();
- call_ref(dst0, sizes[size] * SIZEOF_PIXEL, src0, sizes[size] * SIZEOF_PIXEL, ref0, sizes[size], *denom, *wx, *wx, *ox, *ox, i, j, sizes[size]);
- call_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, ref1, sizes[size], *denom, *wx, *wx, *ox, *ox, i, j, sizes[size]);
+ call_ref(dst0, sizes[size] * SIZEOF_PIXEL,
+ src0, sizes[size] * SIZEOF_PIXEL,
+ ref0, sizes[size], *denom, *wx, *wx, *ox, *ox, i, j, sizes[size]);
+ call_new(dst1, sizes[size] * SIZEOF_PIXEL,
+ src1, sizes[size] * SIZEOF_PIXEL,
+ ref1, sizes[size], *denom, *wx, *wx, *ox, *ox, i, j, sizes[size]);
if (memcmp(dst0, dst1, sizes[size] * sizes[size] * SIZEOF_PIXEL))
fail();
- bench_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, ref1, sizes[size], *denom, *wx, *wx, *ox, *ox, i, j, sizes[size]);
+ bench_new(dst1, sizes[size] * SIZEOF_PIXEL,
+ src1, sizes[size] * SIZEOF_PIXEL,
+ ref1, sizes[size], *denom, *wx, *wx, *ox, *ox, i, j, sizes[size]);
}
}
}
--
2.39.3 (Apple Git-146)
* [FFmpeg-devel] [PATCH 4/4] checkasm: hevc_pel: Use checkasm_check for printing failing output
2024-03-12 13:12 [FFmpeg-devel] [PATCH 1/4] aarch64: Fix ff_hevc_put_hevc_epel_h48_8_neon_i8mm Martin Storsjö
2024-03-12 13:12 ` [FFmpeg-devel] [PATCH 2/4] checkasm: hevc_pel: Check the full output in hevc_epel/hevc_qpel Martin Storsjö
2024-03-12 13:12 ` [FFmpeg-devel] [PATCH 3/4] checkasm: hevc_pel: Split a couple excessively long lines Martin Storsjö
@ 2024-03-12 13:12 ` Martin Storsjö
2024-03-14 12:47 ` [FFmpeg-devel] [PATCH 1/4] aarch64: Fix ff_hevc_put_hevc_epel_h48_8_neon_i8mm J. Dekker
3 siblings, 0 replies; 6+ messages in thread
From: Martin Storsjö @ 2024-03-12 13:12 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Logan Lyu, J . Dekker
This simplifies the code for checking the output, and can print
the failing output (including a map of matching/mismatching
elements) if checkasm is run with the -v/--verbose option.
---
tests/checkasm/hevc_pel.c | 71 ++++++++++++++++++++++-----------------
1 file changed, 41 insertions(+), 30 deletions(-)
diff --git a/tests/checkasm/hevc_pel.c b/tests/checkasm/hevc_pel.c
index 73a4619978..ed22ec4f9d 100644
--- a/tests/checkasm/hevc_pel.c
+++ b/tests/checkasm/hevc_pel.c
@@ -36,6 +36,15 @@ static const int offsets[] = {0, 255, -1 };
#define SIZEOF_PIXEL ((bit_depth + 7) / 8)
#define BUF_SIZE (2 * MAX_PB_SIZE * (2 * 4 + MAX_PB_SIZE))
+#define checkasm_check_pixel(buf1, stride1, buf2, stride2, ...) \
+ ((bit_depth > 8) ? \
+ checkasm_check(uint16_t, (const uint16_t*)buf1, stride1, \
+ (const uint16_t*)buf2, stride2, \
+ __VA_ARGS__) : \
+ checkasm_check(uint8_t, (const uint8_t*) buf1, stride1, \
+ (const uint8_t*) buf2, stride2, \
+ __VA_ARGS__))
+
#define randomize_buffers() \
do { \
uint32_t mask = pixel_mask[bit_depth - 8]; \
@@ -78,7 +87,7 @@ static void checkasm_check_hevc_qpel(void)
LOCAL_ALIGNED_32(uint8_t, dst1, [BUF_SIZE]);
HEVCDSPContext h;
- int size, bit_depth, i, j, row;
+ int size, bit_depth, i, j;
declare_func(void, int16_t *dst, uint8_t *src, ptrdiff_t srcstride,
int height, intptr_t mx, intptr_t my, int width);
@@ -102,12 +111,9 @@ static void checkasm_check_hevc_qpel(void)
randomize_buffers();
call_ref(dstw0, src0, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
call_new(dstw1, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
- for (row = 0; row < size[sizes]; row++) {
- if (memcmp(dstw0 + row * MAX_PB_SIZE,
- dstw1 + row * MAX_PB_SIZE,
- sizes[size] * sizeof(int16_t)))
- fail();
- }
+ checkasm_check(int16_t, dstw0, MAX_PB_SIZE * sizeof(int16_t),
+ dstw1, MAX_PB_SIZE * sizeof(int16_t),
+ size[sizes], size[sizes], "dst");
bench_new(dstw1, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
}
}
@@ -152,8 +158,9 @@ static void checkasm_check_hevc_qpel_uni(void)
call_new(dst1, sizes[size] * SIZEOF_PIXEL,
src1, sizes[size] * SIZEOF_PIXEL,
sizes[size], i, j, sizes[size]);
- if (memcmp(dst0, dst1, sizes[size] * sizes[size] * SIZEOF_PIXEL))
- fail();
+ checkasm_check_pixel(dst0, sizes[size] * SIZEOF_PIXEL,
+ dst1, sizes[size] * SIZEOF_PIXEL,
+ size[sizes], size[sizes], "dst");
bench_new(dst1, sizes[size] * SIZEOF_PIXEL,
src1, sizes[size] * SIZEOF_PIXEL,
sizes[size], i, j, sizes[size]);
@@ -204,8 +211,9 @@ static void checkasm_check_hevc_qpel_uni_w(void)
call_new(dst1, sizes[size] * SIZEOF_PIXEL,
src1, sizes[size] * SIZEOF_PIXEL,
sizes[size], *denom, *wx, *ox, i, j, sizes[size]);
- if (memcmp(dst0, dst1, sizes[size] * sizes[size] * SIZEOF_PIXEL))
- fail();
+ checkasm_check_pixel(dst0, sizes[size] * SIZEOF_PIXEL,
+ dst1, sizes[size] * SIZEOF_PIXEL,
+ size[sizes], size[sizes], "dst");
bench_new(dst1, sizes[size] * SIZEOF_PIXEL,
src1, sizes[size] * SIZEOF_PIXEL,
sizes[size], *denom, *wx, *ox, i, j, sizes[size]);
@@ -258,8 +266,9 @@ static void checkasm_check_hevc_qpel_bi(void)
call_new(dst1, sizes[size] * SIZEOF_PIXEL,
src1, sizes[size] * SIZEOF_PIXEL,
ref1, sizes[size], i, j, sizes[size]);
- if (memcmp(dst0, dst1, sizes[size] * sizes[size] * SIZEOF_PIXEL))
- fail();
+ checkasm_check_pixel(dst0, sizes[size] * SIZEOF_PIXEL,
+ dst1, sizes[size] * SIZEOF_PIXEL,
+ size[sizes], size[sizes], "dst");
bench_new(dst1, sizes[size] * SIZEOF_PIXEL,
src1, sizes[size] * SIZEOF_PIXEL,
ref1, sizes[size], i, j, sizes[size]);
@@ -314,8 +323,9 @@ static void checkasm_check_hevc_qpel_bi_w(void)
call_new(dst1, sizes[size] * SIZEOF_PIXEL,
src1, sizes[size] * SIZEOF_PIXEL,
ref1, sizes[size], *denom, *wx, *wx, *ox, *ox, i, j, sizes[size]);
- if (memcmp(dst0, dst1, sizes[size] * sizes[size] * SIZEOF_PIXEL))
- fail();
+ checkasm_check_pixel(dst0, sizes[size] * SIZEOF_PIXEL,
+ dst1, sizes[size] * SIZEOF_PIXEL,
+ size[sizes], size[sizes], "dst");
bench_new(dst1, sizes[size] * SIZEOF_PIXEL,
src1, sizes[size] * SIZEOF_PIXEL,
ref1, sizes[size], *denom, *wx, *wx, *ox, *ox, i, j, sizes[size]);
@@ -341,7 +351,7 @@ static void checkasm_check_hevc_epel(void)
LOCAL_ALIGNED_32(uint8_t, dst1, [BUF_SIZE]);
HEVCDSPContext h;
- int size, bit_depth, i, j, row;
+ int size, bit_depth, i, j;
declare_func(void, int16_t *dst, uint8_t *src, ptrdiff_t srcstride,
int height, intptr_t mx, intptr_t my, int width);
@@ -365,12 +375,9 @@ static void checkasm_check_hevc_epel(void)
randomize_buffers();
call_ref(dstw0, src0, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
call_new(dstw1, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
- for (row = 0; row < size[sizes]; row++) {
- if (memcmp(dstw0 + row * MAX_PB_SIZE,
- dstw1 + row * MAX_PB_SIZE,
- sizes[size] * sizeof(int16_t)))
- fail();
- }
+ checkasm_check(int16_t, dstw0, MAX_PB_SIZE * sizeof(int16_t),
+ dstw1, MAX_PB_SIZE * sizeof(int16_t),
+ size[sizes], size[sizes], "dst");
bench_new(dstw1, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]);
}
}
@@ -415,8 +422,9 @@ static void checkasm_check_hevc_epel_uni(void)
call_new(dst1, sizes[size] * SIZEOF_PIXEL,
src1, sizes[size] * SIZEOF_PIXEL,
sizes[size], i, j, sizes[size]);
- if (memcmp(dst0, dst1, sizes[size] * sizes[size] * SIZEOF_PIXEL))
- fail();
+ checkasm_check_pixel(dst0, sizes[size] * SIZEOF_PIXEL,
+ dst1, sizes[size] * SIZEOF_PIXEL,
+ size[sizes], size[sizes], "dst");
bench_new(dst1, sizes[size] * SIZEOF_PIXEL,
src1, sizes[size] * SIZEOF_PIXEL,
sizes[size], i, j, sizes[size]);
@@ -467,8 +475,9 @@ static void checkasm_check_hevc_epel_uni_w(void)
call_new(dst1, sizes[size] * SIZEOF_PIXEL,
src1, sizes[size] * SIZEOF_PIXEL,
sizes[size], *denom, *wx, *ox, i, j, sizes[size]);
- if (memcmp(dst0, dst1, sizes[size] * sizes[size] * SIZEOF_PIXEL))
- fail();
+ checkasm_check_pixel(dst0, sizes[size] * SIZEOF_PIXEL,
+ dst1, sizes[size] * SIZEOF_PIXEL,
+ size[sizes], size[sizes], "dst");
bench_new(dst1, sizes[size] * SIZEOF_PIXEL,
src1, sizes[size] * SIZEOF_PIXEL,
sizes[size], *denom, *wx, *ox, i, j, sizes[size]);
@@ -521,8 +530,9 @@ static void checkasm_check_hevc_epel_bi(void)
call_new(dst1, sizes[size] * SIZEOF_PIXEL,
src1, sizes[size] * SIZEOF_PIXEL,
ref1, sizes[size], i, j, sizes[size]);
- if (memcmp(dst0, dst1, sizes[size] * sizes[size] * SIZEOF_PIXEL))
- fail();
+ checkasm_check_pixel(dst0, sizes[size] * SIZEOF_PIXEL,
+ dst1, sizes[size] * SIZEOF_PIXEL,
+ size[sizes], size[sizes], "dst");
bench_new(dst1, sizes[size] * SIZEOF_PIXEL,
src1, sizes[size] * SIZEOF_PIXEL,
ref1, sizes[size], i, j, sizes[size]);
@@ -577,8 +587,9 @@ static void checkasm_check_hevc_epel_bi_w(void)
call_new(dst1, sizes[size] * SIZEOF_PIXEL,
src1, sizes[size] * SIZEOF_PIXEL,
ref1, sizes[size], *denom, *wx, *wx, *ox, *ox, i, j, sizes[size]);
- if (memcmp(dst0, dst1, sizes[size] * sizes[size] * SIZEOF_PIXEL))
- fail();
+ checkasm_check_pixel(dst0, sizes[size] * SIZEOF_PIXEL,
+ dst1, sizes[size] * SIZEOF_PIXEL,
+ size[sizes], size[sizes], "dst");
bench_new(dst1, sizes[size] * SIZEOF_PIXEL,
src1, sizes[size] * SIZEOF_PIXEL,
ref1, sizes[size], *denom, *wx, *wx, *ox, *ox, i, j, sizes[size]);
--
2.39.3 (Apple Git-146)
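As a rough illustration of what a strided, element-wise comparison offers over a flat memcmp(), here is a minimal sketch. It is not FFmpeg's checkasm_check() implementation; compare_int16_2d and its verbose flag are hypothetical. The only detail taken from the patch is that strides are passed in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Compare a w x h block of int16_t elements in two buffers whose row
 * strides are given in bytes. Returns the number of mismatching elements.
 * With verbose != 0, prints a map: '.' for a match, 'x' for a mismatch. */
int compare_int16_2d(const int16_t *a, ptrdiff_t stride_a,
                     const int16_t *b, ptrdiff_t stride_b,
                     int w, int h, int verbose)
{
    int mismatches = 0;

    assert(w >= 0 && h >= 0);
    for (int y = 0; y < h; y++) {
        const int16_t *ra = (const int16_t *)((const char *)a + y * stride_a);
        const int16_t *rb = (const int16_t *)((const char *)b + y * stride_b);

        for (int x = 0; x < w; x++) {
            int same = ra[x] == rb[x];

            mismatches += !same;
            if (verbose)
                putchar(same ? '.' : 'x');
        }
        if (verbose)
            putchar('\n');
    }
    return mismatches;
}
```

Because the helper walks rows by stride, it only looks at the w elements that belong to each row, and the optional map shows where in the block the outputs diverge rather than just that they do.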
* Re: [FFmpeg-devel] [PATCH 1/4] aarch64: Fix ff_hevc_put_hevc_epel_h48_8_neon_i8mm
2024-03-12 13:12 [FFmpeg-devel] [PATCH 1/4] aarch64: Fix ff_hevc_put_hevc_epel_h48_8_neon_i8mm Martin Storsjö
` (2 preceding siblings ...)
2024-03-12 13:12 ` [FFmpeg-devel] [PATCH 4/4] checkasm: hevc_pel: Use checkasm_check for printing failing output Martin Storsjö
@ 2024-03-14 12:47 ` J. Dekker
2024-03-14 13:16 ` Martin Storsjö
3 siblings, 1 reply; 6+ messages in thread
From: J. Dekker @ 2024-03-14 12:47 UTC (permalink / raw)
To: ffmpeg-devel
Martin Storsjö <martin@martin.st> writes:
> The first 32 elements of each row were correct, while the
> last 16 were scrambled.
>
> This hasn't been noticed, because the checkasm test erroneously
> only checked half of the output (for 8 bit functions), and
> apparently none of the samples in "fate-hevc" trigger this
> specific function.
> ---
> libavcodec/aarch64/hevcdsp_epel_neon.S | 14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
Thanks for the fixes, wonder if we should use checkasm_check()
exclusively in checkasm rather than memcmp(), would probably be useful.
Pushed set
--
jd
* Re: [FFmpeg-devel] [PATCH 1/4] aarch64: Fix ff_hevc_put_hevc_epel_h48_8_neon_i8mm
2024-03-14 12:47 ` [FFmpeg-devel] [PATCH 1/4] aarch64: Fix ff_hevc_put_hevc_epel_h48_8_neon_i8mm J. Dekker
@ 2024-03-14 13:16 ` Martin Storsjö
0 siblings, 0 replies; 6+ messages in thread
From: Martin Storsjö @ 2024-03-14 13:16 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Thu, 14 Mar 2024, J. Dekker wrote:
>
> Martin Storsjö <martin@martin.st> writes:
>
>> The first 32 elements of each row were correct, while the
>> last 16 were scrambled.
>>
>> This hasn't been noticed, because the checkasm test erroneously
>> only checked half of the output (for 8 bit functions), and
>> apparently none of the samples in "fate-hevc" trigger this
>> specific function.
>> ---
>> libavcodec/aarch64/hevcdsp_epel_neon.S | 14 +++++++++-----
>> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> Thanks for the fixes, wonder if we should use checkasm_check()
> exclusively in checkasm rather than memcmp(), would probably be useful.
Wherever it makes sense and works, then yes, using checkasm_check()
probably is useful. (Within dav1d, we use it in most tests except for a
few.)
FWIW, many checkasm tests seem to have pretty naive setups, where e.g. all
rows are tightly packed. If they used a bigger stride with more padding
between rows, they could also detect some other cases of potential asm bugs.
> Pushed set
Thanks!
// Martin
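The point about padded strides can be sketched as follows. This is an illustration only, not code from checkasm or dav1d: GUARD, fill_guard, count_clobbered_padding and write_rows are hypothetical, and write_rows stands in for a function under test that may write too many elements per row.

```c
#include <assert.h>
#include <stdint.h>

#define GUARD ((int16_t)0x5A5A) /* arbitrary sentinel chosen for this sketch */

/* Fill the whole buffer (rows * stride elements) with the sentinel. */
void fill_guard(int16_t *buf, int stride, int rows)
{
    for (int i = 0; i < stride * rows; i++)
        buf[i] = GUARD;
}

/* Count padding elements (columns w..stride-1 of each row) that no longer
 * hold the sentinel, i.e. that the function under test wrote to. */
int count_clobbered_padding(const int16_t *buf, int stride, int w, int rows)
{
    int clobbered = 0;

    assert(w <= stride);
    for (int y = 0; y < rows; y++)
        for (int x = w; x < stride; x++)
            clobbered += buf[y * stride + x] != GUARD;
    return clobbered;
}

/* Hypothetical function under test: writes w elements per row, plus
 * `overrun` extra elements to simulate a bug. */
void write_rows(int16_t *dst, int stride, int w, int rows, int overrun)
{
    for (int y = 0; y < rows; y++)
        for (int x = 0; x < w + overrun; x++)
            dst[y * stride + x] = (int16_t)(y * w + x);
}
```

With stride equal to w (tightly packed rows), an extra element written at the end of one row lands at the start of the next row and is overwritten again, so only the last row's overrun could be noticed. With stride larger than w, every such write lands in padding and can be counted.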