* [FFmpeg-devel] [PATCH 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow
@ 2024-05-29 15:38 toqsxw
2024-05-29 16:23 ` Andreas Rheinhardt
2024-05-29 17:51 ` Ronald S. Bultje
0 siblings, 2 replies; 6+ messages in thread
From: toqsxw @ 2024-05-29 15:38 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: James Almer, Wu Jianhua
From: Wu Jianhua <toqsxw@outlook.com>
Some tests fails with certain seeds
tests/checkasm/checkasm 2325607578 --test=vvc_alf
checkasm: using random seed 2325607578
AVX2:
vvc_alf_filter_luma_120x20_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x24_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x28_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x32_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x36_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x40_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x44_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x48_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x52_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x56_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x60_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x64_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x68_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x72_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x76_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x80_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x84_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x88_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x92_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x96_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x100_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x104_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x108_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x112_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x116_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x120_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x124_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x128_12_avx2 (vvc_alf.c:104)
- vvc_alf.alf_filter [FAILED]
- vvc_alf.alf_classify [OK]
checkasm: 28 of 9216 tests have failed
Reported-by: James Almer <jamrial@gmail.com>
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
---
libavcodec/x86/vvc/vvc_alf.asm | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/libavcodec/x86/vvc/vvc_alf.asm b/libavcodec/x86/vvc/vvc_alf.asm
index 71e821c27b..91f158bac9 100644
--- a/libavcodec/x86/vvc/vvc_alf.asm
+++ b/libavcodec/x86/vvc/vvc_alf.asm
@@ -278,7 +278,9 @@ SECTION .text
psrad m0, SHIFT + 3
psrad m1, SHIFT + 3
%%shift_end:
+%if ps == 1
packssdw m0, m0, m1
+%endif
%endmacro
; FILTER_VB(line)
@@ -356,7 +358,18 @@ SECTION .text
FILTER_VB xq
+ ; sum += curr
+%if ps == 1
paddw m0, m2
+%else
+ vpunpcklqdq m11, m2, m2
+ vpunpckhqdq m12, m2, m2
+ vpunpcklwd m11, m11, m14
+ vpunpcklwd m12, m12, m14
+ paddd m0, m11
+ paddd m1, m12
+ packssdw m0, m0, m1
+%endif
; clip to pixel
CLIPW m0, m14, m15
--
2.44.0.windows.1
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow
2024-05-29 15:38 [FFmpeg-devel] [PATCH 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow toqsxw
@ 2024-05-29 16:23 ` Andreas Rheinhardt
2024-05-29 17:51 ` Ronald S. Bultje
1 sibling, 0 replies; 6+ messages in thread
From: Andreas Rheinhardt @ 2024-05-29 16:23 UTC (permalink / raw)
To: ffmpeg-devel
toqsxw@outlook.com:
> From: Wu Jianhua <toqsxw@outlook.com>
>
> Some tests fails with certain seeds
>
> tests/checkasm/checkasm 2325607578 --test=vvc_alf
> checkasm: using random seed 2325607578
> AVX2:
> vvc_alf_filter_luma_120x20_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x24_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x28_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x32_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x36_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x40_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x44_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x48_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x52_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x56_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x60_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x64_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x68_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x72_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x76_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x80_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x84_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x88_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x92_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x96_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x100_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x104_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x108_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x112_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x116_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x120_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x124_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x128_12_avx2 (vvc_alf.c:104)
> - vvc_alf.alf_filter [FAILED]
> - vvc_alf.alf_classify [OK]
> checkasm: 28 of 9216 tests have failed
>
> Reported-by: James Almer <jamrial@gmail.com>
> Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
> ---
> libavcodec/x86/vvc/vvc_alf.asm | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/libavcodec/x86/vvc/vvc_alf.asm b/libavcodec/x86/vvc/vvc_alf.asm
> index 71e821c27b..91f158bac9 100644
> --- a/libavcodec/x86/vvc/vvc_alf.asm
> +++ b/libavcodec/x86/vvc/vvc_alf.asm
> @@ -278,7 +278,9 @@ SECTION .text
> psrad m0, SHIFT + 3
> psrad m1, SHIFT + 3
> %%shift_end:
> +%if ps == 1
> packssdw m0, m0, m1
> +%endif
> %endmacro
>
> ; FILTER_VB(line)
> @@ -356,7 +358,18 @@ SECTION .text
>
> FILTER_VB xq
>
> + ; sum += curr
> +%if ps == 1
> paddw m0, m2
> +%else
> + vpunpcklqdq m11, m2, m2
> + vpunpckhqdq m12, m2, m2
> + vpunpcklwd m11, m11, m14
> + vpunpcklwd m12, m12, m14
> + paddd m0, m11
> + paddd m1, m12
> + packssdw m0, m0, m1
> +%endif
>
> ; clip to pixel
> CLIPW m0, m14, m15
Can this happen with real inputs (like when called from the decoder)? If
not, then the test needs to be made more realistic.
Anyway, what is the performance impact of this?
- Andreas
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow
2024-05-29 15:38 [FFmpeg-devel] [PATCH 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow toqsxw
2024-05-29 16:23 ` Andreas Rheinhardt
@ 2024-05-29 17:51 ` Ronald S. Bultje
2024-05-29 19:44 ` [FFmpeg-devel] 回复: " Wu Jianhua
1 sibling, 1 reply; 6+ messages in thread
From: Ronald S. Bultje @ 2024-05-29 17:51 UTC (permalink / raw)
To: FFmpeg development discussions and patches; +Cc: Wu Jianhua, James Almer
Hi,
On Wed, May 29, 2024 at 11:38 AM <toqsxw@outlook.com> wrote:
> +%else
> + vpunpcklqdq m11, m2, m2
> + vpunpckhqdq m12, m2, m2
> + vpunpcklwd m11, m11, m14
> + vpunpcklwd m12, m12, m14
> + paddd m0, m11
> + paddd m1, m12
> + packssdw m0, m0, m1
> +%endif
>
punpcklqdq a, src, src
punpckhqdq b, src, src
punpcklwd a, a, zero
punpcklwd b, b, zero
is the same as
punpcklwd a, src, zero
punpckhwd b, src, zero
Also, the whole thing just emulates a saturated add. Can't you use paddsw
instead of paddw and be done with it? To add to Andreas' question: is
saturating here normatively required?
Ronald
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 6+ messages in thread
* [FFmpeg-devel] 回复: [PATCH 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow
2024-05-29 17:51 ` Ronald S. Bultje
@ 2024-05-29 19:44 ` Wu Jianhua
2024-05-29 20:56 ` [FFmpeg-devel] " Ronald S. Bultje
0 siblings, 1 reply; 6+ messages in thread
From: Wu Jianhua @ 2024-05-29 19:44 UTC (permalink / raw)
To: Ronald S. Bultje, FFmpeg development discussions and patches, Nuo Mi
Cc: James Almer
Ronald S. Bultje:
> 发件人: Ronald S. Bultje <rsbultje@gmail.com>
> 发送时间: 2024年5月29日 10:51
> 收件人: FFmpeg development discussions and patches
> 抄送: James Almer; Wu Jianhua
> 主题: Re: [FFmpeg-devel] [PATCH 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow
>
> Hi,
>
> On Wed, May 29, 2024 at 11:38 AM <toqsxw@outlook.com<mailto:toqsxw@outlook.com>> wrote:
> +%else
> + vpunpcklqdq m11, m2, m2
> + vpunpckhqdq m12, m2, m2
> + vpunpcklwd m11, m11, m14
> + vpunpcklwd m12, m12, m14
> + paddd m0, m11
> + paddd m1, m12
> + packssdw m0, m0, m1
> +%endif
>
> punpcklqdq a, src, src
> punpckhqdq b, src, src
> punpcklwd a, a, zero
> punpcklwd b, b, zero
>
> is the same as
>
> punpcklwd a, src, zero
> punpckhwd b, src, zero
Thank you for pointing out this. This modification is really helpful for my improvement!
Andreas:
>Can this happen with real inputs (like when called from the decoder)? If
> not, then the test needs to be made more realistic.
> Anyway, what is the performance impact of this?
I didn't have a unit test, but the average FPS looks no change.
Ronald:
> Also, the whole thing just emulates a saturated add. Can't you use paddsw instead of paddw and be done with it? To add to Andreas' question: is saturating here normatively required?
We didn't have any sample that failed for this issue except for the checksum with specific seeds. I think we can keep not changing it until a real sample has something wrong.
@Nuomi to get more details.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [FFmpeg-devel] [PATCH 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow
2024-05-29 19:44 ` [FFmpeg-devel] 回复: " Wu Jianhua
@ 2024-05-29 20:56 ` Ronald S. Bultje
2024-05-30 16:31 ` [FFmpeg-devel] 回复: " Wu Jianhua
0 siblings, 1 reply; 6+ messages in thread
From: Ronald S. Bultje @ 2024-05-29 20:56 UTC (permalink / raw)
To: Wu Jianhua
Cc: Nuo Mi, James Almer, FFmpeg development discussions and patches
Hi,
On Wed, May 29, 2024 at 3:44 PM Wu Jianhua <toqsxw@outlook.com> wrote:
> Ronald S. Bultje:
> > On Wed, May 29, 2024 at 11:38 AM <toqsxw@outlook.com<mailto:
> toqsxw@outlook.com>> wrote:
> > +%else
> > + vpunpcklqdq m11, m2, m2
> > + vpunpckhqdq m12, m2, m2
> > + vpunpcklwd m11, m11, m14
> > + vpunpcklwd m12, m12, m14
> > + paddd m0, m11
> > + paddd m1, m12
> > + packssdw m0, m0, m1
> > +%endif
> >
> [..]
>
> Also, the whole thing just emulates a saturated add. Can't you use paddsw
> instead of paddw and be done with it? To add to Andreas' question: is
> saturating here normatively required?
>
> We didn't have any sample that failed for this issue except for the
> checksum with specific seeds. I think we can keep not changing it until a
> real sample has something wrong.
>
> @Nuomi to get more details.
>
I think "just" replacing paddw with paddsw is correct, since the input
pixels are 12bit (so they could be either unsigned or signed), the filtered
output is the result of packssdw (so signed words), and the desired output
is 12bit pixels anyway, anything greater than that is clipped to 12bit
range. So to me, it seems paddsw is a cheaper way to accomplish the same
thing.
Ronald
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 6+ messages in thread
* [FFmpeg-devel] 回复: [PATCH 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow
2024-05-29 20:56 ` [FFmpeg-devel] " Ronald S. Bultje
@ 2024-05-30 16:31 ` Wu Jianhua
0 siblings, 0 replies; 6+ messages in thread
From: Wu Jianhua @ 2024-05-30 16:31 UTC (permalink / raw)
To: Ronald S. Bultje
Cc: Nuo Mi, James Almer, FFmpeg development discussions and patches
Ronald S. Bultje:
> 发件人: Ronald S. Bultje <rsbultje@gmail.com>
> 发送时间: 2024年5月29日 13:56
> 收件人: Wu Jianhua
> 抄送: FFmpeg development discussions and patches; Nuo Mi; James Almer
> 主题: Re: [FFmpeg-devel] [PATCH 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow
>
> Hi,
>
> On Wed, May 29, 2024 at 3:44 PM Wu Jianhua <toqsxw@outlook.com<mailto:toqsxw@outlook.com>> wrote:
> Ronald S. Bultje:
>> On Wed, May 29, 2024 at 11:38 AM <toqsxw@outlook.com<mailto:toqsxw@outlook.com>> <mailto:toqsxw@outlook.com<mailto:toqsxw@outlook.com>>> wrote:
>> +%else
>> + vpunpcklqdq m11, m2, m2
>> + vpunpckhqdq m12, m2, m2
>> + vpunpcklwd m11, m11, m14
>> + vpunpcklwd m12, m12, m14
>> + paddd m0, m11
>> + paddd m1, m12
>> + packssdw m0, m0, m1
>> +%endif
>
> [..]
> > Also, the whole thing just emulates a saturated add. Can't you use paddsw instead of paddw and be done with it? To add to Andreas' question: is >> saturating here normatively required?
>
> > We didn't have any sample that failed for this issue except for the checksum with specific seeds. I think we can keep not changing it until a real sample has something wrong.
>
> @Nuomi to get more details.
>
> I think "just" replacing paddw with paddsw is correct, since the input pixels are 12bit (so they could be either unsigned or signed), the filtered output > is the result of packssdw (so signed words), and the desired output is 12bit pixels anyway, anything greater than that is clipped to 12bit range. So to > me, it seems paddsw is a cheaper way to accomplish the same thing.
>
> Ronald
Hi Ronald,
Yes, it does. I've test paddsw and everything works well. It must be a cheaper way to get minimum performance loss.
And v2 sent.
Thanks for this.
Jianhua
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-05-30 16:32 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-29 15:38 [FFmpeg-devel] [PATCH 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow toqsxw
2024-05-29 16:23 ` Andreas Rheinhardt
2024-05-29 17:51 ` Ronald S. Bultje
2024-05-29 19:44 ` [FFmpeg-devel] 回复: " Wu Jianhua
2024-05-29 20:56 ` [FFmpeg-devel] " Ronald S. Bultje
2024-05-30 16:31 ` [FFmpeg-devel] 回复: " Wu Jianhua
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
This inbox may be cloned and mirrored by anyone:
git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git
# If you have public-inbox 1.1+ installed, you may
# initialize and index your mirror using the following commands:
public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
ffmpegdev@gitmailbox.com
public-inbox-index ffmpegdev
Example config snippet for mirrors.
AGPL code for this site: git clone https://public-inbox.org/public-inbox.git