From: Anton Khirnov <anton@khirnov.net>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Subject: Re: [FFmpeg-devel] [PATCH v2] avcodec/startcode: Avoid unaligned accesses
Date: Fri, 14 Oct 2022 08:42:50 +0200
Message-ID: <166572977016.12287.1847463242431058317@lain.khirnov.net> (raw)
In-Reply-To: =?utf-8?q?=3CAS8P250MB0744B14A56741101CC55826D8F239=40AS8P250MB?= =?utf-8?q?0744=2EEURP250=2EPROD=2EOUTLOOK=2ECOM=3E?=
Quoting Andreas Rheinhardt (2022-10-12 00:20:23)
> Up until now, ff_startcode_find_candidate_c() simply casts
> an uint8_t* to uint64_t*/uint32_t* to read 64/32 bits at a time
> in case HAVE_FAST_UNALIGNED is true. Yet this ignores the
> alignment requirement of these types as well as effective type
> rules of the C standard. This commit therefore replaces these
> direct accesses with AV_RN64/32; this also improves
> readability.
>
> UBSan reported these unaligned accesses which happened in 233
> FATE-tests involving H.264 and VC-1 (this has also been reported
> in tickets #8138 and #8485); these tests are fixed by this commit.
>
> The output of GCC with -O3 is unchanged for aarch64, alpha, arm,
> loongarch, ppc and x64. There was only a slight difference for mips.
>
> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
> ---
> This is v2 of https://patchwork.ffmpeg.org/project/ffmpeg/patch/20200122145210.6898-1-andreas.rheinhardt@gmail.com/
>
> Here is the mips code before this change:
>
> startcode_old_O3.o: file format elf64-tradlittlemips
>
>
> Disassembly of section .text:
>
> 0000000000000000 <ff_startcode_find_candidate_c>:
> 0: 18a00029 blez a1,a8 <ff_startcode_find_candidate_c+0xa8>
> 4: 3c08ff7f lui a4,0xff7f
> 8: 3c07ff01 lui a3,0xff01
> c: 65087f7f daddiu a4,a4,32639
> 10: 64e70101 daddiu a3,a3,257
> 14: 00084438 dsll a4,a4,0x10
> 18: 00073c38 dsll a3,a3,0x10
> 1c: 65087f7f daddiu a4,a4,32639
> 20: 64e70101 daddiu a3,a3,257
> 24: 00084478 dsll a4,a4,0x11
> 28: 00073df8 dsll a3,a3,0x17
> 2c: 00803025 move a2,a0
> 30: 00001025 move v0,zero
> 34: 3508feff ori a4,a4,0xfeff
> 38: 10000005 b 50 <ff_startcode_find_candidate_c+0x50>
> 3c: 34e78080 ori a3,a3,0x8080
> 40: 24420008 addiu v0,v0,8
> 44: 0045182a slt v1,v0,a1
> 48: 10600015 beqz v1,a0 <ff_startcode_find_candidate_c+0xa0>
> 4c: 00000000 nop
> 50: dcc30000 ld v1,0(a2)
> 54: 0068482d daddu a5,v1,a4
> 58: 44a30000 dmtc1 v1,$f0
> 5c: 44a90800 dmtc1 a5,$f1
> 60: 4be10002 pandn $f0,$f0,$f1
> 64: 44230000 dmfc1 v1,$f0
> 68: 00671824 and v1,v1,a3
> 6c: 1060fff4 beqz v1,40 <ff_startcode_find_candidate_c+0x40>
> 70: 64c60008 daddiu a2,a2,8
> 74: 0045182a slt v1,v0,a1
> 78: 10600009 beqz v1,a0 <ff_startcode_find_candidate_c+0xa0>
> 7c: 0082182d daddu v1,a0,v0
> 80: 10000005 b 98 <ff_startcode_find_candidate_c+0x98>
> 84: 90640000 lbu a0,0(v1)
> 88: 24420001 addiu v0,v0,1
> 8c: 10a20008 beq a1,v0,b0 <ff_startcode_find_candidate_c+0xb0>
> 90: 00000000 nop
> 94: 90640000 lbu a0,0(v1)
> 98: 1480fffb bnez a0,88 <ff_startcode_find_candidate_c+0x88>
> 9c: 64630001 daddiu v1,v1,1
> a0: 03e00008 jr ra
> a4: 00000000 nop
> a8: 03e00008 jr ra
> ac: 00001025 move v0,zero
> b0: 03e00008 jr ra
> b4: 00a01025 move v0,a1
> ...
>
> And here after this change:
>
> startcode_new_O3.o: file format elf64-tradlittlemips
>
>
> Disassembly of section .text:
>
> 0000000000000000 <ff_startcode_find_candidate_c>:
> 0: 18a0002b blez a1,b0 <ff_startcode_find_candidate_c+0xb0>
> 4: 3c08ff7f lui a4,0xff7f
> 8: 3c07ff01 lui a3,0xff01
> c: 65087f7f daddiu a4,a4,32639
> 10: 64e70101 daddiu a3,a3,257
> 14: 00084438 dsll a4,a4,0x10
> 18: 00073c38 dsll a3,a3,0x10
> 1c: 65087f7f daddiu a4,a4,32639
> 20: 64e70101 daddiu a3,a3,257
> 24: 00084478 dsll a4,a4,0x11
> 28: 00073df8 dsll a3,a3,0x17
> 2c: 00803025 move a2,a0
> 30: 00001025 move v0,zero
> 34: 3508feff ori a4,a4,0xfeff
> 38: 10000005 b 50 <ff_startcode_find_candidate_c+0x50>
> 3c: 34e78080 ori a3,a3,0x8080
> 40: 24420008 addiu v0,v0,8
> 44: 0045182a slt v1,v0,a1
> 48: 10600017 beqz v1,a8 <ff_startcode_find_candidate_c+0xa8>
> 4c: 00000000 nop
> 50: 68c30007 ldl v1,7(a2)
> 54: 6cc30000 ldr v1,0(a2)
> 58: 0068482d daddu a5,v1,a4
> 5c: 44a30000 dmtc1 v1,$f0
> 60: 44a90800 dmtc1 a5,$f1
> 64: 4be10002 pandn $f0,$f0,$f1
> 68: 44230000 dmfc1 v1,$f0
> 6c: 00671824 and v1,v1,a3
> 70: 1060fff3 beqz v1,40 <ff_startcode_find_candidate_c+0x40>
> 74: 64c60008 daddiu a2,a2,8
> 78: 0045182a slt v1,v0,a1
> 7c: 1060000a beqz v1,a8 <ff_startcode_find_candidate_c+0xa8>
> 80: 0082182d daddu v1,a0,v0
> 84: 10000006 b a0 <ff_startcode_find_candidate_c+0xa0>
> 88: 90640000 lbu a0,0(v1)
> 8c: 00000000 nop
> 90: 24420001 addiu v0,v0,1
> 94: 10a20008 beq a1,v0,b8 <ff_startcode_find_candidate_c+0xb8>
> 98: 00000000 nop
> 9c: 90640000 lbu a0,0(v1)
> a0: 1480fffb bnez a0,90 <ff_startcode_find_candidate_c+0x90>
> a4: 64630001 daddiu v1,v1,1
> a8: 03e00008 jr ra
> ac: 00000000 nop
> b0: 03e00008 jr ra
> b4: 00001025 move v0,zero
> b8: 03e00008 jr ra
> bc: 00a01025 move v0,a1
>
> As one can see, the difference is that an ld has been replaced
> by a pair of ldl and ldr. I don't know the performance implications
> of this.
google tells me ld is an aligned load, so this change is then correct
whatever its performance implications are.
--
Anton Khirnov
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
next prev reply other threads:[~2022-10-14 6:43 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-11 22:20 Andreas Rheinhardt
2022-10-14 6:42 ` Anton Khirnov [this message]
2022-10-14 23:42 ` Andreas Rheinhardt
2022-10-15 0:04 ` Andreas Rheinhardt
2022-10-17 14:48 ` Andreas Rheinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=166572977016.12287.1847463242431058317@lain.khirnov.net \
--to=anton@khirnov.net \
--cc=andreas.rheinhardt@outlook.com \
--cc=ffmpeg-devel@ffmpeg.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
This inbox may be cloned and mirrored by anyone:
git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git
# If you have public-inbox 1.1+ installed, you may
# initialize and index your mirror using the following commands:
public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
ffmpegdev@gitmailbox.com
public-inbox-index ffmpegdev
Example config snippet for mirrors.
AGPL code for this site: git clone https://public-inbox.org/public-inbox.git