From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH v2] avcodec/startcode: Avoid unaligned accesses
Date: Sat, 15 Oct 2022 01:42:27 +0200
Message-ID: <AS8P250MB07443A9BDD40AE743CC88B798F249@AS8P250MB0744.EURP250.PROD.OUTLOOK.COM> (raw)
In-Reply-To: <166572977016.12287.1847463242431058317@lain.khirnov.net>
Anton Khirnov:
> Quoting Andreas Rheinhardt (2022-10-12 00:20:23)
>> Up until now, ff_startcode_find_candidate_c() simply casts
>> an uint8_t* to uint64_t*/uint32_t* to read 64/32 bits at a time
>> in case HAVE_FAST_UNALIGNED is true. Yet this ignores the
>> alignment requirement of these types as well as effective type
>> rules of the C standard. This commit therefore replaces these
>> direct accesses with AV_RN64/32; this also improves
>> readability.
>>
>> UBSan reported these unaligned accesses which happened in 233
>> FATE-tests involving H.264 and VC-1 (this has also been reported
>> in tickets #8138 and #8485); these tests are fixed by this commit.
>>
>> The output of GCC with -O3 is unchanged for aarch64, alpha, arm,
>> loongarch, ppc and x64. There was only a slight difference for mips.
>>
>> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
>> ---
>> This is v2 of https://patchwork.ffmpeg.org/project/ffmpeg/patch/20200122145210.6898-1-andreas.rheinhardt@gmail.com/
>>
>> Here is the mips code before this change:
>>
>> startcode_old_O3.o: file format elf64-tradlittlemips
>>
>>
>> Disassembly of section .text:
>>
>> 0000000000000000 <ff_startcode_find_candidate_c>:
>> 0: 18a00029 blez a1,a8 <ff_startcode_find_candidate_c+0xa8>
>> 4: 3c08ff7f lui a4,0xff7f
>> 8: 3c07ff01 lui a3,0xff01
>> c: 65087f7f daddiu a4,a4,32639
>> 10: 64e70101 daddiu a3,a3,257
>> 14: 00084438 dsll a4,a4,0x10
>> 18: 00073c38 dsll a3,a3,0x10
>> 1c: 65087f7f daddiu a4,a4,32639
>> 20: 64e70101 daddiu a3,a3,257
>> 24: 00084478 dsll a4,a4,0x11
>> 28: 00073df8 dsll a3,a3,0x17
>> 2c: 00803025 move a2,a0
>> 30: 00001025 move v0,zero
>> 34: 3508feff ori a4,a4,0xfeff
>> 38: 10000005 b 50 <ff_startcode_find_candidate_c+0x50>
>> 3c: 34e78080 ori a3,a3,0x8080
>> 40: 24420008 addiu v0,v0,8
>> 44: 0045182a slt v1,v0,a1
>> 48: 10600015 beqz v1,a0 <ff_startcode_find_candidate_c+0xa0>
>> 4c: 00000000 nop
>> 50: dcc30000 ld v1,0(a2)
>> 54: 0068482d daddu a5,v1,a4
>> 58: 44a30000 dmtc1 v1,$f0
>> 5c: 44a90800 dmtc1 a5,$f1
>> 60: 4be10002 pandn $f0,$f0,$f1
>> 64: 44230000 dmfc1 v1,$f0
>> 68: 00671824 and v1,v1,a3
>> 6c: 1060fff4 beqz v1,40 <ff_startcode_find_candidate_c+0x40>
>> 70: 64c60008 daddiu a2,a2,8
>> 74: 0045182a slt v1,v0,a1
>> 78: 10600009 beqz v1,a0 <ff_startcode_find_candidate_c+0xa0>
>> 7c: 0082182d daddu v1,a0,v0
>> 80: 10000005 b 98 <ff_startcode_find_candidate_c+0x98>
>> 84: 90640000 lbu a0,0(v1)
>> 88: 24420001 addiu v0,v0,1
>> 8c: 10a20008 beq a1,v0,b0 <ff_startcode_find_candidate_c+0xb0>
>> 90: 00000000 nop
>> 94: 90640000 lbu a0,0(v1)
>> 98: 1480fffb bnez a0,88 <ff_startcode_find_candidate_c+0x88>
>> 9c: 64630001 daddiu v1,v1,1
>> a0: 03e00008 jr ra
>> a4: 00000000 nop
>> a8: 03e00008 jr ra
>> ac: 00001025 move v0,zero
>> b0: 03e00008 jr ra
>> b4: 00a01025 move v0,a1
>> ...
>>
>> And here after this change:
>>
>> startcode_new_O3.o: file format elf64-tradlittlemips
>>
>>
>> Disassembly of section .text:
>>
>> 0000000000000000 <ff_startcode_find_candidate_c>:
>> 0: 18a0002b blez a1,b0 <ff_startcode_find_candidate_c+0xb0>
>> 4: 3c08ff7f lui a4,0xff7f
>> 8: 3c07ff01 lui a3,0xff01
>> c: 65087f7f daddiu a4,a4,32639
>> 10: 64e70101 daddiu a3,a3,257
>> 14: 00084438 dsll a4,a4,0x10
>> 18: 00073c38 dsll a3,a3,0x10
>> 1c: 65087f7f daddiu a4,a4,32639
>> 20: 64e70101 daddiu a3,a3,257
>> 24: 00084478 dsll a4,a4,0x11
>> 28: 00073df8 dsll a3,a3,0x17
>> 2c: 00803025 move a2,a0
>> 30: 00001025 move v0,zero
>> 34: 3508feff ori a4,a4,0xfeff
>> 38: 10000005 b 50 <ff_startcode_find_candidate_c+0x50>
>> 3c: 34e78080 ori a3,a3,0x8080
>> 40: 24420008 addiu v0,v0,8
>> 44: 0045182a slt v1,v0,a1
>> 48: 10600017 beqz v1,a8 <ff_startcode_find_candidate_c+0xa8>
>> 4c: 00000000 nop
>> 50: 68c30007 ldl v1,7(a2)
>> 54: 6cc30000 ldr v1,0(a2)
>> 58: 0068482d daddu a5,v1,a4
>> 5c: 44a30000 dmtc1 v1,$f0
>> 60: 44a90800 dmtc1 a5,$f1
>> 64: 4be10002 pandn $f0,$f0,$f1
>> 68: 44230000 dmfc1 v1,$f0
>> 6c: 00671824 and v1,v1,a3
>> 70: 1060fff3 beqz v1,40 <ff_startcode_find_candidate_c+0x40>
>> 74: 64c60008 daddiu a2,a2,8
>> 78: 0045182a slt v1,v0,a1
>> 7c: 1060000a beqz v1,a8 <ff_startcode_find_candidate_c+0xa8>
>> 80: 0082182d daddu v1,a0,v0
>> 84: 10000006 b a0 <ff_startcode_find_candidate_c+0xa0>
>> 88: 90640000 lbu a0,0(v1)
>> 8c: 00000000 nop
>> 90: 24420001 addiu v0,v0,1
>> 94: 10a20008 beq a1,v0,b8 <ff_startcode_find_candidate_c+0xb8>
>> 98: 00000000 nop
>> 9c: 90640000 lbu a0,0(v1)
>> a0: 1480fffb bnez a0,90 <ff_startcode_find_candidate_c+0x90>
>> a4: 64630001 daddiu v1,v1,1
>> a8: 03e00008 jr ra
>> ac: 00000000 nop
>> b0: 03e00008 jr ra
>> b4: 00001025 move v0,zero
>> b8: 03e00008 jr ra
>> bc: 00a01025 move v0,a1
>>
>> As one can see, the difference is that an ld has been replaced
>> by a pair of ldl and ldr. I don't know the performance implications
>> of this.
>
> google tells me ld is an aligned load, so this change is then correct
> whatever its performance implications are.
>
I was about to write the same after reading
https://www.cs.cmu.edu/afs/cs/academic/class/15740-f97/public/doc/mips-isa.pdf,
but then I found out that certain loongson processors allow unaligned
accesses (see
https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg08563.html) and
configure only enables fast_unaligned for
loongson2e|loongson2f|loongson3*); if fast_unaligned would lead to
crashes on these processors, it would probably have been noticed long
ago. So the question of performance implications on such processors is
still open.
If no one bothers to test this in a reasonable time, I will treat this
as sign that no one is interested in the affected arches and apply this.
- Andreas
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
next prev parent reply other threads:[~2022-10-14 23:42 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-11 22:20 Andreas Rheinhardt
2022-10-14 6:42 ` Anton Khirnov
2022-10-14 23:42 ` Andreas Rheinhardt [this message]
2022-10-15 0:04 ` Andreas Rheinhardt
2022-10-17 14:48 ` Andreas Rheinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=AS8P250MB07443A9BDD40AE743CC88B798F249@AS8P250MB0744.EURP250.PROD.OUTLOOK.COM \
--to=andreas.rheinhardt@outlook.com \
--cc=ffmpeg-devel@ffmpeg.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
This inbox may be cloned and mirrored by anyone:
git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git
# If you have public-inbox 1.1+ installed, you may
# initialize and index your mirror using the following commands:
public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
ffmpegdev@gitmailbox.com
public-inbox-index ffmpegdev
Example config snippet for mirrors.
AGPL code for this site: git clone https://public-inbox.org/public-inbox.git