From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com> To: ffmpeg-devel@ffmpeg.org Cc: Andreas Rheinhardt <andreas.rheinhardt@outlook.com> Subject: [FFmpeg-devel] [PATCH v2] avcodec/startcode: Avoid unaligned accesses Date: Wed, 12 Oct 2022 00:20:23 +0200 Message-ID: <AS8P250MB0744B14A56741101CC55826D8F239@AS8P250MB0744.EURP250.PROD.OUTLOOK.COM> (raw) Up until now, ff_startcode_find_candidate_c() simply casts an uint8_t* to uint64_t*/uint32_t* to read 64/32 bits at a time in case HAVE_FAST_UNALIGNED is true. Yet this ignores the alignment requirement of these types as well as effective type rules of the C standard. This commit therefore replaces these direct accesses with AV_RN64/32; this also improves readability. UBSan reported these unaligned accesses which happened in 233 FATE-tests involving H.264 and VC-1 (this has also been reported in tickets #8138 and #8485); these tests are fixed by this commit. The output of GCC with -O3 is unchanged for aarch64, alpha, arm, loongarch, ppc and x64. There was only a slight difference for mips. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com> --- This is v2 of https://patchwork.ffmpeg.org/project/ffmpeg/patch/20200122145210.6898-1-andreas.rheinhardt@gmail.com/ Here is the mips code before this change: startcode_old_O3.o: file format elf64-tradlittlemips Disassembly of section .text: 0000000000000000 <ff_startcode_find_candidate_c>: 0: 18a00029 blez a1,a8 <ff_startcode_find_candidate_c+0xa8> 4: 3c08ff7f lui a4,0xff7f 8: 3c07ff01 lui a3,0xff01 c: 65087f7f daddiu a4,a4,32639 10: 64e70101 daddiu a3,a3,257 14: 00084438 dsll a4,a4,0x10 18: 00073c38 dsll a3,a3,0x10 1c: 65087f7f daddiu a4,a4,32639 20: 64e70101 daddiu a3,a3,257 24: 00084478 dsll a4,a4,0x11 28: 00073df8 dsll a3,a3,0x17 2c: 00803025 move a2,a0 30: 00001025 move v0,zero 34: 3508feff ori a4,a4,0xfeff 38: 10000005 b 50 <ff_startcode_find_candidate_c+0x50> 3c: 34e78080 ori a3,a3,0x8080 40: 24420008 addiu v0,v0,8 44: 0045182a slt v1,v0,a1 48: 10600015 beqz v1,a0 <ff_startcode_find_candidate_c+0xa0> 4c: 00000000 nop 50: dcc30000 ld v1,0(a2) 54: 0068482d daddu a5,v1,a4 58: 44a30000 dmtc1 v1,$f0 5c: 44a90800 dmtc1 a5,$f1 60: 4be10002 pandn $f0,$f0,$f1 64: 44230000 dmfc1 v1,$f0 68: 00671824 and v1,v1,a3 6c: 1060fff4 beqz v1,40 <ff_startcode_find_candidate_c+0x40> 70: 64c60008 daddiu a2,a2,8 74: 0045182a slt v1,v0,a1 78: 10600009 beqz v1,a0 <ff_startcode_find_candidate_c+0xa0> 7c: 0082182d daddu v1,a0,v0 80: 10000005 b 98 <ff_startcode_find_candidate_c+0x98> 84: 90640000 lbu a0,0(v1) 88: 24420001 addiu v0,v0,1 8c: 10a20008 beq a1,v0,b0 <ff_startcode_find_candidate_c+0xb0> 90: 00000000 nop 94: 90640000 lbu a0,0(v1) 98: 1480fffb bnez a0,88 <ff_startcode_find_candidate_c+0x88> 9c: 64630001 daddiu v1,v1,1 a0: 03e00008 jr ra a4: 00000000 nop a8: 03e00008 jr ra ac: 00001025 move v0,zero b0: 03e00008 jr ra b4: 00a01025 move v0,a1 ... And here after this change: startcode_new_O3.o: file format elf64-tradlittlemips Disassembly of section .text: 0000000000000000 <ff_startcode_find_candidate_c>: 0: 18a0002b blez a1,b0 <ff_startcode_find_candidate_c+0xb0> 4: 3c08ff7f lui a4,0xff7f 8: 3c07ff01 lui a3,0xff01 c: 65087f7f daddiu a4,a4,32639 10: 64e70101 daddiu a3,a3,257 14: 00084438 dsll a4,a4,0x10 18: 00073c38 dsll a3,a3,0x10 1c: 65087f7f daddiu a4,a4,32639 20: 64e70101 daddiu a3,a3,257 24: 00084478 dsll a4,a4,0x11 28: 00073df8 dsll a3,a3,0x17 2c: 00803025 move a2,a0 30: 00001025 move v0,zero 34: 3508feff ori a4,a4,0xfeff 38: 10000005 b 50 <ff_startcode_find_candidate_c+0x50> 3c: 34e78080 ori a3,a3,0x8080 40: 24420008 addiu v0,v0,8 44: 0045182a slt v1,v0,a1 48: 10600017 beqz v1,a8 <ff_startcode_find_candidate_c+0xa8> 4c: 00000000 nop 50: 68c30007 ldl v1,7(a2) 54: 6cc30000 ldr v1,0(a2) 58: 0068482d daddu a5,v1,a4 5c: 44a30000 dmtc1 v1,$f0 60: 44a90800 dmtc1 a5,$f1 64: 4be10002 pandn $f0,$f0,$f1 68: 44230000 dmfc1 v1,$f0 6c: 00671824 and v1,v1,a3 70: 1060fff3 beqz v1,40 <ff_startcode_find_candidate_c+0x40> 74: 64c60008 daddiu a2,a2,8 78: 0045182a slt v1,v0,a1 7c: 1060000a beqz v1,a8 <ff_startcode_find_candidate_c+0xa8> 80: 0082182d daddu v1,a0,v0 84: 10000006 b a0 <ff_startcode_find_candidate_c+0xa0> 88: 90640000 lbu a0,0(v1) 8c: 00000000 nop 90: 24420001 addiu v0,v0,1 94: 10a20008 beq a1,v0,b8 <ff_startcode_find_candidate_c+0xb8> 98: 00000000 nop 9c: 90640000 lbu a0,0(v1) a0: 1480fffb bnez a0,90 <ff_startcode_find_candidate_c+0x90> a4: 64630001 daddiu v1,v1,1 a8: 03e00008 jr ra ac: 00000000 nop b0: 03e00008 jr ra b4: 00001025 move v0,zero b8: 03e00008 jr ra bc: 00a01025 move v0,a1 As one can see, the difference is that an ld has been replaced by a pair of ldl and ldr. I don't know the performance implications of this. libavcodec/startcode.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/libavcodec/startcode.c b/libavcodec/startcode.c index 9efdffe8c6..d84f326521 100644 --- a/libavcodec/startcode.c +++ b/libavcodec/startcode.c @@ -25,6 +25,7 @@ * @author Michael Niedermayer <michaelni@gmx.at> */ +#include "libavutil/intreadwrite.h" #include "startcode.h" #include "config.h" @@ -38,14 +39,14 @@ int ff_startcode_find_candidate_c(const uint8_t *buf, int size) */ #if HAVE_FAST_64BIT while (i < size && - !((~*(const uint64_t *)(buf + i) & - (*(const uint64_t *)(buf + i) - 0x0101010101010101ULL)) & + !((~AV_RN64(buf + i) & + (AV_RN64(buf + i) - 0x0101010101010101ULL)) & 0x8080808080808080ULL)) i += 8; #else while (i < size && - !((~*(const uint32_t *)(buf + i) & - (*(const uint32_t *)(buf + i) - 0x01010101U)) & + !((~AV_RN32(buf + i) & + (AV_RN32(buf + i) - 0x01010101U)) & 0x80808080U)) i += 4; #endif -- 2.34.1 _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
next reply other threads:[~2022-10-11 22:20 UTC|newest] Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top 2022-10-11 22:20 Andreas Rheinhardt [this message] 2022-10-14 6:42 ` Anton Khirnov 2022-10-14 23:42 ` Andreas Rheinhardt 2022-10-15 0:04 ` Andreas Rheinhardt 2022-10-17 14:48 ` Andreas Rheinhardt
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=AS8P250MB0744B14A56741101CC55826D8F239@AS8P250MB0744.EURP250.PROD.OUTLOOK.COM \ --to=andreas.rheinhardt@outlook.com \ --cc=ffmpeg-devel@ffmpeg.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel This inbox may be cloned and mirrored by anyone: git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \ ffmpegdev@gitmailbox.com public-inbox-index ffmpegdev Example config snippet for mirrors. AGPL code for this site: git clone https://public-inbox.org/public-inbox.git