From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by master.gitmailbox.com (Postfix) with ESMTP id DF06D444FA for ; Fri, 14 Oct 2022 06:43:05 +0000 (UTC) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id EA15C68BD6C; Fri, 14 Oct 2022 09:43:01 +0300 (EEST) Received: from mail0.khirnov.net (red.khirnov.net [176.97.15.12]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id D7A8C68BB72 for ; Fri, 14 Oct 2022 09:42:55 +0300 (EEST) Received: from localhost (localhost [IPv6:::1]) by mail0.khirnov.net (Postfix) with ESMTP id 6F4562404E4; Fri, 14 Oct 2022 08:42:54 +0200 (CEST) Received: from mail0.khirnov.net ([IPv6:::1]) by localhost (mail0.khirnov.net [IPv6:::1]) (amavisd-new, port 10024) with ESMTP id kHRcf-fe64rV; Fri, 14 Oct 2022 08:42:52 +0200 (CEST) Received: from lain.khirnov.net (lain.khirnov.net [IPv6:2001:67c:1138:4306::3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "lain.khirnov.net", Issuer "smtp.khirnov.net SMTP CA" (verified OK)) by mail0.khirnov.net (Postfix) with ESMTPS id 4C98D2400F4; Fri, 14 Oct 2022 08:42:52 +0200 (CEST) Received: by lain.khirnov.net (Postfix, from userid 1000) id 2F6B91601B2; Fri, 14 Oct 2022 08:42:50 +0200 (CEST) From: Anton Khirnov To: FFmpeg development discussions and patches In-Reply-To: =?utf-8?q?=3CAS8P250MB0744B14A56741101CC55826D8F239=40AS8P250MB?= =?utf-8?q?0744=2EEURP250=2EPROD=2EOUTLOOK=2ECOM=3E?= References: =?utf-8?q?=3CAS8P250MB0744B14A56741101CC55826D8F239=40AS8P250MB0?= =?utf-8?q?744=2EEURP250=2EPROD=2EOUTLOOK=2ECOM=3E?= Mail-Followup-To: FFmpeg development discussions and patches , Andreas Rheinhardt Date: Fri, 14 Oct 2022 08:42:50 +0200 Message-ID: <166572977016.12287.1847463242431058317@lain.khirnov.net> User-Agent: alot/0.8.1 MIME-Version: 1.0 Subject: Re: [FFmpeg-devel] [PATCH v2] avcodec/startcode: Avoid unaligned accesses X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Cc: Andreas Rheinhardt Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" Archived-At: List-Archive: List-Post: Quoting Andreas Rheinhardt (2022-10-12 00:20:23) > Up until now, ff_startcode_find_candidate_c() simply casts > an uint8_t* to uint64_t*/uint32_t* to read 64/32 bits at a time > in case HAVE_FAST_UNALIGNED is true. Yet this ignores the > alignment requirement of these types as well as effective type > rules of the C standard. This commit therefore replaces these > direct accesses with AV_RN64/32; this also improves > readability. > > UBSan reported these unaligned accesses which happened in 233 > FATE-tests involving H.264 and VC-1 (this has also been reported > in tickets #8138 and #8485); these tests are fixed by this commit. > > The output of GCC with -O3 is unchanged for aarch64, alpha, arm, > loongarch, ppc and x64. There was only a slight difference for mips. > > Signed-off-by: Andreas Rheinhardt > --- > This is v2 of https://patchwork.ffmpeg.org/project/ffmpeg/patch/20200122145210.6898-1-andreas.rheinhardt@gmail.com/ > > Here is the mips code before this change: > > startcode_old_O3.o: file format elf64-tradlittlemips > > > Disassembly of section .text: > > 0000000000000000 : > 0: 18a00029 blez a1,a8 > 4: 3c08ff7f lui a4,0xff7f > 8: 3c07ff01 lui a3,0xff01 > c: 65087f7f daddiu a4,a4,32639 > 10: 64e70101 daddiu a3,a3,257 > 14: 00084438 dsll a4,a4,0x10 > 18: 00073c38 dsll a3,a3,0x10 > 1c: 65087f7f daddiu a4,a4,32639 > 20: 64e70101 daddiu a3,a3,257 > 24: 00084478 dsll a4,a4,0x11 > 28: 00073df8 dsll a3,a3,0x17 > 2c: 00803025 move a2,a0 > 30: 00001025 move v0,zero > 34: 3508feff ori a4,a4,0xfeff > 38: 10000005 b 50 > 3c: 34e78080 ori a3,a3,0x8080 > 40: 24420008 addiu v0,v0,8 > 44: 0045182a slt v1,v0,a1 > 48: 10600015 beqz v1,a0 > 4c: 00000000 nop > 50: dcc30000 ld v1,0(a2) > 54: 0068482d daddu a5,v1,a4 > 58: 44a30000 dmtc1 v1,$f0 > 5c: 44a90800 dmtc1 a5,$f1 > 60: 4be10002 pandn $f0,$f0,$f1 > 64: 44230000 dmfc1 v1,$f0 > 68: 00671824 and v1,v1,a3 > 6c: 1060fff4 beqz v1,40 > 70: 64c60008 daddiu a2,a2,8 > 74: 0045182a slt v1,v0,a1 > 78: 10600009 beqz v1,a0 > 7c: 0082182d daddu v1,a0,v0 > 80: 10000005 b 98 > 84: 90640000 lbu a0,0(v1) > 88: 24420001 addiu v0,v0,1 > 8c: 10a20008 beq a1,v0,b0 > 90: 00000000 nop > 94: 90640000 lbu a0,0(v1) > 98: 1480fffb bnez a0,88 > 9c: 64630001 daddiu v1,v1,1 > a0: 03e00008 jr ra > a4: 00000000 nop > a8: 03e00008 jr ra > ac: 00001025 move v0,zero > b0: 03e00008 jr ra > b4: 00a01025 move v0,a1 > ... > > And here after this change: > > startcode_new_O3.o: file format elf64-tradlittlemips > > > Disassembly of section .text: > > 0000000000000000 : > 0: 18a0002b blez a1,b0 > 4: 3c08ff7f lui a4,0xff7f > 8: 3c07ff01 lui a3,0xff01 > c: 65087f7f daddiu a4,a4,32639 > 10: 64e70101 daddiu a3,a3,257 > 14: 00084438 dsll a4,a4,0x10 > 18: 00073c38 dsll a3,a3,0x10 > 1c: 65087f7f daddiu a4,a4,32639 > 20: 64e70101 daddiu a3,a3,257 > 24: 00084478 dsll a4,a4,0x11 > 28: 00073df8 dsll a3,a3,0x17 > 2c: 00803025 move a2,a0 > 30: 00001025 move v0,zero > 34: 3508feff ori a4,a4,0xfeff > 38: 10000005 b 50 > 3c: 34e78080 ori a3,a3,0x8080 > 40: 24420008 addiu v0,v0,8 > 44: 0045182a slt v1,v0,a1 > 48: 10600017 beqz v1,a8 > 4c: 00000000 nop > 50: 68c30007 ldl v1,7(a2) > 54: 6cc30000 ldr v1,0(a2) > 58: 0068482d daddu a5,v1,a4 > 5c: 44a30000 dmtc1 v1,$f0 > 60: 44a90800 dmtc1 a5,$f1 > 64: 4be10002 pandn $f0,$f0,$f1 > 68: 44230000 dmfc1 v1,$f0 > 6c: 00671824 and v1,v1,a3 > 70: 1060fff3 beqz v1,40 > 74: 64c60008 daddiu a2,a2,8 > 78: 0045182a slt v1,v0,a1 > 7c: 1060000a beqz v1,a8 > 80: 0082182d daddu v1,a0,v0 > 84: 10000006 b a0 > 88: 90640000 lbu a0,0(v1) > 8c: 00000000 nop > 90: 24420001 addiu v0,v0,1 > 94: 10a20008 beq a1,v0,b8 > 98: 00000000 nop > 9c: 90640000 lbu a0,0(v1) > a0: 1480fffb bnez a0,90 > a4: 64630001 daddiu v1,v1,1 > a8: 03e00008 jr ra > ac: 00000000 nop > b0: 03e00008 jr ra > b4: 00001025 move v0,zero > b8: 03e00008 jr ra > bc: 00a01025 move v0,a1 > > As one can see, the difference is that an ld has been replaced > by a pair of ldl and ldr. I don't know the performance implications > of this. google tells me ld is an aligned load, so this change is then correct whatever its performance implications are. -- Anton Khirnov _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".