From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 17 Oct 2022 16:48:20 +0200
From: Andreas Rheinhardt
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH v2] avcodec/startcode: Avoid unaligned accesses
X-Microsoft-Original-Message-ID: <402dfb84-6e5a-98b3-0df8-c8f1666c0f06@outlook.com>
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
MIME-Version: 1.0
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding:
7bit

Andreas Rheinhardt:
> Up until now, ff_startcode_find_candidate_c() simply casts
> an uint8_t* to uint64_t*/uint32_t* to read 64/32 bits at a time
> in case HAVE_FAST_UNALIGNED is true. Yet this ignores the
> alignment requirement of these types as well as effective type
> rules of the C standard. This commit therefore replaces these
> direct accesses with AV_RN64/32; this also improves
> readability.
>
> UBSan reported these unaligned accesses which happened in 233
> FATE-tests involving H.264 and VC-1 (this has also been reported
> in tickets #8138 and #8485); these tests are fixed by this commit.
>
> The output of GCC with -O3 is unchanged for aarch64, alpha, arm,
> loongarch, ppc and x64. There was only a slight difference for mips.

Will apply this patch in two days with the last paragraph replaced by
the following unless there are objections:

The output of GCC with -O3 is unchanged for aarch64, loongarch, ppc and
x64 (as well as for arches like alpha for which HAVE_FAST_UNALIGNED is
never true in the first place). There was only a slight difference for
mips and arm. I don't know about the speed impact of them.
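
For readers who want to see the difference between the two access styles in isolation, here is a minimal sketch. It is not FFmpeg's AV_RN64 (whose definition depends on the architecture and compiler); read_native_u64() is a hypothetical helper that only illustrates the portable memcpy() idiom, which has no alignment requirement and does not run afoul of the effective type rules. Compilers commonly turn it into a single load on targets where unaligned loads are cheap, but that is not guaranteed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of FFmpeg: read 8 bytes in native byte
 * order from an address with arbitrary alignment.
 *
 * The problematic form discussed in the commit message is
 *     *(const uint64_t *)p
 * which is undefined behaviour if p is not suitably aligned for
 * uint64_t (and also violates the effective type rules when the
 * underlying object is an array of uint8_t). */
static uint64_t read_native_u64(const uint8_t *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v)); /* byte-wise copy: valid for any alignment */
    return v;
}
```

Calling read_native_u64(buf + 1) is well defined as long as at least 8 bytes are readable at buf + 1, whereas the cast form at the same address is what UBSan flags.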
>
> Signed-off-by: Andreas Rheinhardt
> ---
> This is v2 of https://patchwork.ffmpeg.org/project/ffmpeg/patch/20200122145210.6898-1-andreas.rheinhardt@gmail.com/
>
> Here is the mips code before this change:
>
> startcode_old_O3.o:     file format elf64-tradlittlemips
>
>
> Disassembly of section .text:
>
> 0000000000000000 :
>    0:   18a00029    blez    a1,a8
>    4:   3c08ff7f    lui     a4,0xff7f
>    8:   3c07ff01    lui     a3,0xff01
>    c:   65087f7f    daddiu  a4,a4,32639
>   10:   64e70101    daddiu  a3,a3,257
>   14:   00084438    dsll    a4,a4,0x10
>   18:   00073c38    dsll    a3,a3,0x10
>   1c:   65087f7f    daddiu  a4,a4,32639
>   20:   64e70101    daddiu  a3,a3,257
>   24:   00084478    dsll    a4,a4,0x11
>   28:   00073df8    dsll    a3,a3,0x17
>   2c:   00803025    move    a2,a0
>   30:   00001025    move    v0,zero
>   34:   3508feff    ori     a4,a4,0xfeff
>   38:   10000005    b       50
>   3c:   34e78080    ori     a3,a3,0x8080
>   40:   24420008    addiu   v0,v0,8
>   44:   0045182a    slt     v1,v0,a1
>   48:   10600015    beqz    v1,a0
>   4c:   00000000    nop
>   50:   dcc30000    ld      v1,0(a2)
>   54:   0068482d    daddu   a5,v1,a4
>   58:   44a30000    dmtc1   v1,$f0
>   5c:   44a90800    dmtc1   a5,$f1
>   60:   4be10002    pandn   $f0,$f0,$f1
>   64:   44230000    dmfc1   v1,$f0
>   68:   00671824    and     v1,v1,a3
>   6c:   1060fff4    beqz    v1,40
>   70:   64c60008    daddiu  a2,a2,8
>   74:   0045182a    slt     v1,v0,a1
>   78:   10600009    beqz    v1,a0
>   7c:   0082182d    daddu   v1,a0,v0
>   80:   10000005    b       98
>   84:   90640000    lbu     a0,0(v1)
>   88:   24420001    addiu   v0,v0,1
>   8c:   10a20008    beq     a1,v0,b0
>   90:   00000000    nop
>   94:   90640000    lbu     a0,0(v1)
>   98:   1480fffb    bnez    a0,88
>   9c:   64630001    daddiu  v1,v1,1
>   a0:   03e00008    jr      ra
>   a4:   00000000    nop
>   a8:   03e00008    jr      ra
>   ac:   00001025    move    v0,zero
>   b0:   03e00008    jr      ra
>   b4:   00a01025    move    v0,a1
>         ...
>
> And here after this change:
>
> startcode_new_O3.o:     file format elf64-tradlittlemips
>
>
> Disassembly of section .text:
>
> 0000000000000000 :
>    0:   18a0002b    blez    a1,b0
>    4:   3c08ff7f    lui     a4,0xff7f
>    8:   3c07ff01    lui     a3,0xff01
>    c:   65087f7f    daddiu  a4,a4,32639
>   10:   64e70101    daddiu  a3,a3,257
>   14:   00084438    dsll    a4,a4,0x10
>   18:   00073c38    dsll    a3,a3,0x10
>   1c:   65087f7f    daddiu  a4,a4,32639
>   20:   64e70101    daddiu  a3,a3,257
>   24:   00084478    dsll    a4,a4,0x11
>   28:   00073df8    dsll    a3,a3,0x17
>   2c:   00803025    move    a2,a0
>   30:   00001025    move    v0,zero
>   34:   3508feff    ori     a4,a4,0xfeff
>   38:   10000005    b       50
>   3c:   34e78080    ori     a3,a3,0x8080
>   40:   24420008    addiu   v0,v0,8
>   44:   0045182a    slt     v1,v0,a1
>   48:   10600017    beqz    v1,a8
>   4c:   00000000    nop
>   50:   68c30007    ldl     v1,7(a2)
>   54:   6cc30000    ldr     v1,0(a2)
>   58:   0068482d    daddu   a5,v1,a4
>   5c:   44a30000    dmtc1   v1,$f0
>   60:   44a90800    dmtc1   a5,$f1
>   64:   4be10002    pandn   $f0,$f0,$f1
>   68:   44230000    dmfc1   v1,$f0
>   6c:   00671824    and     v1,v1,a3
>   70:   1060fff3    beqz    v1,40
>   74:   64c60008    daddiu  a2,a2,8
>   78:   0045182a    slt     v1,v0,a1
>   7c:   1060000a    beqz    v1,a8
>   80:   0082182d    daddu   v1,a0,v0
>   84:   10000006    b       a0
>   88:   90640000    lbu     a0,0(v1)
>   8c:   00000000    nop
>   90:   24420001    addiu   v0,v0,1
>   94:   10a20008    beq     a1,v0,b8
>   98:   00000000    nop
>   9c:   90640000    lbu     a0,0(v1)
>   a0:   1480fffb    bnez    a0,90
>   a4:   64630001    daddiu  v1,v1,1
>   a8:   03e00008    jr      ra
>   ac:   00000000    nop
>   b0:   03e00008    jr      ra
>   b4:   00001025    move    v0,zero
>   b8:   03e00008    jr      ra
>   bc:   00a01025    move    v0,a1
>
> As one can see, the difference is that an ld has been replaced
> by a pair of ldl and ldr. I don't know the performance implications
> of this.
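
As background for the loop condition touched by the diff that follows: the expression (~v & (v - 0x0101...01)) & 0x8080...80 is nonzero exactly when at least one byte of v is zero, which is what lets the function skip 8 (or 4) bytes at a time. The sketch below is a simplified stand-alone illustration under stated assumptions, not the FFmpeg function: has_zero_byte64() and find_zero_candidate() are hypothetical names, and unlike the loop in the patch (which tests i < size), this version only reads whole words that lie entirely inside the buffer, so it does not depend on any padding after the buffer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if any of the 8 bytes of v is 0x00.
 * Subtracting 0x01 from each byte sets the top bit of a byte that was
 * zero; masking with ~v discards bytes whose top bit was already set. */
static int has_zero_byte64(uint64_t v)
{
    return ((~v & (v - 0x0101010101010101ULL)) & 0x8080808080808080ULL) != 0;
}

/* Hypothetical simplified scan: index of the first zero byte in buf,
 * or size if there is none. */
static int find_zero_candidate(const uint8_t *buf, int size)
{
    int i = 0;

    /* Word-at-a-time phase: stop at the first word containing a zero. */
    while (i + 8 <= size) {
        uint64_t v;
        memcpy(&v, buf + i, sizeof(v)); /* alignment-safe native read */
        if (has_zero_byte64(v))
            break;
        i += 8;
    }

    /* Byte-wise phase: locate the exact position (or reach the end). */
    for (; i < size; i++)
        if (!buf[i])
            break;
    return i;
}
```

The word-level test only says that some byte in the word is zero, not which one, so the byte-wise loop afterwards is still needed to pin down the position.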
>
>  libavcodec/startcode.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/libavcodec/startcode.c b/libavcodec/startcode.c
> index 9efdffe8c6..d84f326521 100644
> --- a/libavcodec/startcode.c
> +++ b/libavcodec/startcode.c
> @@ -25,6 +25,7 @@
>   * @author Michael Niedermayer
>   */
>
> +#include "libavutil/intreadwrite.h"
>  #include "startcode.h"
>  #include "config.h"
>
> @@ -38,14 +39,14 @@ int ff_startcode_find_candidate_c(const uint8_t *buf, int size)
>       */
>  #if HAVE_FAST_64BIT
>      while (i < size &&
> -            !((~*(const uint64_t *)(buf + i) &
> -                    (*(const uint64_t *)(buf + i) - 0x0101010101010101ULL)) &
> +            !((~AV_RN64(buf + i) &
> +                    (AV_RN64(buf + i) - 0x0101010101010101ULL)) &
>                      0x8080808080808080ULL))
>          i += 8;
>  #else
>      while (i < size &&
> -            !((~*(const uint32_t *)(buf + i) &
> -                    (*(const uint32_t *)(buf + i) - 0x01010101U)) &
> +            !((~AV_RN32(buf + i) &
> +                    (AV_RN32(buf + i) - 0x01010101U)) &
>                      0x80808080U))
>          i += 4;
>  #endif

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".