From: Andreas Rheinhardt
Date: Fri, 27 Oct 2023 05:10:32 +0200
To: ffmpeg-devel@ffmpeg.org
In-Reply-To: <20231024150443.7438-2-michael@niedermayer.cc>
References: <20231024150443.7438-1-michael@niedermayer.cc> <20231024150443.7438-2-michael@niedermayer.cc>
Subject: Re: [FFmpeg-devel] [PATCH 2/4] avcodec/get_bits: Avoid 2nd bitstream read in GET_VLC() if bits are known at build and small

Michael Niedermayer:
> Signed-off-by: Michael Niedermayer
> ---
>  libavcodec/get_bits.h | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
> index cfcf97c021c..86cea00494a 100644
> --- a/libavcodec/get_bits.h
> +++ b/libavcodec/get_bits.h
> @@ -581,8 +581,12 @@ static inline const uint8_t *align_get_bits(GetBitContext *s)
>          n = table[index].len;                                  \
>                                                                 \
>          if (max_depth > 1 && n < 0) {                          \
> -            LAST_SKIP_BITS(name, gb, bits);                    \
> -            UPDATE_CACHE(name, gb);                            \
> +            if (av_builtin_constant_p(bits <= MIN_CACHE_BITS/2) && bits <= MIN_CACHE_BITS/2) { \
> +                SKIP_BITS(name, gb, bits);                     \
> +            } else {                                           \
> +                LAST_SKIP_BITS(name, gb, bits);                \
> +                UPDATE_CACHE(name, gb);                        \
> +            }                                                  \
>                                                                 \
>              nb_bits = -n;                                      \
>                                                                 \

This is problematic: the GET_VLC macro does not presume that
MIN_CACHE_BITS bits are available in the cache on entry; there is code
that uses GET_VLC directly instead of get_vlc2().

I had the same idea when I made my VLC patchset, yet I wanted to apply
it first (which I then forgot).

While investigating the above issue, I found that all users of GET_VLC
call UPDATE_CACHE immediately before GET_VLC, so UPDATE_CACHE should be
moved into GET_VLC; furthermore, no user of GET_VLC relies on the
reloads inside GET_VLC. The patches for this are here:
https://github.com/mkver/FFmpeg/commits/vlc
Shall I send them?

Note that making GET_VLC more self-contained enables improvements over
the current approach, yet it will not lead to optimal code. E.g.
the VLCs in decode_alpha_block() in speedhqdec.c are so short that one
could read both VLCs with only one UPDATE_CACHE(); another example is
mjpegdec.c, which currently does this:

        GET_VLC(code, re, &s->gb, s->vlcs[1][ac_index].table, 9, 2);
        i += ((unsigned)code) >> 4;
        code &= 0xf;
        if (code) {
            if (code > MIN_CACHE_BITS - 16)
                UPDATE_CACHE(re, &s->gb);
            {
                int cache = GET_CACHE(re, &s->gb);
                int sign  = (~cache) >> 31;
                level     = (NEG_USR32(sign ^ cache,code) ^ sign) - sign;
            }
            LAST_SKIP_BITS(re, &s->gb, code);

Because of the reloads in GET_VLC, at least MIN_CACHE_BITS - 9 (= 16)
bits are always available after GET_VLC, so code (<= 15) bits can be
read without updating the cache at all (the 16 in MIN_CACHE_BITS - 16 is
the maximum length of a VLC code used here). This will no longer be
possible with this optimization.

Btw: I am surprised that there is a branch before UPDATE_CACHE instead
of an unconditional UPDATE_CACHE. I also do not really see why this code
uses these macros directly rather than the functions.

Given my objection to your patch #1, magicyuv will not benefit from
this. A different approach (see
https://github.com/mkver/FFmpeg/commit/9b5a977957968c0718dea55a5b15f060ef6201dc)
is to add a get_vlc() that takes the number of bits used to create the
VLC and a compile-time upper bound for the maximum length of a VLC code
as parameters, instead of the maximum depth of the VLC.

Btw, reading VLCs with the cached bitstream reader can also be improved:
https://github.com/mkver/FFmpeg/commit/fba57506a9cf6be2f4aa5eeee7b10d54729fd92a

- Andreas