From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andreas Rheinhardt
Date: Fri, 8 Sep 2023 11:58:16 +0200
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH 3/7] proresdec2: use VLC for level instead of EC switch
References: <20230908081508.510-1-christophe.gisquet@gmail.com> <20230908081508.510-3-christophe.gisquet@gmail.com>
X-Microsoft-Original-Message-ID: <8e23a41c-b0a7-37e2-0472-689a8c0431e9@outlook.com>
List-Id: FFmpeg development discussions and patches
Content-Type: text/plain; charset="us-ascii"

Andreas Rheinhardt:
> Christophe Gisquet:
>> x86/x64: 61/52 -> 55/46
>> Around 7-10% speedup.
>>
>> Run and DC do not lend themselves to such changes, likely because
>> their distribution is less skewed, and need larger average vlc read
>> iterations.
>> ---
>>  libavcodec/proresdec.h  |  1 +
>>  libavcodec/proresdec2.c | 77 ++++++++++++++++++++++++++++++++++-------
>>  2 files changed, 66 insertions(+), 12 deletions(-)
>>
>> diff --git a/libavcodec/proresdec.h b/libavcodec/proresdec.h
>> index 1e48752e6f..7ebacaeb21 100644
>> --- a/libavcodec/proresdec.h
>> +++ b/libavcodec/proresdec.h
>> @@ -22,6 +22,7 @@
>>  #ifndef AVCODEC_PRORESDEC_H
>>  #define AVCODEC_PRORESDEC_H
>>
>> +#define CACHED_BITSTREAM_READER 1
>
> This should be in the commit switching to the cached bitstream reader.

Correction: This header is also included in videotoolbox.c, and that
file includes other headers that themselves include get_bits.h
(currently before proresdec.h). This means that proresdec2.c and
videotoolbox.c will have different opinions on what a GetBitContext is:
it will be the non-cached one in videotoolbox.c and the cached one in
proresdec2.c. This will work in practice, because ProresContext does
not need the complete GetBitContext type (it does not contain a
GetBitContext), so offsets are not affected. But it is nevertheless
undefined behaviour and could become dangerous when using LTO. So you
should switch the type of the pointer to BitstreamContextBE* in
proresdec2.h. Furthermore, you can either include bitstream.h in
proresdec.h or (IMO better) use a forward declaration and struct
BitstreamContextBE* in the function pointer without including
get_bits.h in the header at all.
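
A minimal sketch of the forward-declaration option, assuming a header
that only ever passes the reader around by pointer. DemoProresContext,
demo_unpack_alpha() and demo_run() are hypothetical names, and the
struct body below is a stand-in for illustration, not FFmpeg's actual
BitstreamContextBE definition:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ---- what a header such as proresdec.h could contain (sketch) ---- */

/* Forward declaration: an incomplete type is enough here, because the
 * header only ever mentions pointers to it. */
struct BitstreamContextBE;

/* Hypothetical stand-in for ProresContext; only the function pointer
 * matters for this example. */
typedef struct DemoProresContext {
    void (*unpack_alpha)(struct BitstreamContextBE *bc, uint16_t *dst,
                         int num_coeffs);
} DemoProresContext;

/* ---- what the .c file that actually reads bits would see ---- */

/* Demo definition only (assumption): in FFmpeg the complete type would
 * come from the bitstream reader header, not from this file. */
struct BitstreamContextBE {
    const uint8_t *buf;
    size_t         pos;
};

/* Copies one byte per coefficient; a placeholder for real bit reading. */
static void demo_unpack_alpha(struct BitstreamContextBE *bc, uint16_t *dst,
                              int num_coeffs)
{
    assert(num_coeffs >= 0);
    for (int i = 0; i < num_coeffs; i++)
        dst[i] = bc->buf[bc->pos++];
}

/* Calls through the function pointer and returns the sum of the output. */
static int demo_run(void)
{
    static const uint8_t data[] = { 7, 8, 9 };
    struct BitstreamContextBE bc = { data, 0 };
    DemoProresContext ctx = { demo_unpack_alpha };
    uint16_t out[3] = { 0 };

    ctx.unpack_alpha(&bc, out, 3);
    return out[0] + out[1] + out[2];
}
```

Because the header part names only struct BitstreamContextBE *, every
file that includes it sees the same incomplete type, whether or not
CACHED_BITSTREAM_READER is defined in that file.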
>
>>  #include "get_bits.h"
>>  #include "blockdsp.h"
>>  #include "proresdsp.h"
>> diff --git a/libavcodec/proresdec2.c b/libavcodec/proresdec2.c
>> index 65e8b01755..91c689d9ef 100644
>> --- a/libavcodec/proresdec2.c
>> +++ b/libavcodec/proresdec2.c
>> @@ -24,17 +24,17 @@
>>   * Known FOURCCs: 'apch' (HQ), 'apcn' (SD), 'apcs' (LT), 'apco' (Proxy), 'ap4h' (4444), 'ap4x' (4444 XQ)
>>   */
>>
>> -#define CACHED_BITSTREAM_READER 1
>> +//#define DEBUG
>>
>>  #include "config_components.h"
>>
>>  #include "libavutil/internal.h"
>>  #include "libavutil/mem_internal.h"
>> +#include "libavutil/thread.h"
>>
>>  #include "avcodec.h"
>>  #include "codec_internal.h"
>>  #include "decode.h"
>> -#include "get_bits.h"
>>  #include "hwaccel_internal.h"
>>  #include "hwconfig.h"
>>  #include "idctdsp.h"
>> @@ -129,8 +129,64 @@ static void unpack_alpha_12(GetBitContext *gb, uint16_t *dst, int num_coeffs,
>>      }
>>  }
>>
>> +#define AC_BITS 12
>> +#define PRORES_LEV_BITS 9
>> +
>> +static const uint8_t ac_info[] = { 0x04, 0x0A, 0x05, 0x06, 0x28, 0x4C };
>> +static VLC ac_vlc[6];
>> +
>> +static av_cold void init_vlcs(void)
>> +{
>> +    int i;
>> +    for (i = 0; i < sizeof(ac_info); i++) {
>
> FF_ARRAY_ELEMS() is cleaner; also we support and prefer declarations
> inside for-loops: for (int i = 0;
>
>> +        uint32_t ac_codes[1<<AC_BITS];
>> +        uint8_t ac_bits[1<<AC_BITS];
>> +        unsigned int rice_order, exp_order, switch_bits, switch_val;
>> +        int ac, max_bits = 0, codebook = ac_info[i];
>> +
>> +        /* number of prefix bits to switch between Rice and expGolomb */
>> +        switch_bits = (codebook & 3);
>> +        rice_order = codebook >> 5; /* rice code order */
>> +        exp_order = (codebook >> 2) & 7; /* exp golomb code order */
>> +
>> +        switch_val = (switch_bits+1) << rice_order;
>> +
>> +        // Values are actually transformed, but this is more a wrapping
>> +        for (ac = 0; ac <1<<AC_BITS; ac++) {
>> +            int exponent, bits, val = ac;
>> +            unsigned int code;
>> +
>> +            if (val >= switch_val) {
>> +                val += (1 << exp_order) - switch_val;
>> +                exponent = av_log2(val);
>> +                bits = exponent+1+switch_bits-exp_order/*0*/ + exponent+1/*val*/;
>> +                code = val;
>> +            } else if (rice_order) {
>> +                bits = (val >> rice_order)/*0*/ + 1/*1*/ + rice_order/*val*/;
>> +                code = (1 << rice_order) | val;
>> +            } else {
>> +                bits = val/*0*/ + 1/*1*/;
>> +                code = 1;
>> +            }
>> +            if (bits > max_bits) max_bits = bits;
>> +            ac_bits [ac] = bits;
>> +            ac_codes[ac] = code;
>> +        }
>> +
>> +        ff_free_vlc(ac_vlc+i);
>
> This is unnecessary, as the VLC is initially blank and is not
> initialized multiple times.
>
>> +
>> +        if (init_vlc(ac_vlc+i, PRORES_LEV_BITS, 1<<AC_BITS,
>> +                     ac_bits, 1, 1, ac_codes, 4, 4, 0) < 0) {
>> +            av_log(NULL, AV_LOG_ERROR, "Error for %d(0x%02X), max bits %d\n",
>> +                   i, codebook, max_bits);
>> +            break; //return AVERROR_BUG;
>
> This is not how you initialize a static table (you miss the
> INIT_VLC_USE_NEW_STATIC flag and don't set the static store buffer).
> Search for INIT_VLC_STATIC_OVERLONG for an idea of how to do it.
>
>> +        }
>> +    }
>> +}
>> +
>>  static av_cold int decode_init(AVCodecContext *avctx)
>>  {
>> +    static AVOnce init_static_once = AV_ONCE_INIT;
>>      int ret = 0;
>>      ProresContext *ctx = avctx->priv_data;
>>      uint8_t idct_permutation[64];
>> @@ -184,6 +240,9 @@ static av_cold int decode_init(AVCodecContext *avctx)
>>
>>      ctx->pix_fmt = AV_PIX_FMT_NONE;
>>
>> +    // init dc_tables
>> +    ff_thread_once(&init_static_once, init_vlcs);
>> +
>>      if (avctx->bits_per_raw_sample == 10){
>>          ctx->unpack_alpha = unpack_alpha_10;
>>      } else if (avctx->bits_per_raw_sample == 12){
>> @@ -510,7 +569,7 @@ static av_always_inline int decode_dc_coeffs(GetBitContext *gb, int16_t *out,
>>      return 0;
>>  }
>>
>> -// adaptive codebook switching lut according to previous run/level values
>> +// adaptive codebook switching lut according to previous run values
>>  static const char run_to_cb[16][4] = {
>>      { 2, 0, -1, 1 }, { 2, 0, -1, 1 }, { 1, 0, 0, 0 }, { 1, 0, 0, 0 }, { 0, 0, 1, -1 },
>>      { 1, 1, 1, 0 }, { 1, 1, 1, 0 }, { 1, 1, 1, 0 }, { 1, 1, 1, 0 },
>> @@ -518,12 +577,6 @@ static const char run_to_cb[16][4] = {
>>      { 0, 2, 3, -4 }
>>  };
>>
>> -static const char lev_to_cb[10][4] = {
>> -    { 0, 0, 1, -1 }, { 2, 0, 0, -1 }, { 1, 0, 0, 0 }, { 2, 0, -1, 1 }, { 0, 0, 1, -1 },
>> -    { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 }, { 0, 1, 2, -2 },
>> -    { 0, 2, 3, -4 }
>> -};
>> -
>>  static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContext *gb,
>>                                               int16_t *out, int blocks_per_slice)
>>  {
>> @@ -540,8 +593,9 @@ static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContex
>>      block_mask = blocks_per_slice - 1;
>>
>>      for (pos = block_mask;;) {
>> +        static const uint8_t ctx_to_tbl[] = { 0, 1, 2, 3, 0, 4, 4, 4, 4, 5 };
>> +        const VLC* tbl = ac_vlc + ctx_to_tbl[FFMIN(level, 9)];
>>          unsigned int runcb = FFMIN(run, 15);
>> -        unsigned int levcb = FFMIN(level, 9);
>>          bits_rem = get_bits_left(gb);
>>          if (!bits_rem || (bits_rem < 16 && !show_bits(gb, bits_rem)))
>>              break;
>> @@ -554,8 +608,7 @@ static av_always_inline int decode_ac_coeffs(AVCodecContext *avctx, GetBitContex
>>                  return AVERROR_INVALIDDATA;
>>              }
>>
>> -            DECODE_CODEWORD2(level, lev_to_cb[levcb][0], lev_to_cb[levcb][1],
>> -                             lev_to_cb[levcb][2], lev_to_cb[levcb][3]);
>> +            level = get_vlc2(gb, tbl->table, PRORES_LEV_BITS, 3);
>>              level += 1;
>>
>>              i = pos >> log2_block_count;
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
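
For reference, the code-length arithmetic of the quoted init_vlcs()
loop can be tried out in isolation. This is only a sketch under
assumptions: ilog2() stands in for av_log2(), level_code_bits() is a
hypothetical helper name, and nothing here is FFmpeg API. It mirrors
the quoted patch's formulas and does not claim to be the decoder's
reference behaviour:

```c
#include <assert.h>

/* Integer log2 for v > 0; stand-in for av_log2() in this sketch. */
static int ilog2(unsigned v)
{
    int n = 0;

    assert(v > 0);
    while (v >>= 1)
        n++;
    return n;
}

/* Code length in bits of `val` for a one-byte codebook descriptor,
 * using the same field layout and formulas as the quoted loop:
 *   bits 0-1: switch_bits, bits 2-4: exp-Golomb order, bits 5-7: Rice order */
static int level_code_bits(unsigned codebook, unsigned val)
{
    unsigned switch_bits = codebook & 3;
    unsigned rice_order  = codebook >> 5;
    unsigned exp_order   = (codebook >> 2) & 7;
    unsigned switch_val  = (switch_bits + 1) << rice_order;

    if (val >= switch_val) {
        /* exp-Golomb part: zero prefix plus the value bits */
        int exponent;

        val += (1u << exp_order) - switch_val;
        exponent = ilog2(val);
        return exponent + 1 + (int)switch_bits - (int)exp_order + exponent + 1;
    }
    if (rice_order)
        /* Rice part: unary quotient, stop bit, rice_order remainder bits */
        return (int)(val >> rice_order) + 1 + (int)rice_order;

    /* Rice order 0: plain unary code */
    return (int)val + 1;
}
```

With descriptor 0x04 (Rice order 0, exp-Golomb order 1, switch_bits 0),
level_code_bits(0x04, 0) gives 1 and level_code_bits(0x04, 1) gives 3.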