From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nuo Mi
To: ffmpeg-devel@ffmpeg.org
Cc: Nuo Mi
Date: Sun, 28 Jul 2024 11:18:07 +0800
Subject: [FFmpeg-devel] [PATCH 11/11] avcodec/vvcdec: move frame tab memset from the main thread to worker threads
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240728031807.462810-1-nuomi2021@gmail.com>
References: <20240728031807.462810-1-nuomi2021@gmail.com>
X-Microsoft-Original-Message-ID: <20240728031807.462810-11-nuomi2021@gmail.com>
List-Id: FFmpeg development discussions and patches
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Zeroing the frame tables with memset in the main thread can become a
bottleneck for the decoder. For example, if it takes 1% of the processing
time for one core, the maximum achievable FPS will be 100.
Moving the memset to worker threads fixes the issue.
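The FPS figure in the message above can be checked with a few lines of C. This is only a sketch of one reading of the claim: it assumes "1% of the processing time for one core" means the main thread spends 1% of one core-second (10 ms) per frame on work that cannot overlap with the next frame. The function names are hypothetical and do not exist in FFmpeg.

```c
#include <assert.h>

/* Upper bound on frames per second when each frame needs serial_ms
 * milliseconds of strictly serial work on a single thread.
 * Hypothetical helper, not part of libavcodec. */
static double max_fps_from_serial_ms(double serial_ms)
{
    assert(serial_ms > 0.0);
    return 1000.0 / serial_ms;
}

/* The same bound, expressed as the fraction of one core-second that the
 * serial work consumes per frame (0.01 stands for 1%). */
static double max_fps_from_core_fraction(double fraction)
{
    assert(fraction > 0.0);
    return 1.0 / fraction;
}
```

Under that assumption, a per-frame serial cost of 10 ms caps throughput at about 100 frames per second no matter how many worker threads are available, which is the case the patch targets by moving the memset into a worker task.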
---
 libavcodec/vvc/dec.c    |  13 ++++-
 libavcodec/vvc/thread.c | 122 ++++++++++++++++++++++++----------------
 libavcodec/vvc/thread.h |   1 +
 3 files changed, 85 insertions(+), 51 deletions(-)

diff --git a/libavcodec/vvc/dec.c b/libavcodec/vvc/dec.c
index 575bcfa33d..d34713296d 100644
--- a/libavcodec/vvc/dec.c
+++ b/libavcodec/vvc/dec.c
@@ -82,7 +82,13 @@ static int tl_create(TabList *l)
             if (!*t->tab)
                 return AVERROR(ENOMEM);
         }
-    } else if (l->zero) {
+    }
+    return 0;
+}
+
+static int tl_zero(TabList *l)
+{
+    if (l->zero) {
         for (int i = 0; i < l->nb_tabs; i++) {
             Tab *t = l->tabs + i;
             memset(*t->tab, 0, t->size);
@@ -404,6 +410,11 @@ static int pic_arrays_init(VVCContext *s, VVCFrameContext *fc)
     return 0;
 }
 
+int ff_vvc_per_frame_init(VVCFrameContext *fc)
+{
+    return frame_context_for_each_tl(fc, tl_zero);
+}
+
 static int min_positive(const int idx, const int diff, const int min_diff)
 {
     return diff > 0 && (idx < 0 || diff < min_diff);
diff --git a/libavcodec/vvc/thread.c b/libavcodec/vvc/thread.c
index 28065d726f..74f8e4e9d0 100644
--- a/libavcodec/vvc/thread.c
+++ b/libavcodec/vvc/thread.c
@@ -40,6 +40,7 @@ typedef struct ProgressListener {
 } ProgressListener;
 
 typedef enum VVCTaskStage {
+    VVC_TASK_STAGE_INIT, // for CTU(0, 0) only
     VVC_TASK_STAGE_PARSE,
     VVC_TASK_STAGE_INTER,
     VVC_TASK_STAGE_RECON,
@@ -175,10 +176,14 @@ static int task_has_target_score(VVCTask *t, const VVCTaskStage stage, const uin
     uint8_t target = 0;
     VVCFrameContext *fc = t->fc;
 
+    if (stage == VVC_TASK_STAGE_INIT)
+        return 1;
+
     if (stage == VVC_TASK_STAGE_PARSE) {
-        const H266RawSPS *rsps = fc->ps.sps->r;
-        const int wpp = rsps->sps_entropy_coding_sync_enabled_flag && !is_first_row(fc, t->rx, t->ry);
-        target = 2 + wpp - 1; //left parse + colocation + wpp - no previous stage
+        const H266RawSPS *rsps = fc->ps.sps->r;
+        const int wpp = rsps->sps_entropy_coding_sync_enabled_flag && !is_first_row(fc, t->rx, t->ry);
+        const int no_prev_stage = t->rs > 0;
+        target = 2 + wpp - no_prev_stage; //left parse + colocation + wpp - no_prev_stage
     } else if (stage == VVC_TASK_STAGE_INTER) {
         target = atomic_load(&t->target_inter_score);
     } else {
@@ -399,6 +404,55 @@ static int task_priority_higher(const AVTask *_a, const AVTask *_b)
     return a->ry < b->ry;
 }
 
+static void check_colocation(VVCContext *s, VVCTask *t)
+{
+    const VVCFrameContext *fc = t->fc;
+
+    if (fc->ps.ph.r->ph_temporal_mvp_enabled_flag || fc->ps.sps->r->sps_sbtmvp_enabled_flag) {
+        VVCFrame *col = fc->ref->collocated_ref;
+        const int first_col = t->rx == fc->ps.pps->ctb_to_col_bd[t->rx];
+        if (col && first_col) {
+            //we depend on bottom and right boundary, do not - 1 for y
+            const int y = (t->ry << fc->ps.sps->ctb_log2_size_y);
+            add_progress_listener(col, &t->col_listener, t, s, VVC_PROGRESS_MV, y);
+            return;
+        }
+    }
+    frame_thread_add_score(s, fc->ft, t->rx, t->ry, VVC_TASK_STAGE_PARSE);
+}
+
+static void submit_entry_point(VVCContext *s, VVCFrameThread *ft, SliceContext *sc, EntryPoint *ep)
+{
+    const int rs = sc->sh.ctb_addr_in_curr_slice[ep->ctu_start];
+    VVCTask *t = ft->tasks + rs;
+
+    frame_thread_add_score(s, ft, t->rx, t->ry, VVC_TASK_STAGE_PARSE);
+}
+
+static int run_init(VVCContext *s, VVCLocalContext *lc, VVCTask *t)
+{
+    VVCFrameContext *fc = lc->fc;
+    VVCFrameThread *ft = fc->ft;
+    const int ret = ff_vvc_per_frame_init(fc);
+
+    if (ret < 0)
+        return ret;
+
+    for (int i = 0; i < fc->nb_slices; i++) {
+        SliceContext *sc = fc->slices[i];
+        for (int j = 0; j < sc->nb_eps; j++) {
+            EntryPoint *ep = sc->eps + j;
+            for (int k = ep->ctu_start; k < ep->ctu_end; k++) {
+                const int rs = sc->sh.ctb_addr_in_curr_slice[k];
+                VVCTask *t = ft->tasks + rs;
+                check_colocation(s, t);
+            }
+            submit_entry_point(s, ft, sc, ep);
+        }
+    }
+    return 0;
+}
+
 static void report_frame_progress(VVCFrameContext *fc,
     const int ry, const VVCProgress idx)
 {
@@ -547,6 +601,7 @@ static int run_alf(VVCContext *s, VVCLocalContext *lc, VVCTask *t)
 #define VVC_THREAD_DEBUG
 #ifdef VVC_THREAD_DEBUG
 const static char* task_name[] = {
+    "INIT",
     "P",
     "I",
     "R",
@@ -567,6 +622,7 @@ static void task_run_stage(VVCTask *t, VVCContext *s, VVCLocalContext *lc)
     VVCFrameThread *ft = fc->ft;
     const VVCTaskStage stage = t->stage;
     static const run_func run[] = {
+        run_init,
         run_parse,
         run_inter,
         run_recon,
@@ -726,7 +782,7 @@ int ff_vvc_frame_thread_init(VVCFrameContext *fc)
 
     for (int rs = 0; rs < ft->ctu_count; rs++) {
         VVCTask *t = ft->tasks + rs;
-        task_init(t, VVC_TASK_STAGE_PARSE, fc, rs % ft->ctu_width, rs / ft->ctu_width);
+        task_init(t, rs ? VVC_TASK_STAGE_PARSE : VVC_TASK_STAGE_INIT, fc, rs % ft->ctu_width, rs / ft->ctu_width);
     }
 
     memset(&ft->row_progress[0], 0, sizeof(ft->row_progress));
@@ -745,59 +801,25 @@ fail:
     return AVERROR(ENOMEM);
 }
 
-static void check_colocation(VVCContext *s, VVCTask *t)
-{
-    const VVCFrameContext *fc = t->fc;
-
-    if (fc->ps.ph.r->ph_temporal_mvp_enabled_flag || fc->ps.sps->r->sps_sbtmvp_enabled_flag) {
-        VVCFrame *col = fc->ref->collocated_ref;
-        const int first_col = t->rx == fc->ps.pps->ctb_to_col_bd[t->rx];
-        if (col && first_col) {
-            //we depend on bottom and right boundary, do not - 1 for y
-            const int y = (t->ry << fc->ps.sps->ctb_log2_size_y);
-            add_progress_listener(col, &t->col_listener, t, s, VVC_PROGRESS_MV, y);
-            return;
-        }
-    }
-    frame_thread_add_score(s, fc->ft, t->rx, t->ry, VVC_TASK_STAGE_PARSE);
-}
-
-static void submit_entry_point(VVCContext *s, VVCFrameThread *ft, SliceContext *sc, EntryPoint *ep)
-{
-    const int rs = sc->sh.ctb_addr_in_curr_slice[ep->ctu_start];
-    VVCTask *t = ft->tasks + rs;
-
-    frame_thread_add_score(s, ft, t->rx, t->ry, VVC_TASK_STAGE_PARSE);
-}
-
 int ff_vvc_frame_submit(VVCContext *s, VVCFrameContext *fc)
 {
     VVCFrameThread *ft = fc->ft;
 
-    // We'll handle this in two passes:
-    // Pass 0 to initialize tasks with parser, this will help detect bit stream error
-    // Pass 1 to shedule location check and submit the entry point
-    for (int pass = 0; pass < 2; pass++) {
-        for (int i = 0; i < fc->nb_slices; i++) {
-            SliceContext *sc = fc->slices[i];
-            for (int j = 0; j < sc->nb_eps; j++) {
-                EntryPoint *ep = sc->eps + j;
-                for (int k = ep->ctu_start; k < ep->ctu_end; k++) {
-                    const int rs = sc->sh.ctb_addr_in_curr_slice[k];
-                    VVCTask *t = ft->tasks + rs;
-                    if (pass) {
-                        check_colocation(s, t);
-                    } else {
-                        const int ret = task_init_parse(t, sc, ep, k);
-                        if (ret < 0)
-                            return ret;
-                    }
-                }
-                if (pass)
-                    submit_entry_point(s, ft, sc, ep);
+    for (int i = 0; i < fc->nb_slices; i++) {
+        SliceContext *sc = fc->slices[i];
+        for (int j = 0; j < sc->nb_eps; j++) {
+            EntryPoint *ep = sc->eps + j;
+            for (int k = ep->ctu_start; k < ep->ctu_end; k++) {
+                const int rs = sc->sh.ctb_addr_in_curr_slice[k];
+                VVCTask *t = ft->tasks + rs;
+                const int ret = task_init_parse(t, sc, ep, k);
+                if (ret < 0)
+                    return ret;
             }
         }
     }
+    frame_thread_add_score(s, ft, 0, 0, VVC_TASK_STAGE_INIT);
+
     return 0;
 }
 
diff --git a/libavcodec/vvc/thread.h b/libavcodec/vvc/thread.h
index 8ac59b2ecf..7b15dbee59 100644
--- a/libavcodec/vvc/thread.h
+++ b/libavcodec/vvc/thread.h
@@ -32,5 +32,6 @@ int ff_vvc_frame_thread_init(VVCFrameContext *fc);
 void ff_vvc_frame_thread_free(VVCFrameContext *fc);
 int ff_vvc_frame_submit(VVCContext *s, VVCFrameContext *fc);
 int ff_vvc_frame_wait(VVCContext *s, VVCFrameContext *fc);
+int ff_vvc_per_frame_init(VVCFrameContext *fc);
 
 #endif // AVCODEC_VVC_THREAD_H
-- 
2.34.1