From: Dennis Mungai
Date: Sat, 25 Dec 2021 08:49:14 +0300
To: FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [PATCH 3/3] libavcodec/vaapi_encode: Add async_depth to vaapi_encoder to increase performance
References: <20211027085705.4114165-1-wenbin.chen@intel.com> <20211027085705.4114165-3-wenbin.chen@intel.com> <0100017deec0f420-2dd9f1a7-c38f-455f-8195-caf2195f7643-000000@email.amazonses.com>
In-Reply-To: <0100017deec0f420-2dd9f1a7-c38f-455f-8195-caf2195f7643-000000@email.amazonses.com>

On Sat, 25 Dec 2021, 02:23 Ed Martin wrote:

> On 10/31/21 22:14, Chen, Wenbin wrote:
> >> Add async_depth to increase encoder's performance. Reuse encode_fifo as
> >> async buffer. Encoder puts all reordered frame to HW and then check
> >> fifo size. If fifo < async_depth and the top frame is not ready, it will
> >> return AVERROR(EAGAIN) to require more frames.
> >>
> >> 1080p transcoding (no B frames) with -async_depth=4 can increase 20%
> >> performance on my environment.
> >> The async increases performance but also introduces frame delay.
> >>
> >> Signed-off-by: Wenbin Chen
> >> ---
> >>  libavcodec/vaapi_encode.c | 20 +++++++++++++++-----
> >>  libavcodec/vaapi_encode.h | 12 ++++++++++--
> >>  2 files changed, 25 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/libavcodec/vaapi_encode.c b/libavcodec/vaapi_encode.c
> >> index db0ae136a1..616fb7c089 100644
> >> --- a/libavcodec/vaapi_encode.c
> >> +++ b/libavcodec/vaapi_encode.c
> >> @@ -1158,7 +1158,8 @@ static int vaapi_encode_send_frame(AVCodecContext *avctx, AVFrame *frame)
> >>          if (ctx->input_order == ctx->decode_delay)
> >>              ctx->dts_pts_diff = pic->pts - ctx->first_pts;
> >>          if (ctx->output_delay > 0)
> >> -            ctx->ts_ring[ctx->input_order % (3 * ctx->output_delay)] = pic->pts;
> >> +            ctx->ts_ring[ctx->input_order %
> >> +                        (3 * ctx->output_delay + ctx->async_depth)] = pic->pts;
> >>
> >>          pic->display_order = ctx->input_order;
> >>          ++ctx->input_order;
> >> @@ -1212,7 +1213,8 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt)
> >>              return AVERROR(EAGAIN);
> >>      }
> >>
> >> -    while (av_fifo_size(ctx->encode_fifo) <= MAX_PICTURE_REFERENCES * sizeof(VAAPIEncodePicture *)) {
> >> +    while (av_fifo_size(ctx->encode_fifo) <
> >> +           MAX_ASYNC_DEPTH * sizeof(VAAPIEncodePicture *)) {
> >>          pic = NULL;
> >>          err = vaapi_encode_pick_next(avctx, &pic);
> >>          if (err < 0)
> >> @@ -1234,6 +1236,14 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt)
> >>      if (!av_fifo_size(ctx->encode_fifo))
> >>          return err;
> >>
> >> +    if (av_fifo_size(ctx->encode_fifo) < ctx->async_depth * sizeof(VAAPIEncodePicture *) &&
> >> +        !ctx->end_of_stream) {
> >> +        av_fifo_generic_peek(ctx->encode_fifo, &pic, sizeof(pic), NULL);
> >> +        err = vaapi_encode_wait(avctx, pic, 0);
> >> +        if (err < 0)
> >> +            return err;
> >> +    }
> >> +
> >>      av_fifo_generic_read(ctx->encode_fifo, &pic, sizeof(pic), NULL);
> >>      ctx->encode_order = pic->encode_order + 1;
> >>
> >> @@ -1252,7 +1262,7 @@ int ff_vaapi_encode_receive_packet(AVCodecContext *avctx, AVPacket *pkt)
> >>          pkt->dts = ctx->ts_ring[pic->encode_order] - ctx->dts_pts_diff;
> >>      } else {
> >>          pkt->dts = ctx->ts_ring[(pic->encode_order - ctx->decode_delay) %
> >> -                                (3 * ctx->output_delay)];
> >> +                                (3 * ctx->output_delay + ctx->async_depth)];
> >>      }
> >>      av_log(avctx, AV_LOG_DEBUG, "Output packet: pts %"PRId64" dts %"PRId64".\n",
> >>             pkt->pts, pkt->dts);
> >> @@ -2566,8 +2576,8 @@ av_cold int ff_vaapi_encode_init(AVCodecContext *avctx)
> >>          }
> >>      }
> >>
> >> -    ctx->encode_fifo = av_fifo_alloc((MAX_PICTURE_REFERENCES + 1) *
> >> -                                     sizeof(VAAPIEncodePicture *));
> >> +    ctx->encode_fifo = av_fifo_alloc(MAX_ASYNC_DEPTH *
> >> +                                     sizeof(VAAPIEncodePicture *));
> >>      if (!ctx->encode_fifo)
> >>          return AVERROR(ENOMEM);
> >>
> >> diff --git a/libavcodec/vaapi_encode.h b/libavcodec/vaapi_encode.h
> >> index 89fe8de466..1bf5d7c337 100644
> >> --- a/libavcodec/vaapi_encode.h
> >> +++ b/libavcodec/vaapi_encode.h
> >> @@ -48,6 +48,7 @@ enum {
> >>      MAX_TILE_ROWS = 22,
> >>      // A.4.1: table A.6 allows at most 20 tile columns for any level.
> >>      MAX_TILE_COLS = 20,
> >> +    MAX_ASYNC_DEPTH = 64,
> >>  };
> >>
> >>  extern const AVCodecHWConfigInternal *const ff_vaapi_encode_hw_configs[];
> >> @@ -298,7 +299,8 @@ typedef struct VAAPIEncodeContext {
> >>      // Timestamp handling.
> >>      int64_t first_pts;
> >>      int64_t dts_pts_diff;
> >> -    int64_t ts_ring[MAX_REORDER_DELAY * 3];
> >> +    int64_t ts_ring[MAX_REORDER_DELAY * 3 +
> >> +                    MAX_ASYNC_DEPTH];
> >>
> >>      // Slice structure.
> >>      int slice_block_rows;
> >> @@ -348,6 +350,8 @@ typedef struct VAAPIEncodeContext {
> >>      AVFrame *frame;
> >>
> >>      AVFifoBuffer *encode_fifo;
> >> +
> >> +    int async_depth;
> >>  } VAAPIEncodeContext;
> >>
> >>  enum {
> >> @@ -458,7 +462,11 @@ int ff_vaapi_encode_close(AVCodecContext *avctx);
> >>      { "b_depth", \
> >>        "Maximum B-frame reference depth", \
> >>        OFFSET(common.desired_b_depth), AV_OPT_TYPE_INT, \
> >> -      { .i64 = 1 }, 1, INT_MAX, FLAGS }
> >> +      { .i64 = 1 }, 1, INT_MAX, FLAGS }, \
> >> +    { "async_depth", "Maximum processing parallelism. " \
> >> +      "Increase this to improve single channel performance", \
> >> +      OFFSET(common.async_depth), AV_OPT_TYPE_INT, \
> >> +      { .i64 = 4 }, 0, MAX_ASYNC_DEPTH, FLAGS }
> >>
> >>  #define VAAPI_ENCODE_RC_MODE(name, desc) \
> >>      { #name, desc, 0, AV_OPT_TYPE_CONST, { .i64 = RC_MODE_ ## name }, \
> >> --
> >> 2.25.1
> > ping
>
> I tested this patchset and I can confirm that it solves my bug that I
> thought was a mesa bug
> (https://gitlab.freedesktop.org/mesa/mesa/-/issues/1235)
>
> I would love if this feature is incorporated into ffmpeg

Indeed, this is the only patch that makes AMD GPUs usable with VAAPI.
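
For anyone skimming the quoted diff, the receive-side gate described in the commit message ("if fifo < async_depth and the top frame is not ready, return AVERROR(EAGAIN)") can be sketched as a small toy model. This is only an illustration under stated assumptions, not the FFmpeg code: ToyPicture, ToyEncoder, toy_submit, toy_receive, TOY_EAGAIN and TOY_EOF are hypothetical names, the "hardware" is reduced to a ready flag, and the blocking wait is modelled by simply marking the picture finished.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ASYNC_DEPTH 64  /* mirrors MAX_ASYNC_DEPTH in the quoted patch */
#define TOY_EAGAIN (-1)         /* hypothetical stand-in for AVERROR(EAGAIN) */
#define TOY_EOF    (-2)         /* hypothetical stand-in for AVERROR_EOF */

/* Hypothetical model of one picture issued to the hardware; not an FFmpeg type. */
typedef struct ToyPicture {
    int id;     /* encode order */
    int ready;  /* nonzero once the "hardware" has finished it */
} ToyPicture;

/* Hypothetical encoder state: a ring of in-flight pictures. */
typedef struct ToyEncoder {
    ToyPicture *fifo[TOY_MAX_ASYNC_DEPTH];
    size_t head, count;
    size_t async_depth;
    int end_of_stream;
} ToyEncoder;

/* Queue a picture that has been issued. Returns 0, or TOY_EAGAIN if the ring is full. */
int toy_submit(ToyEncoder *enc, ToyPicture *pic)
{
    if (enc->count == TOY_MAX_ASYNC_DEPTH)
        return TOY_EAGAIN;
    enc->fifo[(enc->head + enc->count) % TOY_MAX_ASYNC_DEPTH] = pic;
    enc->count++;
    return 0;
}

/*
 * Try to output the oldest picture. While fewer than async_depth pictures
 * are in flight and more input may still arrive, only poll the oldest
 * picture; if it is not finished, ask the caller for more input instead of
 * stalling. Otherwise a real encoder would block until the picture is done;
 * the toy models that by marking it ready.
 */
int toy_receive(ToyEncoder *enc, ToyPicture **out)
{
    ToyPicture *pic;

    assert(enc->async_depth <= TOY_MAX_ASYNC_DEPTH);
    if (enc->count == 0)
        return enc->end_of_stream ? TOY_EOF : TOY_EAGAIN;

    pic = enc->fifo[enc->head];
    if (enc->count < enc->async_depth && !enc->end_of_stream && !pic->ready)
        return TOY_EAGAIN;

    pic->ready = 1; /* models a blocking wait for completion */
    enc->head = (enc->head + 1) % TOY_MAX_ASYNC_DEPTH;
    enc->count--;
    *out = pic;
    return 0;
}
```

With async_depth set to 2 in this model, the first toy_receive after a single toy_submit asks for more input, and output starts once two pictures are in flight. That is the frame delay the commit message mentions in exchange for keeping the hardware busy.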