From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by master.gitmailbox.com (Postfix) with ESMTP id AFADF45AA1 for ; Sun, 13 Aug 2023 21:43:54 +0000 (UTC) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id C586C68C597; Mon, 14 Aug 2023 00:43:51 +0300 (EEST) Received: from mail-lj1-f169.google.com (mail-lj1-f169.google.com [209.85.208.169]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 4CDD768C17A for ; Mon, 14 Aug 2023 00:43:45 +0300 (EEST) Received: by mail-lj1-f169.google.com with SMTP id 38308e7fff4ca-2b9338e4695so57312181fa.2 for ; Sun, 13 Aug 2023 14:43:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=jkqxz-net.20221208.gappssmtp.com; s=20221208; t=1691963024; x=1692567824; h=content-transfer-encoding:in-reply-to:from:references:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=lBlEUdSLbwf7L6i4kpjlEsSZ4Za00hsRr4iep+zhJTY=; b=fA3xW4n9SFJUu74yxBZYj+y6WEN/Fg8PMUOSfYIiq7HDygDb5aMrTMISqmImNGycVY 5NMqm9DYEebmmvx6XOU4no/A56/FreMrWfTFli0iYx2C4cu7CrBJeNYlujwPbaFToj8u daGskB35+2eDMDaCANL34igZkl1JcOwKI91RiG5nL/BtwG24D4O4CRWpHPZHTSUiYIl+ Bk3Bpx3/J+Mrt8r4roi4Jz5ZwdmJI80GLT4+sXpkpHZGn8gA7uVZVYVIRHivneRlWuNB lP+KRMTriMNPSkNZwRAbB+b7UbALVt5wNiwkKdJ+WtPiqMg4msFUCcaYXtD0vpZM5AEI R49Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691963024; x=1692567824; h=content-transfer-encoding:in-reply-to:from:references:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=lBlEUdSLbwf7L6i4kpjlEsSZ4Za00hsRr4iep+zhJTY=; b=kFBf+brTpewFkeqK5PJRs+HuyONrCcJhun5cAkxR3zTd0Y7EXKfENutFDPAC2VDn2H 8VKTIqSihXM3JdFduUgmSm0qwTq1awN6RgO6fAJCqoAkFEJ2rQS6UlnkODc2h5FRQtLM VsY5ppvGdOMxiJb3Zhb96OjEmrbHoXYzbWtS+rTM+YOnDg48v048YwMlQBFEtPkBZ4LO wfePigLV+z4Ogw4T40MTbEamXFkzsctUZD24z9HBVdovXYN34MbnAJ0Vupj+yVyqvYgY l8RoAADUNzpPBzd4PWQf+tnAhcB4VfJduDyCyp29pIU1YSSJ4ww5eQK8rchbWoG6iy0+ CayA== X-Gm-Message-State: AOJu0YyB1PtcI70khUlxNrN39hQ2NhTBDfgJHxCGWxA9nPwbz9ancSK4 wKwNqx2hhQjxnbtL8836ViNAFAnMfkya0f9Sfpk= X-Google-Smtp-Source: AGHT+IFfSybQEujxCIvoyP+Mg3DBwqcHRmZUcf/Klfpkt/lRKbgKC0gBpv+SLrHitE1DHoGWvwLtFw== X-Received: by 2002:a2e:809a:0:b0:2b4:6f0c:4760 with SMTP id i26-20020a2e809a000000b002b46f0c4760mr5088680ljg.11.1691963023616; Sun, 13 Aug 2023 14:43:43 -0700 (PDT) Received: from [192.168.0.15] (cpc92320-cmbg19-2-0-cust383.5-4.cable.virginm.net. [82.13.65.128]) by smtp.gmail.com with ESMTPSA id f21-20020a1c6a15000000b003fc02218d6csm15129630wmc.25.2023.08.13.14.43.43 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 13 Aug 2023 14:43:43 -0700 (PDT) Message-ID: <1acd766d-e0dc-69f6-56e5-0d84e62f83c8@jkqxz.net> Date: Sun, 13 Aug 2023 22:43:56 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Content-Language: en-US To: ffmpeg-devel@ffmpeg.org References: <20230803060132.501741-1-fei.w.wang@intel.com> <20230803060132.501741-6-fei.w.wang@intel.com> <336d0b1463430f3591a9444b60191cf00f595700.camel@intel.com> From: Mark Thompson In-Reply-To: <336d0b1463430f3591a9444b60191cf00f595700.camel@intel.com> Subject: Re: [FFmpeg-devel] [PATCH v3 6/6] lavc/vaapi_encode: Add VAAPI AV1 encoder X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="us-ascii"; Format="flowed" Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" Archived-At: List-Archive: List-Post: On 10/08/2023 03:54, Wang, Fei W wrote: > On Mon, 2023-08-07 at 22:21 +0100, Mark Thompson wrote: >> On 03/08/2023 07:01, fei.w.wang-at-intel.com@ffmpeg.org wrote: >>> From: Fei Wang >>> >>> Signed-off-by: Fei Wang >>> --- >>> Changelog | 1 + >>> configure | 3 + >>> doc/encoders.texi | 13 + >>> libavcodec/Makefile | 1 + >>> libavcodec/allcodecs.c | 1 + >>> libavcodec/vaapi_encode.c | 125 +++- >>> libavcodec/vaapi_encode.h | 12 + >>> libavcodec/vaapi_encode_av1.c | 1229 >>> +++++++++++++++++++++++++++++++++ >>> libavcodec/version.h | 2 +- >>> 9 files changed, 1368 insertions(+), 19 deletions(-) >>> create mode 100644 libavcodec/vaapi_encode_av1.c >> >> I assume this is tested on Intel hardware. Is it tested on AMD as >> well? (Apparently working in recent Mesa.) > > AMD tested the patchset months ago: > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22585 > > This patch changed a little compare with the version in Cartwheel, > @Ruijing, Could you help to review this version in ML? To avoid the > diffs break you. Thanks. > >>> ... >>> @@ -669,6 +669,15 @@ static int >>> vaapi_encode_set_output_timestamp(AVCodecContext *avctx, >>> { >>> VAAPIEncodeContext *ctx = avctx->priv_data; >>> >>> + // AV1 packs P frame and next B frame into one pkt, and uses >>> the other >>> + // repeat frame header pkt at the display order position of >>> the P frame >>> + // to indicate its frame index. Each frame has a corresponding >>> pkt in its >>> + // display order position. So don't need to consider delay for >>> AV1 timestamp. >>> + if (avctx->codec_id == AV_CODEC_ID_AV1) { >>> + pkt->dts = pkt->pts - ctx->dts_pts_diff; >>> + return 0; >>> + } >> >> This doesn't get you the right result, though? The PTS and DTS on >> every AV1 packet want to be the same. > > The result tested good which can be played normally. Just aligned with > other vaapi encoders that the 1st frame start with 0/-1 of PTS/DTS if > have B frame. Set PTS/DTS to same also looks good. DTS != PTS should only be true when the stream has externally-visible frame reordering in, which AV1 doesn't. The other two AV1 encoders both output DTS == PTS; I think you should match that. >> In any case, please don't put tests for codec ID in the common >> code. I suggest that the timestamp behaviour wants to be a new >> FLAG_*. >> >>> + >>> ... >>> @@ -1128,9 +1182,19 @@ static int >>> vaapi_encode_pick_next(AVCodecContext *avctx, >>> >>> vaapi_encode_add_ref(avctx, pic, pic, 0, 1, 0); >>> if (pic->type != PICTURE_TYPE_IDR) { >>> - vaapi_encode_add_ref(avctx, pic, start, >>> - pic->type == PICTURE_TYPE_P, >>> - b_counter > 0, 0); >>> + // TODO: apply both previous and forward multi reference >>> for all vaapi encoders. >>> + // And L0/L1 reference frame number can be set dynamically >>> through query >>> + // VAConfigAttribEncMaxRefFrames attribute. >>> + if (avctx->codec_id == AV_CODEC_ID_AV1) { >>> + for (i = 0; i < ctx->nb_next_prev; i++) >>> + vaapi_encode_add_ref(avctx, pic, ctx- >>>> next_prev[i], >>> + pic->type == PICTURE_TYPE_P, >>> + b_counter > 0, 0); >> >> So an undisclosed aim of the previous patch was to make this extra >> list of the most recent N frames for this to use with P frames only? > > Currently, yes. As the TODO comment says, I will apply it to all other > codecs and B frame next, it will need lots of test on different > hardware. Add this for AV1 here is to align with VPL which use 2 > previous reference for P frame. Base on our test, 2 reference frame for > P will improve ~5% BD-rate for quality, so that can make it possible to > let user get same quality with VPL after adjusting and aligning encoder > options. Hmm. My Intel test machine says it can take 8 L0 and 2 L1 references in H.264. I might try this, seems like it should be easy to test. >> >> Why this partcular structure for P frames only, though? Seems like >> if you support extra references then other codecs could also make use >> of them in either direction, so we would be better off hinting that >> the codec would like up to N references and then using the contents >> of the existing DPB, plus keep extras if there is space. >> >>> ... >>> + >>> +static av_cold int vaapi_encode_av1_configure(AVCodecContext >>> *avctx) >>> +{ >>> + VAAPIEncodeContext *ctx = avctx->priv_data; >>> + VAAPIEncodeAV1Context *priv = avctx->priv_data; >>> + int ret; >>> + >>> + ret = ff_cbs_init(&priv->cbc, AV_CODEC_ID_AV1, avctx); >>> + if (ret < 0) >>> + return ret; >>> + >>> + if (ctx->rc_mode->quality) { >>> + priv->q_idx_p = av_clip(ctx->rc_quality, 0, >>> AV1_MAX_QUANT); >>> + if (fabs(avctx->i_quant_factor) > 0.0) >>> + priv->q_idx_idr = >>> + av_clip((fabs(avctx->i_quant_factor) * priv- >>>> q_idx_p + >>> + avctx->i_quant_offset) + 0.5, >>> + 0, AV1_MAX_QUANT); >>> + else >>> + priv->q_idx_idr = priv->q_idx_p; >>> + >>> + if (fabs(avctx->b_quant_factor) > 0.0) >>> + priv->q_idx_b = >>> + av_clip((fabs(avctx->b_quant_factor) * priv- >>>> q_idx_p + >>> + avctx->b_quant_offset) + 0.5, >>> + 0, AV1_MAX_QUANT); >>> + else >>> + priv->q_idx_b = priv->q_idx_p; >>> + } else { >>> + /** Arbitrary value */ >>> + priv->q_idx_idr = priv->q_idx_p = priv->q_idx_b = 128; >>> + } >> >> Are there any alignment requirements here? (If I gave it an input >> which is, say, 1361x1, would that work?) > > I didn't get you. Which parameter here need to be align? Max/Min > resolution HW supported will be checked somewhere else. If use 1361x1 > there will be the error like: > > "Hardware does not support encoding at size 1376x16 (constraints: width > 32-8192 height 32-8192)." I mean of padding for the reference surfaces. I was expecting this to be rounded up to a multiple of 8 (to match MiCols/MiRows) - see this function in other codecs. >>> ... >>> + >>> + /** update obu size in bitstream */ >>> + if (fh_obu->header.obu_has_size_field) { >>> + obu_size_len = priv->attr_ext2.bits.obu_size_bytes_minus1 >>> + 1; >>> + for (i = 0; i < obu_size_len; i++) { >>> + byte = obu_size >> (7 * i) & 0x7f; >>> + if (i < obu_size_len - 1) >>> + byte |= 0x80; >>> + put_bits(&pbc_tmp, 8, byte); >>> + } >>> + flush_put_bits(&pbc_tmp); >>> + memmove(pbc_tmp.buf_ptr, pbc_tmp.buf_ptr + (8 - >>> obu_size_len), obu_size); >>> + *data_len -= (8 - obu_size_len) * 8; >>> + } >> >> Why is there an incomplete duplicate of the cbs_av1 header writing >> code here? > > To record some position/size in bitstream that needed for VAAPI. Like > qp_index/loopfilter/cdef offset and cdef parameters size in bit. It's > not reasonable to add the specific parameters into CBS. How about with ? >>> ... >>> + >>> +static int vaapi_encode_av1_set_tile(AVCodecContext *avctx) >>> +{ >>> + VAAPIEncodeAV1Context *priv = avctx->priv_data; >>> + int mi_cols, mi_rows, sb_shift, sb_size; >>> + int max_tile_area_sb, max_tile_area_sb_varied; >>> + int tile_width_sb, tile_height_sb, widest_tile_sb; >>> + int min_log2_tiles; >>> + int tile_rows_tmp, i; >>> + >>> + if (priv->tile_cols > AV1_MAX_TILE_COLS || >>> + priv->tile_rows > AV1_MAX_TILE_ROWS) { >>> + av_log(avctx, AV_LOG_ERROR, "Invalid tile number %dx%d, >>> should less than %dx%d.\n", >>> + priv->tile_cols, priv->tile_rows, >>> AV1_MAX_TILE_COLS, AV1_MAX_TILE_ROWS); >> >> Can't this be a constraint on the option rather than a special check >> here? > > Tiles set with IMAGE_SIZE option type which aligns with HEVC. It > doesn't support Min/MAX check. Hmm, yeah - fair enough. >>> ... >>> + >>> + sh->color_config = (AV1RawColorConfig) { >>> + .high_bitdepth = desc->comp[0].depth == 8 >>> ? 0 : 1, >>> + .color_primaries = avctx->color_primaries, >>> + .transfer_characteristics = avctx->color_trc, >>> + .matrix_coefficients = avctx->colorspace, >>> + .color_description_present_flag = (avctx->color_primaries >>> != AVCOL_PRI_UNSPECIFIED || >>> + avctx- >>>> color_trc != AVCOL_TRC_UNSPECIFIED || >>> + avctx- >>>> colorspace != AVCOL_SPC_UNSPECIFIED), >>> + .color_range = avctx->color_range == >>> AVCOL_RANGE_JPEG, >>> + .subsampling_x = desc->log2_chroma_w, >>> + .subsampling_y = desc->log2_chroma_h, >> >> .chroma_sample_position = some function of chroma_sample_location. > > There is no chroma_sample_position supported in VAAPI yet. VAAPI won't care. It should be marked correctly in the headers so that the output stream has the right value. (CFL not having any ability to deal with the sample location is a known deficiency of AV1.) >>> ... >>> + fh->tile_cols = priv->tile_cols; >>> + fh->tile_rows = priv->tile_rows; >>> + fh->tile_cols_log2 = priv->tile_cols_log2; >>> + fh->tile_rows_log2 = priv->tile_rows_log2; >>> + fh->uniform_tile_spacing_flag = priv->uniform_tile; >>> + fh->tile_size_bytes_minus1 = priv- >>>> attr_ext2.bits.tile_size_bytes_minus1; >>> + fh->reduced_tx_set = 1; >> >> This not being present in the capabilities feels like an omission in >> the API. That's a large loss for an implementation which does >> support the full transform set. > > Personally understanding, "not being present" means it must be > supported. If any exception driver, then they can ask to add new > capability to libva. reduced_tx_set is a negative property - you want to set it to zero to allow all transforms, unless the implementation can't cope with that. Does the Intel implementation require it to be set? (That is, does not support the full transform set.) >>> ... >>> + >>> + for (i = 0; i < fh->tile_cols; i++) >>> + fh->width_in_sbs_minus_1[i] = vpic- >>>> width_in_sbs_minus_1[i] = priv->width_in_sbs_minus_1[i]; >>> + >>> + for (i = 0; i < fh->tile_rows; i++) >>> + fh->height_in_sbs_minus_1[i] = vpic- >>>> height_in_sbs_minus_1[i] = priv->height_in_sbs_minus_1[i]; >> >> Er, what? Doesn't this fail the inference on the last tile >> width/height if you don't split exactly? > > What's the meaning of "split exactly"? All the tile w/h will be split > in vaapi_encode_av1_set_tile include last tile. Just make sure the info > of tiles w/h in frame header are same with the info passed to VAAPI > should be fine. Sorry, I misread this. The values are set elsewhere, it's not setting all tiles to have the same SB width/height. >>> ... >>> + >>> + vpic->base_qindex = fh->base_q_idx; >>> + vpic->frame_width_minus_1 = fh->frame_width_minus_1; >>> + vpic->frame_height_minus_1 = fh->frame_height_minus_1; >>> + vpic->primary_ref_frame = fh->primary_ref_frame; >>> + vpic->reconstructed_frame = pic->recon_surface; >>> + vpic->coded_buf = pic->output_buffer; >>> + vpic->tile_cols = fh->tile_cols; >>> + vpic->tile_rows = fh->tile_rows; >>> + vpic->order_hint = fh->order_hint; >>> +#if VA_CHECK_VERSION(1, 15, 0) >>> + vpic->refresh_frame_flags = fh->refresh_frame_flags; >>> +#endif >> >> What will the difference be between running in the two versions? (Is >> it worth supporting the older one? Cf. bit_depth on VP9.) > > I'd prefer to support older one. In case of someone using old version > and works well with ffmpeg-vpl(as vpl already support av1 encode long > time ago), otherwise he must be confuse why ffmpeg-vaapi need higher > VAAPI version but VPL doesn't. Ok. So the driver doesn't use the value of refresh_frame_flags if it doesn't affect the result? (Seems like it shouldn't, not sure why it would be added to the API.) >>> ... Thanks, - Mark _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".