From mboxrd@z Thu Jan 1 00:00:00 1970
From: wenbin.chen-at-intel.com@ffmpeg.org
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 4 Dec 2023 13:36:32 +0800
Message-Id: <20231204053633.1743228-3-wenbin.chen@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20231204053633.1743228-1-wenbin.chen@intel.com>
References: <20231204053633.1743228-1-wenbin.chen@intel.com>
MIME-Version: 1.0
Subject: [FFmpeg-devel] [PATCH 3/4] libavfilter/vf_dnn_detect: Add yolov3 support
List-Id: FFmpeg development discussions and patches
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

From: Wenbin Chen

Add yolov3 support. Unlike the yolo models already supported, yolov3
produces multiple outputs at different scales, which improves detection
of both large and small objects. For model details, refer to:
https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v3-tf

Signed-off-by: Wenbin Chen
---
 libavfilter/vf_dnn_detect.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/libavfilter/vf_dnn_detect.c b/libavfilter/vf_dnn_detect.c
index 86f61c9907..7a32b191c3 100644
--- a/libavfilter/vf_dnn_detect.c
+++ b/libavfilter/vf_dnn_detect.c
@@ -35,6 +35,7 @@
 typedef enum {
     DDMT_SSD,
     DDMT_YOLOV1V2,
+    DDMT_YOLOV3
 } DNNDetectionModelType;
 
 typedef struct DnnDetectContext {
@@ -73,6 +74,7 @@ static const AVOption dnn_detect_options[] = {
     { "model_type", "DNN detection model type", OFFSET2(model_type), AV_OPT_TYPE_INT, { .i64 = DDMT_SSD }, INT_MIN, INT_MAX, FLAGS, "model_type" },
         { "ssd", "output shape [1, 1, N, 7]", 0, AV_OPT_TYPE_CONST, { .i64 = DDMT_SSD }, 0, 0, FLAGS, "model_type" },
         { "yolo", "output shape [1, N*Cx*Cy*DetectionBox]", 0, AV_OPT_TYPE_CONST, { .i64 = DDMT_YOLOV1V2 }, 0, 0, FLAGS, "model_type" },
+        { "yolov3", "outputs shape [1, N*D, Cx, Cy]", 0, AV_OPT_TYPE_CONST, { .i64 = DDMT_YOLOV3 }, 0, 0, FLAGS, "model_type" },
     { "cell_w", "cell width", OFFSET2(cell_w), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INTMAX_MAX, FLAGS },
     { "cell_h", "cell height", OFFSET2(cell_h), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INTMAX_MAX, FLAGS },
     { "nb_classes", "The number of class", OFFSET2(nb_classes), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INTMAX_MAX, FLAGS },
@@ -146,6 +148,11 @@ static int dnn_detect_parse_yolo_output(AVFrame *frame, DNNData *output, int out
         cell_h = ctx->cell_h;
         scale_w = cell_w;
         scale_h = cell_h;
+    } else {
+        cell_w = output[output_index].width;
+        cell_h = output[output_index].height;
+        scale_w = ctx->scale_width;
+        scale_h = ctx->scale_height;
     }
 
     box_size = nb_classes + 5;
@@ -173,6 +180,7 @@ static int dnn_detect_parse_yolo_output(AVFrame *frame, DNNData *output, int out
                       output[output_index].height *
                       output[output_index].width / box_size / cell_w / cell_h;
 
+    anchors = anchors + (detection_boxes * output_index * 2);
     /**
      * find all candidate bbox
      * yolo output can be reshaped to [B, N*D, Cx, Cy]
@@ -284,6 +292,21 @@ static int dnn_detect_post_proc_yolo(AVFrame *frame, DNNData *output, AVFilterCo
     return 0;
 }
 
+static int dnn_detect_post_proc_yolov3(AVFrame *frame, DNNData *output,
+                                       AVFilterContext *filter_ctx, int nb_outputs)
+{
+    int ret = 0;
+    for (int i = 0; i < nb_outputs; i++) {
+        ret = dnn_detect_parse_yolo_output(frame, output, i, filter_ctx);
+        if (ret < 0)
+            return ret;
+    }
+    ret = dnn_detect_fill_side_data(frame, filter_ctx);
+    if (ret < 0)
+        return ret;
+    return 0;
+}
+
 static int dnn_detect_post_proc_ssd(AVFrame *frame, DNNData *output, AVFilterContext *filter_ctx)
 {
     DnnDetectContext *ctx = filter_ctx->priv;
@@ -380,8 +403,12 @@ static int dnn_detect_post_proc_ov(AVFrame *frame, DNNData *output, int nb_outpu
         ret = dnn_detect_post_proc_yolo(frame, output, filter_ctx);
         if (ret < 0)
             return ret;
+        break;
+    case DDMT_YOLOV3:
+        ret = dnn_detect_post_proc_yolov3(frame, output, filter_ctx, nb_outputs);
+        if (ret < 0)
+            return ret;
     }
-
     return 0;
 }
 
-- 
2.34.1
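
For readers following the per-output anchor handling in the patch, the sketch below isolates the arithmetic behind "anchors + (detection_boxes * output_index * 2)". It is only an illustration under stated assumptions: the helper names boxes_per_cell and anchors_for_output are hypothetical and do not exist in vf_dnn_detect.c, and the anchor list is assumed to be flat [w0, h0, w1, h1, ...] with each output's anchors stored contiguously in output order.

```c
/* Hypothetical helper, not part of vf_dnn_detect.c.
 * For an output of shape [1, N*D, Cx, Cy] with D = nb_classes + 5,
 * the number of boxes predicted per grid cell is N = channels / D. */
static int boxes_per_cell(int channels, int nb_classes)
{
    int box_size = nb_classes + 5; /* x, y, w, h, objectness + class scores */
    return channels / box_size;
}

/* Hypothetical helper, not part of vf_dnn_detect.c.
 * Returns the first anchor value used by the given output. Each anchor is a
 * (width, height) pair, hence the factor of 2. This mirrors the offset the
 * patch applies before parsing each output. */
static const float *anchors_for_output(const float *anchors,
                                       int detection_boxes, int output_index)
{
    return anchors + detection_boxes * output_index * 2;
}
```

As an example of the arithmetic, an output with 255 channels and nb_classes = 80 gives boxes_per_cell(255, 80) == 3, so output 0 starts at anchors[0], output 1 at anchors[6] and output 2 at anchors[12]. Whether that order matches the order in which a given model emits its outputs depends on the model and is not checked here.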