From: wenbin.chen-at-intel.com@ffmpeg.org
To: ffmpeg-devel@ffmpeg.org
Date: Wed, 27 Dec 2023 12:16:58 +0800
Message-Id: <20231227041658.392174-2-wenbin.chen@intel.com>
In-Reply-To: <20231227041658.392174-1-wenbin.chen@intel.com>
References: <20231227041658.392174-1-wenbin.chen@intel.com>
Subject: [FFmpeg-devel] [PATCH 2/2] libavfilter/vf_dnn_detect: Add two outputs ssd support

From: Wenbin Chen

For this kind of model, the output can be used directly as the final
result, just as with a single-output ssd model. The difference is that
the output is split into two tensors:
[x_min, y_min, x_max, y_max, confidence] and [label_id].

For an example model, see:
https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/intel/person-detection-0106

Signed-off-by: Wenbin Chen
---
 libavfilter/vf_dnn_detect.c | 64 +++++++++++++++++++++++++++++--------
 1 file changed, 50 insertions(+), 14 deletions(-)

diff --git a/libavfilter/vf_dnn_detect.c b/libavfilter/vf_dnn_detect.c
index 88865c8a8e..249cbba0f7 100644
--- a/libavfilter/vf_dnn_detect.c
+++ b/libavfilter/vf_dnn_detect.c
@@ -359,24 +359,48 @@ static int dnn_detect_post_proc_yolov3(AVFrame *frame, DNNData *output,
     return 0;
 }
 
-static int dnn_detect_post_proc_ssd(AVFrame *frame, DNNData *output, AVFilterContext *filter_ctx)
+static int dnn_detect_post_proc_ssd(AVFrame *frame, DNNData *output, int nb_outputs,
+                                    AVFilterContext *filter_ctx)
 {
     DnnDetectContext *ctx = filter_ctx->priv;
     float conf_threshold = ctx->confidence;
-    int proposal_count = output->height;
-    int detect_size = output->width;
-    float *detections = output->data;
+    int proposal_count = 0;
+    int detect_size = 0;
+    float *detections = NULL, *labels = NULL;
     int nb_bboxes = 0;
     AVDetectionBBoxHeader *header;
     AVDetectionBBox *bbox;
-
-    if (output->width != 7) {
+    int scale_w = ctx->scale_width;
+    int scale_h = ctx->scale_height;
+
+    if (nb_outputs == 1 && output->width == 7) {
+        proposal_count = output->height;
+        detect_size = output->width;
+        detections = output->data;
+    } else if (nb_outputs == 2 && output[0].width == 5) {
+        proposal_count = output[0].height;
+        detect_size = output[0].width;
+        detections = output[0].data;
+        labels = output[1].data;
+    } else if (nb_outputs == 2 && output[1].width == 5) {
+        proposal_count = output[1].height;
+        detect_size = output[1].width;
+        detections = output[1].data;
+        labels = output[0].data;
+    } else {
         av_log(filter_ctx, AV_LOG_ERROR, "Model output shape doesn't match ssd requirement.\n");
         return AVERROR(EINVAL);
     }
 
+    if (proposal_count == 0)
+        return 0;
+
     for (int i = 0; i < proposal_count; ++i) {
-        float conf = detections[i * detect_size + 2];
+        float conf;
+        if (nb_outputs == 1)
+            conf = detections[i * detect_size + 2];
+        else
+            conf = detections[i * detect_size + 4];
         if (conf < conf_threshold) {
             continue;
         }
@@ -398,12 +422,24 @@ static int dnn_detect_post_proc_ssd(AVFrame *frame, DNNData *output, AVFilterCon
 
     for (int i = 0; i < proposal_count; ++i) {
         int av_unused image_id = (int)detections[i * detect_size + 0];
-        int label_id = (int)detections[i * detect_size + 1];
-        float conf   =      detections[i * detect_size + 2];
-        float x0     =      detections[i * detect_size + 3];
-        float y0     =      detections[i * detect_size + 4];
-        float x1     =      detections[i * detect_size + 5];
-        float y1     =      detections[i * detect_size + 6];
+        int label_id;
+        float conf, x0, y0, x1, y1;
+
+        if (nb_outputs == 1) {
+            label_id = (int)detections[i * detect_size + 1];
+            conf     = detections[i * detect_size + 2];
+            x0       = detections[i * detect_size + 3];
+            y0       = detections[i * detect_size + 4];
+            x1       = detections[i * detect_size + 5];
+            y1       = detections[i * detect_size + 6];
+        } else {
+            label_id = (int)labels[i];
+            x0       = detections[i * detect_size] / scale_w;
+            y0       = detections[i * detect_size + 1] / scale_h;
+            x1       = detections[i * detect_size + 2] / scale_w;
+            y1       = detections[i * detect_size + 3] / scale_h;
+            conf     = detections[i * detect_size + 4];
+        }
 
         if (conf < conf_threshold) {
             continue;
@@ -447,7 +483,7 @@ static int dnn_detect_post_proc_ov(AVFrame *frame, DNNData *output, int nb_outpu
 
     switch (ctx->model_type) {
     case DDMT_SSD:
-        ret = dnn_detect_post_proc_ssd(frame, output, filter_ctx);
+        ret = dnn_detect_post_proc_ssd(frame, output, nb_outputs, filter_ctx);
         if (ret < 0)
             return ret;
         break;
-- 
2.34.1
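
The two-tensor layout described in the commit message can be sketched outside FFmpeg as follows. This is a minimal illustration, not the filter's code: the Box type, the decode_two_output_ssd name, and the max_out limit are hypothetical, and the label tensor is assumed to hold one float per detection, mirroring the patch's (int)labels[i] cast. As in the patch, coordinates are divided by the scaled input width and height, and rows below the confidence threshold are skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical box type for illustration only; not an FFmpeg structure. */
typedef struct Box {
    float x0, y0, x1, y1; /* corners after dividing by the scaled input size */
    float conf;
    int   label;
} Box;

/*
 * Decode a two-tensor SSD result.
 *   dets:   `count` rows of [x_min, y_min, x_max, y_max, confidence],
 *           with coordinates in pixels of the scaled network input
 *   labels: `count` label ids, one per row (assumed stored as float)
 * Returns the number of boxes written to `out` (at most `max_out`).
 */
static size_t decode_two_output_ssd(const float *dets, const float *labels,
                                    size_t count, int scale_w, int scale_h,
                                    float threshold, Box *out, size_t max_out)
{
    size_t n = 0;

    assert(dets && labels && out);
    assert(scale_w > 0 && scale_h > 0);

    for (size_t i = 0; i < count && n < max_out; i++) {
        const float *row = dets + i * 5; /* 5 floats per detection row */

        if (row[4] < threshold)
            continue; /* drop low-confidence proposals */

        out[n].x0    = row[0] / scale_w;
        out[n].y0    = row[1] / scale_h;
        out[n].x1    = row[2] / scale_w;
        out[n].y1    = row[3] / scale_h;
        out[n].conf  = row[4];
        out[n].label = (int)labels[i];
        n++;
    }
    return n;
}
```

Unlike the 7-element single-tensor rows, which carry image_id and label_id inline, each 5-element row here holds only geometry and confidence, so the label has to be looked up by row index in the second tensor.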