From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 8 Mar 2025 15:59:12 +0100
Message-ID: <007601db903a$a78425c0$f68c7140$@gmail.com>
MIME-Version: 1.0
X-Mailer: Microsoft Outlook 16.0
Thread-Index: AduQOIrrjtMyMm7RQEefMIMnDoyUaA==
Content-Language: en-at
Subject: [FFmpeg-devel] [PATCH FFmpeg 4/15] libavfilter: dnn interface definitions for CLIP/CLAP Inference
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"

Defines new DNNFunctionType enums for CLIP and CLAP inference and adds new
data structures like DNNExecZeroShotClassificationParams to support zero-shot
classification models.

Try the new filters using my GitHub repo
https://github.com/MaximilianKaindl/DeepFFMPEGVideoClassification.

Any feedback is appreciated!

Signed-off-by: MaximilianKaindl
---
 libavfilter/dnn_interface.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/libavfilter/dnn_interface.h b/libavfilter/dnn_interface.h
index 66086409be..2125348c6b 100644
--- a/libavfilter/dnn_interface.h
+++ b/libavfilter/dnn_interface.h
@@ -58,6 +58,8 @@ typedef enum {
     DFT_PROCESS_FRAME,      // process the whole frame
     DFT_ANALYTICS_DETECT,   // detect from the whole frame
     DFT_ANALYTICS_CLASSIFY, // classify for each bounding box
+    DFT_ANALYTICS_CLIP,     // classify whole frame with zero-shot classification
+    DFT_ANALYTICS_CLAP      // classify whole audio frame with zero-shot classification
 }DNNFunctionType;
 
 typedef enum {
@@ -90,6 +92,16 @@ typedef struct DNNExecClassificationParams {
     const char *target;
 } DNNExecClassificationParams;
 
+typedef struct DNNExecZeroShotClassificationParams {
+    DNNExecBaseParams base;
+    const char **labels;
+    const int label_count;
+    const char *target;
+    const char *tokenizer_path;
+    const int *softmax_units;
+    const int softmax_units_count;
+} DNNExecZeroShotClassificationParams;
+
 typedef int (*FramePrePostProc)(AVFrame *frame, DNNData *model, AVFilterContext *filter_ctx);
 typedef int (*DetectPostProc)(AVFrame *frame, DNNData *output, uint32_t nb, AVFilterContext *filter_ctx);
 typedef int (*ClassifyPostProc)(AVFrame *frame, DNNData *output, uint32_t bbox_index, AVFilterContext *filter_ctx);
@@ -136,6 +148,16 @@ typedef struct OVOptions {
 typedef struct THOptions {
     const AVClass *clazz;
     int optimize;
+
+    // Contrastive Language-X Pre-training options
+    float logit_scale;
+    float temperature;
+    int forward_order; // Order of forward output (0: media text, 1: text media)
+    int normalize; // Normalize the input tensor
+    int64_t token_dimension;
+    int64_t input_resolution;
+    int64_t sample_rate;
+    int64_t sample_duration;
 } THOptions;
 
 typedef struct DNNModule DNNModule;
@@ -177,6 +199,8 @@ struct DNNModule {
     DNNBackendType type;
     // Loads model and parameters from given file. Returns NULL if it is not possible.
     DNNModel *(*load_model)(DnnContext *ctx, DNNFunctionType func_type, AVFilterContext *filter_ctx);
+    // Loads model, tokenizer and parameters from given file. Returns NULL if it is not possible.
+    DNNModel *(*load_model_with_tokenizer)(DnnContext *ctx, DNNFunctionType func_type, const char** labels, int label_count, int* softmax_units, int softmax_units_count, const char* tokenizer_path, AVFilterContext *filter_ctx);
     // Executes model with specified input and output. Returns the error code otherwise.
     int (*execute_model)(const DNNModel *model, DNNExecBaseParams *exec_params);
     // Retrieve inference result.
-- 
2.34.1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
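
For readers who want to see how the new DNNExecZeroShotClassificationParams fields might fit together, here is a minimal standalone sketch. It is not part of the patch and does not build against FFmpeg: DNNExecBaseParams is a reduced stand-in, best_label_per_unit() is a hypothetical helper, and the interpretation of softmax_units (each entry is the size of one run of consecutive labels that compete with each other) is an assumption, since this patch only declares the fields and does not show how a backend consumes them.

```c
#include <assert.h>

/* Reduced stand-in for the real DNNExecBaseParams in dnn_interface.h
 * (assumption: only here so the sketch compiles on its own). */
typedef struct DNNExecBaseParams {
    const char *input_name;
} DNNExecBaseParams;

/* Field layout copied from the patch. */
typedef struct DNNExecZeroShotClassificationParams {
    DNNExecBaseParams base;
    const char **labels;
    const int label_count;
    const char *target;
    const char *tokenizer_path;
    const int *softmax_units;
    const int softmax_units_count;
} DNNExecZeroShotClassificationParams;

/* Hypothetical helper, not part of the patch: for each softmax unit (assumed
 * to be a run of consecutive labels), store the index of the highest-scoring
 * label in best[unit]. Softmax is monotonic, so the winner is the same before
 * and after normalization. Returns 0 on success, or -1 if the unit sizes are
 * not positive or do not add up to label_count. */
static int best_label_per_unit(const DNNExecZeroShotClassificationParams *p,
                               const float *scores, int *best)
{
    int offset = 0;

    assert(p && scores && best);
    for (int u = 0; u < p->softmax_units_count; u++) {
        int size = p->softmax_units[u];

        if (size <= 0 || size > p->label_count - offset)
            return -1;
        best[u] = offset;
        for (int i = offset + 1; i < offset + size; i++)
            if (scores[i] > scores[best[u]])
                best[u] = i;
        offset += size;
    }
    return offset == p->label_count ? 0 : -1;
}
```

Because label_count and softmax_units_count are declared const in the struct, a caller would have to set them at initialization time, for example with designated initializers, rather than assign them afterwards.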