Date: Mon, 10 Mar 2025 20:48:47 +0100
Subject: [FFmpeg-devel] [PATCH v2 FFmpeg 0/20] Zero-Shot Classification Support for FFMPEG (CLIP and CLAP)

Hi,

I'm excited to propose a series of patches that add support for modern zero-shot classification models to FFmpeg. These patches let FFmpeg use CLIP (Contrastive Language-Image Pre-training) and CLAP (Contrastive Language-Audio Pre-training) models for media classification.

Key Features:
- Zero-shot classification support: classify media with text prompts, without training task-specific models
- Audio classification with CLAP: extends FFmpeg's DNN capabilities to audio content
- Hierarchical classification: groups classification categories using a new category file format
- Stream classification averaging: a new avgclass filter averages classification results over a stream

Implementation Details:
- The implementation adds tokenizer support to the LibTorch backend using the tokenizers-cpp library.
- The existing dnn_classify filter has been turned from a video-only filter into a multimedia filter that supports both video and audio inputs, selected by a configuration flag.
- For video, it supports both standard (original) classification via the OpenVINO backend and CLIP via the Torch backend.
- For audio, it adds CLAP support via the Torch backend.

For further details, please refer to the documentation. For model conversion/scripting or step-by-step installation instructions, see my GitHub project: https://github.com/MaximilianKaindl/DeepFFMPEGVideoClassification

CLAP models unfortunately need to be traced because of NumPy weak references, and tracing seems to lock in the device that was used during tracing.

For audio preprocessing, I've implemented two functions, handle_long_audio and handle_short_audio, which imitate the original CLAP preprocessor. They are not used by default, since dnn_classify automatically buffers frames to the required length, but they might improve performance, especially handle_short_audio, which repeats parts of the audio. That's why I've kept them in place.
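Hierarchical classification groups fine-grained labels into categories. The category file format itself is defined in the patches and is not reproduced here; the sketch below only shows one plausible way to turn per-label probabilities into per-category scores. It assumes a label-to-category index mapping and sums the probabilities of each category's labels; both the mapping array and the function name aggregate_categories are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical aggregation for hierarchical classification: label_to_cat[i]
 * is the category index of label i. Each category score is the sum of its
 * labels' probabilities (an assumption; the patches may combine them
 * differently). Returns the index of the highest-scoring category.
 */
static size_t aggregate_categories(const double *label_probs, const size_t *label_to_cat,
                                   size_t n_labels, double *cat_probs, size_t n_cats)
{
    assert(n_cats > 0);
    size_t best = 0;

    for (size_t c = 0; c < n_cats; c++)
        cat_probs[c] = 0.0;
    for (size_t i = 0; i < n_labels; i++) {
        assert(label_to_cat[i] < n_cats);
        cat_probs[label_to_cat[i]] += label_probs[i];
    }
    for (size_t c = 1; c < n_cats; c++)
        if (cat_probs[c] > cat_probs[best])
            best = c;
    return best;
}
```

For instance, labels "dog" and "cat" could map to an "animal" category and "car" and "bus" to "vehicle", so that the category decision stays stable even when probability mass is split between similar labels.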
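Stream classification averaging can be pictured as keeping a per-label running sum over all classified frames and dividing by the frame count at the end. The following is a minimal sketch under that assumption; the ClassAverager struct, its functions and the MAX_LABELS capacity are hypothetical and do not reflect the avgclass filter's actual context or options.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LABELS 64 /* hypothetical fixed capacity for this sketch */

/* Hypothetical accumulator, not the avgclass filter's real context. */
typedef struct ClassAverager {
    double sum[MAX_LABELS]; /* per-label sum of probabilities */
    size_t n_labels;
    size_t n_frames;        /* number of frames added so far */
} ClassAverager;

static void averager_init(ClassAverager *avg, size_t n_labels)
{
    assert(n_labels <= MAX_LABELS);
    avg->n_labels = n_labels;
    avg->n_frames = 0;
    for (size_t i = 0; i < n_labels; i++)
        avg->sum[i] = 0.0;
}

/* Add one frame's per-label probabilities to the running sums. */
static void averager_add(ClassAverager *avg, const double *probs)
{
    for (size_t i = 0; i < avg->n_labels; i++)
        avg->sum[i] += probs[i];
    avg->n_frames++;
}

/* Write the mean probability of each label; returns the top label index. */
static size_t averager_result(const ClassAverager *avg, double *mean)
{
    size_t best = 0;
    for (size_t i = 0; i < avg->n_labels; i++) {
        mean[i] = avg->n_frames ? avg->sum[i] / avg->n_frames : 0.0;
        if (mean[i] > mean[best])
            best = i;
    }
    return best;
}
```

Averaging smooths out frames where a single shot or a short sound is misclassified, at the cost of losing per-frame detail.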
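The idea behind handling short audio can be sketched as filling a fixed-length model input by repeating a shorter clip, similar in spirit to the "repeat and pad" filling used in the reference CLAP preprocessing. This is not the patches' handle_short_audio; the helper repeat_pad_audio is hypothetical, assumes mono float samples, repeats the whole clip as many times as it fits, and zero-pads the remainder.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fill dst (target_len samples) from a shorter mono
 * clip src (src_len samples) by copying the whole clip as many times as it
 * fits, then zero-padding the rest. Returns the number of whole repeats.
 */
static size_t repeat_pad_audio(const float *src, size_t src_len,
                               float *dst, size_t target_len)
{
    assert(src_len > 0 && src_len <= target_len);
    size_t repeats = target_len / src_len;

    for (size_t r = 0; r < repeats; r++)
        memcpy(dst + r * src_len, src, src_len * sizeof(*src));

    /* Zero the tail that is too short for another full copy. */
    size_t filled = repeats * src_len;
    memset(dst + filled, 0, (target_len - filled) * sizeof(*dst));
    return repeats;
}
```

Compared with plain zero-padding, repeating the clip gives the model more real signal to work with, which is one reason a short-audio path might help even though frame buffering already produces inputs of the right length.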
I could use help making sure my implementation doesn't interfere with the existing dnn_classify or dnn_processing functionality. Thanks!

Also, should I upload tests for this functionality? The models are large (over 500 MB).

This time the patches should apply cleanly; I was able to apply them on my machine.

Signed-off-by: MaximilianKaindl