From: <m.kaindl0208@gmail.com>
To: <ffmpeg-devel@ffmpeg.org>
Subject: [FFmpeg-devel] [PATCH v2 FFmpeg 0/20] Zero-Shot Classification Support for FFMPEG (CLIP and CLAP)
Date: Mon, 10 Mar 2025 20:48:47 +0100
Message-ID: <003301db91f5$70868240$519386c0$@gmail.com>

Hi,

I'm excited to propose a series of patches adding support for modern
zero-shot classification models to FFmpeg. These patches enable FFmpeg to
use CLIP (Contrastive Language-Image Pre-training) and CLAP (Contrastive
Language-Audio Pre-training) models for media classification.

Key Features:

- Zero-shot classification support: use text prompts to classify media
  without training specific models.
- Audio classification with CLAP: extend FFmpeg's DNN capabilities to
  audio content.
- Hierarchical classification: group classification categories with a new
  category file format.
- Stream classification averaging: new avgclass filter for averaging
  classification results.

Implementation Details:

- The implementation adds tokenizer support to the LibTorch backend using
  the tokenizers-cpp library.
- The existing dnn_classify filter has been transformed from a video-only
  filter into a multimedia filter that supports both video and audio
  inputs, selected by a configuration flag.
- For video, the implementation supports both standard/original
  classification (OpenVINO backend) and CLIP (Torch backend).
- For audio, it adds CLAP support via the Torch backend.

For further details, please refer to the documentation. For model
conversion/scripting or step-by-step installation, see my GitHub project:
https://github.com/MaximilianKaindl/DeepFFMPEGVideoClassification

CLAP models unfortunately need to be traced because of NumPy weak
references, and tracing seems to lock in the device used for tracing.

For audio preprocessing, I've implemented two functions, handle_long_audio
and handle_short_audio, which imitate the original CLAP Preprocessor.
These functions aren't used by default, since classify automatically
buffers frames to the desired length, but they might improve performance,
especially handle_short_audio, which repeats parts of the audio. That's
why I've kept them in place.

I could use help ensuring that my implementation doesn't interfere with
the original dnn_classification or dnn_processing functionality. Thanks!

Furthermore, should I upload tests for this functionality? The model files
are large (over 500 MB).

This time the patches should be fine; I was able to apply them on my
machine.

Signed-off-by: MaximilianKaindl <m.kaindl0208@gmail.com>
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
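
The zero-shot classification idea described in the message above can be
sketched in a few lines. The following Python is a minimal illustration and
not code from the patch series. It assumes the media and text embeddings
have already been produced by a CLIP- or CLAP-style model; the function
names, the toy three-dimensional vectors, and the scale of 100 are
hypothetical choices made for this example only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_scores(media_embedding, prompt_embeddings, scale=100.0):
    """Score one media embedding against each text-prompt embedding.

    `scale` is an assumed temperature-style multiplier applied to the
    similarities before the softmax; a real model supplies its own value.
    """
    logits = [scale * cosine_similarity(media_embedding, p)
              for p in prompt_embeddings]
    return softmax(logits)


# Toy embeddings, made up for illustration; real ones come from the model.
labels = ["a photo of a cat", "a photo of a dog"]
prompt_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
frame_embedding = [0.9, 0.1, 0.0]

scores = zero_shot_scores(frame_embedding, prompt_embeddings)
best = labels[scores.index(max(scores))]
print(best, scores)
```

No classifier is trained for the label set here: changing the categories
only means supplying different text prompts and embedding them.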