From: <m.kaindl0208@gmail.com>
To: <ffmpeg-devel@ffmpeg.org>
Subject: [FFmpeg-devel] [PATCH FFmpeg 9/15] doc: Filters.texi updated classify
Date: Sat, 8 Mar 2025 16:01:07 +0100
Message-ID: <007b01db903a$ebc0ddf0$c34299d0$@gmail.com>

Try the new filters using my Github Repo https://github.com/MaximilianKaindl/DeepFFMPEGVideoClassification.

Any Feedback is appreciated!

Signed-off-by: MaximilianKaindl <m.kaindl0208@gmail.com>
---
 doc/filters.texi | 106 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 76 insertions(+), 30 deletions(-)

diff --git a/doc/filters.texi b/doc/filters.texi
index 0ba7d3035f..b6cccbacb6 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -11971,43 +11971,89 @@ ffmpeg -i INPUT -f lavfi -i nullsrc=hd720,geq='r=128+80*(sin(sqrt((X-W/2)*(X-W/2
 @end itemize
 @section dnn_classify
+Analyze media (video frames or audio) using deep neural networks to apply classifications based on the content.
+This filter supports three classification modes:
-Do classification with deep neural networks based on bounding boxes.
+@itemize @bullet
+@item Standard image classification (OpenVINO backend)
+@item CLIP (Contrastive Language-Image Pre-training) classification (Torch backend)
+@item CLAP (Contrastive Language-Audio Pre-training) classification (Torch backend)
+@end itemize
 The filter accepts the following options:
-
 @table @option
 @item dnn_backend
-Specify which DNN backend to use for model loading and execution. This option accepts
-only openvino now, tensorflow backends will be added.
-
-@item model
-Set path to model file specifying network architecture and its parameters.
-Note that different backends use different file formats.
-
-@item input
-Set the input name of the dnn network.
-
-@item output
-Set the output name of the dnn network.
-
+Specify which DNN backend to use for model loading and execution. Currently supports:
+@table @samp
+@item openvino
+Use OpenVINO backend (standard image classification only).
+@item torch
+Use LibTorch backend (supports CLIP for images and CLAP for audio).
+@end table
 @item confidence
-Set the confidence threshold (default: 0.5).
-
+Set the confidence threshold (default: 0.5). Classifications with confidence below this value will be filtered out.
 @item labels
-Set path to label file specifying the mapping between label id and name.
-Each label name is written in one line, tailing spaces and empty lines are skipped.
-The first line is the name of label id 0,
-and the second line is the name of label id 1, etc.
-The label id is considered as name if the label file is not provided.
-
-@item backend_configs
-Set the configs to be passed into backend
-
-For tensorflow backend, you can set its configs with @option{sess_config} options,
-please use tools/python/tf_sess_config.py to get the configs for your system.
-
-@end table
+Set path to a label file specifying classification labels. This is required for standard classification and can be used for CLIP/CLAP classification.
+Each label is written on a separate line in the file. Trailing spaces and empty lines are skipped.
+@item categories
+Path to a categories file for hierarchical classification (CLIP/CLAP only). This allows classification to be organized into multiple category units with individual categories containing related labels.
+@item tokenizer
+Path to the text tokenizer.json file (CLIP/CLAP only). Required for text embedding generation.
+@item target
+Specify which objects to classify. When omitted, the entire frame is classified. When specified, only bounding boxes with detection labels matching this value are classified.
+@item is_audio
+Enable audio processing mode for CLAP models (default: 0). Set to 1 to process audio input instead of video frames.
+@item logit_scale
+Logit scale for similarity calculation in CLIP/CLAP (default: 4.6052 for CLIP, 33.37 for CLAP). Values below 0 use the default.
+@item temperature
+Softmax temperature for CLIP/CLAP models (default: 1.0). Lower values make the output more peaked, higher values make it smoother.
+@item forward_order
+Order of forward output for CLIP/CLAP: 0 for media-text order, 1 for text-media order (default depends on model type).
+@item normalize
+Whether to normalize the input tensor for CLIP/CLAP (default depends on model type). Some scripted models already do this in the forward, so this is not necessary in some cases.
+@item input_res
+Expected input resolution for video processing models (default: automatically detected).
+@item sample_rate
+Expected sample rate for audio processing models (default: 44100).
+@item sample_duration
+Expected sample duration in seconds for audio processing models (default: 7).
+@item token_dimension
+Dimension of token vector for text embeddings (default: 77).
+@item optimize
+Enable graph executor optimization (0: disabled, 1: enabled).
+@end table
+@subsection Category Files Format
+For CLIP/CLAP models, a hierarchical categories file can be provided with the following format:
+@example
+[RecordingSystem]
+(Professional)
+a photo with high level of detail
+a professionally recorded sound
+(HomeRecording)
+a photo with low level of detail
+an amateur recording
+[ContentType]
+(Nature)
+trees
+mountains
+birds singing
+(Urban)
+buildings
+street noise
+traffic sounds
+@end example
+Each unit enclosed in square brackets [] creates a classification group. Within each group, categories are defined with parentheses () and the labels under each category are used to classify the input.
+@subsection Examples
+@example
+Classify video using OpenVINO
+ffmpeg -i input.mp4 -vf "dnn_classify=dnn_backend=openvino:model=model.xml:labels=labels.txt" output.mp4
+Classify video using CLIP
+ffmpeg -i input.mp4 -vf "dnn_classify=dnn_backend=torch:model=clip_model.pt:categories=categories.txt:tokenizer=tokenizer.json" output.mp4
+Classify only person objects in a video
+ffmpeg -i input.mp4 -vf "dnn_detect=model=detection.xml:input=data:output=detection_out:confidence=0.5,dnn_classify=model=clip_model.pt:dnn_backend=torch:tokenizer=tokenizer.json:labels=labels.txt:target=person" output.mp4
+Classify audio using CLAP
+ffmpeg -i input.mp3 -af "dnn_classify=dnn_backend=torch:model=clap_model.pt:categories=audio_categories.txt:tokenizer=tokenizer.json:is_audio=1:sample_rate=44100:sample_duration=7" output.mp3
+@end example
 @section dnn_detect
-- 
2.34.1
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
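
For anyone preparing a categories file in the format described in the patch, the following is a minimal Python sketch of how such a file could be read into a nested mapping of unit, category, and labels. It is an illustration only and not the filter's own parser, which is not part of this patch. The function name parse_categories, the error handling, and the skipping of empty lines are assumptions made for this example.

```python
def parse_categories(text):
    """Parse '[Unit]' / '(Category)' / label lines into nested dicts.

    Hypothetical helper: returns {unit: {category: [labels]}}.
    Blank lines are skipped (an assumption, mirroring the labels file rule).
    """
    units = {}
    unit = None
    category = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A new classification group (unit) starts here.
            unit = line[1:-1].strip()
            units.setdefault(unit, {})
            category = None
        elif line.startswith("(") and line.endswith(")"):
            # A category must belong to a unit.
            if unit is None:
                raise ValueError("category before any [unit]: " + line)
            category = line[1:-1].strip()
            units[unit].setdefault(category, [])
        else:
            # Any other line is a label for the current category.
            if category is None:
                raise ValueError("label before any (category): " + line)
            units[unit][category].append(line)
    return units


# Sample taken from the example in the patch.
SAMPLE = """\
[RecordingSystem]
(Professional)
a photo with high level of detail
a professionally recorded sound
(HomeRecording)
a photo with low level of detail
an amateur recording
[ContentType]
(Nature)
trees
mountains
birds singing
(Urban)
buildings
street noise
traffic sounds
"""

if __name__ == "__main__":
    parsed = parse_categories(SAMPLE)
    for unit_name, categories in parsed.items():
        for category_name, labels in categories.items():
            print(unit_name, category_name, labels)
```

Running a file through a check like this before passing it to the categories option can catch a label that appears before any category or unit header.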