From: Michael Niedermayer <michael@niedermayer.cc> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> Cc: Vittorio Palmisano <vpalmisano@gmail.com> Subject: Re: [FFmpeg-devel] [PATCH] Whisper audio filter Date: Thu, 10 Jul 2025 01:37:46 +0200 Message-ID: <20250709233746.GM29660@pb2> (raw) In-Reply-To: <20250709072350.578693-1-vpalmisano@gmail.com> [-- Attachment #1.1: Type: text/plain, Size: 5592 bytes --] Hi Vittorio On Wed, Jul 09, 2025 at 09:23:48AM +0200, Vittorio Palmisano wrote: > It adds a new audio filter for running audio transcriptions with the whisper model. Iam happy to see someone contribute a whisper filter! [...] > +@example > +ffmpeg -i input.mp4 -vn -af "aformat=sample_rates=16000:channel_layouts=mono,whisper= Is there a reason why we convert to 16khz mono here ? > +model=../whisper.cpp/models/ggml-base.en.bin\ It would be nice if the models would be in a standard location, so teh user just has to specify the model name and not the path Maybe teh filter could check some "standard" locations I dont know what path is standard, but maybe something like: /usr/local/share/whisper.cpp/models ~/.whisper.cpp/models > +:language=en\ > +:queue=3000\ > +:destination=output.srt\ > +:format=srt" -f null - format can be deducted from the destination file extension. I tried this: ./ffmpeg -i matrixbench_mpeg2.mpg -vn -af "aformat=sample_rates=16000:channel_layouts=mono,whisper=model=/home/michael/whisper.cpp/models/ggml-base.en.bin:language=en:queue=3000:destination=output.srt:format=srt" -f null - but the output.srt is empty (0 bytes) [...] > +static void cb_log_disable(enum ggml_log_level, const char *, void *) {} libavfilter/af_whisper.c: In function ‘cb_log_disable’: libavfilter/af_whisper.c:75:28: error: parameter name omitted 75 | static void cb_log_disable(enum ggml_log_level, const char *, void *) {} libavfilter/af_whisper.c:75:49: error: parameter name omitted 75 | static void cb_log_disable(enum ggml_log_level, const char *, void *) {} | ^~~~~~~~~~~~ libavfilter/af_whisper.c:75:63: error: parameter name omitted 75 | static void cb_log_disable(enum ggml_log_level, const char *, void *) {} > + > +static int init(AVFilterContext *ctx) > +{ > + WhisperContext *wctx = ctx->priv; > + > + ggml_backend_load_all(); > + whisper_log_set(cb_log_disable, NULL); > + > + // Init whisper context > + if (!wctx->model_path) > + { > + av_log(ctx, AV_LOG_ERROR, "No whisper model path specified. Use the 'model' option.\n"); > + return AVERROR(EINVAL); > + } > + > + struct whisper_context_params params = whisper_context_default_params(); > + params.use_gpu = wctx->use_gpu; > + params.gpu_device = wctx->gpu_device; > + > + wctx->ctx_wsp = whisper_init_from_file_with_params(wctx->model_path, params); > + if (wctx->ctx_wsp == NULL) > + { > + av_log(ctx, AV_LOG_ERROR, "Failed to initialize whisper context from model: %s\n", wctx->model_path); > + return AVERROR(EIO); > + } > + > + wctx->whisper_state = whisper_init_state(wctx->ctx_wsp); > + if (wctx->whisper_state == NULL) > + { > + av_log(ctx, AV_LOG_ERROR, "Failed to get whisper state from context\n"); > + whisper_free(wctx->ctx_wsp); > + wctx->ctx_wsp = NULL; > + return AVERROR(EIO); > + } > + > + // Init VAD model context > + if (wctx->vad_model_path) > + { > + struct whisper_vad_context_params ctx_params = whisper_vad_default_context_params(); > + ctx_params.n_threads = 4; > + // ctx_params.use_gpu = wctx->use_gpu; TODO (see: whisper_vad_init_context) > + ctx_params.gpu_device = wctx->gpu_device; > + wctx->ctx_vad = whisper_vad_init_from_file_with_params( > + wctx->vad_model_path, > + ctx_params); > + > + wctx->vad_params = whisper_vad_default_params(); > + wctx->vad_params.threshold = wctx->vad_threshold; > + wctx->vad_params.min_speech_duration_ms = wctx->vad_min_speech_duration; > + wctx->vad_params.min_silence_duration_ms = wctx->vad_min_silence_duration; > + wctx->vad_params.max_speech_duration_s = (float)(wctx->audio_buffer_queue_size / 1000.0f); teh float cast is unneeded > + wctx->vad_params.speech_pad_ms = 0; > + wctx->vad_params.samples_overlap = 0; > + } > + > + // Init buffer > + wctx->audio_buffer_queue_size = WHISPER_SAMPLE_RATE * wctx->queue / 1000; > + wctx->audio_buffer = av_malloc(wctx->audio_buffer_queue_size * sizeof(float)); > + if (!wctx->audio_buffer) > + { > + return AVERROR(ENOMEM); > + } > + > + wctx->audio_buffer_fill_size = 0; > + > + wctx->next_pts = AV_NOPTS_VALUE; > + > + wctx->avio_context = NULL; arent things already initialized to 0 ? > + if (wctx->destination && strcmp("", wctx->destination)) > + { > + int ret = 0; useless initialization > + > + if (!strcmp("-", wctx->destination)) > + { > + ret = avio_open(&wctx->avio_context, "pipe:1", AVIO_FLAG_WRITE); > + } > + else > + { > + ret = avio_open(&wctx->avio_context, wctx->destination, AVIO_FLAG_WRITE); > + } const char *dst = wctx->destination; if (!strcmp("-", wctx->destination)) dst = "pipe:1"; int ret = avio_open(&wctx->avio_context, dst, AVIO_FLAG_WRITE); [...] > + if (segments_text) > + { > + av_free(segments_text); > + } the NULL check isnt needed and please use av_freep(&) instead of av_free() as it clears the pointer and thats just more robust thx [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB The educated differ from the uneducated as much as the living from the dead. -- Aristotle [-- Attachment #1.2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] [-- Attachment #2: Type: text/plain, Size: 251 bytes --] _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
next prev parent reply other threads:[~2025-07-09 23:37 UTC|newest] Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top 2025-07-09 7:23 Vittorio Palmisano 2025-07-09 13:36 ` Marvin Scholz 2025-07-09 15:24 ` Zhao Zhili 2025-07-09 23:37 ` Michael Niedermayer [this message] 2025-07-09 23:41 ` Michael Niedermayer
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20250709233746.GM29660@pb2 \ --to=michael@niedermayer.cc \ --cc=ffmpeg-devel@ffmpeg.org \ --cc=vpalmisano@gmail.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel This inbox may be cloned and mirrored by anyone: git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \ ffmpegdev@gitmailbox.com public-inbox-index ffmpegdev Example config snippet for mirrors. AGPL code for this site: git clone https://public-inbox.org/public-inbox.git