Date: Thu, 10 Jul 2025 01:37:46 +0200
From: Michael Niedermayer
To: FFmpeg development discussions and patches
Message-ID: <20250709233746.GM29660@pb2>
In-Reply-To: <20250709072350.578693-1-vpalmisano@gmail.com>
Subject: Re: [FFmpeg-devel] [PATCH] Whisper audio filter
Cc: Vittorio Palmisano

Hi Vittorio

On Wed, Jul 09, 2025 at 09:23:48AM +0200, Vittorio Palmisano wrote:
> It adds a new audio filter for running audio transcriptions with the whisper model.

I am happy to see someone contribute a whisper filter!

[...]
> +@example
> +ffmpeg -i input.mp4 -vn -af "aformat=sample_rates=16000:channel_layouts=mono,whisper=

Is there a reason why we convert to 16 kHz mono here?

> +model=../whisper.cpp/models/ggml-base.en.bin\

It would be nice if the models were in a standard location, so the user
just has to specify the model name and not the path.
Maybe the filter could check some "standard" locations.
I don't know what path is standard, but maybe something like:
/usr/local/share/whisper.cpp/models
~/.whisper.cpp/models

> +:language=en\
> +:queue=3000\
> +:destination=output.srt\
> +:format=srt" -f null -

format can be deduced from the destination file extension.

I tried this:
./ffmpeg -i matrixbench_mpeg2.mpg -vn -af "aformat=sample_rates=16000:channel_layouts=mono,whisper=model=/home/michael/whisper.cpp/models/ggml-base.en.bin:language=en:queue=3000:destination=output.srt:format=srt" -f null -

but the output.srt is empty (0 bytes)

[...]

> +static void cb_log_disable(enum ggml_log_level, const char *, void *) {}

libavfilter/af_whisper.c: In function ‘cb_log_disable’:
libavfilter/af_whisper.c:75:28: error: parameter name omitted
   75 | static void cb_log_disable(enum ggml_log_level, const char *, void *) {}
libavfilter/af_whisper.c:75:49: error: parameter name omitted
   75 | static void cb_log_disable(enum ggml_log_level, const char *, void *) {}
      |                                                 ^~~~~~~~~~~~
libavfilter/af_whisper.c:75:63: error: parameter name omitted
   75 | static void cb_log_disable(enum ggml_log_level, const char *, void *) {}

> +
> +static int init(AVFilterContext *ctx)
> +{
> +    WhisperContext *wctx = ctx->priv;
> +
> +    ggml_backend_load_all();
> +    whisper_log_set(cb_log_disable, NULL);
> +
> +    // Init whisper context
> +    if (!wctx->model_path)
> +    {
> +        av_log(ctx, AV_LOG_ERROR, "No whisper model path specified. Use the 'model' option.\n");
> +        return AVERROR(EINVAL);
> +    }
> +
> +    struct whisper_context_params params = whisper_context_default_params();
> +    params.use_gpu = wctx->use_gpu;
> +    params.gpu_device = wctx->gpu_device;
> +
> +    wctx->ctx_wsp = whisper_init_from_file_with_params(wctx->model_path, params);
> +    if (wctx->ctx_wsp == NULL)
> +    {
> +        av_log(ctx, AV_LOG_ERROR, "Failed to initialize whisper context from model: %s\n", wctx->model_path);
> +        return AVERROR(EIO);
> +    }
> +
> +    wctx->whisper_state = whisper_init_state(wctx->ctx_wsp);
> +    if (wctx->whisper_state == NULL)
> +    {
> +        av_log(ctx, AV_LOG_ERROR, "Failed to get whisper state from context\n");
> +        whisper_free(wctx->ctx_wsp);
> +        wctx->ctx_wsp = NULL;
> +        return AVERROR(EIO);
> +    }
> +
> +    // Init VAD model context
> +    if (wctx->vad_model_path)
> +    {
> +        struct whisper_vad_context_params ctx_params = whisper_vad_default_context_params();
> +        ctx_params.n_threads = 4;
> +        // ctx_params.use_gpu = wctx->use_gpu; TODO (see: whisper_vad_init_context)
> +        ctx_params.gpu_device = wctx->gpu_device;
> +        wctx->ctx_vad = whisper_vad_init_from_file_with_params(
> +            wctx->vad_model_path,
> +            ctx_params);
> +
> +        wctx->vad_params = whisper_vad_default_params();
> +        wctx->vad_params.threshold = wctx->vad_threshold;
> +        wctx->vad_params.min_speech_duration_ms = wctx->vad_min_speech_duration;
> +        wctx->vad_params.min_silence_duration_ms = wctx->vad_min_silence_duration;
> +        wctx->vad_params.max_speech_duration_s = (float)(wctx->audio_buffer_queue_size / 1000.0f);

the float cast is unneeded

> +        wctx->vad_params.speech_pad_ms = 0;
> +        wctx->vad_params.samples_overlap = 0;
> +    }
> +
> +    // Init buffer
> +    wctx->audio_buffer_queue_size = WHISPER_SAMPLE_RATE * wctx->queue / 1000;
> +    wctx->audio_buffer = av_malloc(wctx->audio_buffer_queue_size * sizeof(float));
> +    if (!wctx->audio_buffer)
> +    {
> +        return AVERROR(ENOMEM);
> +    }
> +
> +    wctx->audio_buffer_fill_size = 0;
> +
> +    wctx->next_pts = AV_NOPTS_VALUE;
> +
> +    wctx->avio_context = NULL;

aren't things already initialized to 0?

> +    if (wctx->destination && strcmp("", wctx->destination))
> +    {
> +        int ret = 0;

useless initialization

> +
> +        if (!strcmp("-", wctx->destination))
> +        {
> +            ret = avio_open(&wctx->avio_context, "pipe:1", AVIO_FLAG_WRITE);
> +        }
> +        else
> +        {
> +            ret = avio_open(&wctx->avio_context, wctx->destination, AVIO_FLAG_WRITE);
> +        }

        const char *dst = wctx->destination;
        if (!strcmp("-", wctx->destination))
            dst = "pipe:1";
        int ret = avio_open(&wctx->avio_context, dst, AVIO_FLAG_WRITE);

[...]

> +    if (segments_text)
> +    {
> +        av_free(segments_text);
> +    }

the NULL check isn't needed, and please use av_freep(&) instead of av_free(),
as it clears the pointer and that's just more robust

thx

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The educated differ from the uneducated as much as the living from the dead.
-- Aristotle