From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.ffmpeg.org (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by master.gitmailbox.com (Postfix) with ESMTPS id 5921A4E5ED
	for <ffmpegdev@gitmailbox.com>; Sat, 12 Jul 2025 00:03:41 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.ffmpeg.org (Postfix) with ESMTP id D306069040C;
	Sat, 12 Jul 2025 03:03:38 +0300 (EEST)
Received: from relay3-d.mail.gandi.net (relay3-d.mail.gandi.net
 [217.70.183.195])
 by ffbox0-bg.ffmpeg.org (Postfix) with ESMTPS id CAB676903E6
 for <ffmpeg-devel@ffmpeg.org>; Sat, 12 Jul 2025 03:03:31 +0300 (EEST)
Received: by mail.gandi.net (Postfix) with ESMTPSA id 1E4761FD3A
 for <ffmpeg-devel@ffmpeg.org>; Sat, 12 Jul 2025 00:03:30 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=niedermayer.cc;
 s=gm1; t=1752278611;
 h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
 to:to:cc:mime-version:mime-version:content-type:content-type:
 in-reply-to:in-reply-to:references:references;
 bh=3ipWxGJhvaP1OCWMsCtsVpNoyLrV5cvB50L2cqNb9iY=;
 b=NQaBTVcpC52XIJ5lhgzLvwpTWC6RO0DnUHxFtgA8Fn0up755RApAo4MqinxCzvxo8SdkDM
 o6Nr74eu2MhnNzFNyvCac7N8K23GN3tyrerc4q2TaIXRbCexcWXY55zv7AabFKiupKa7Iy
 YV7gFi+aIy0tJw8sZyPs3JmpvXMhnvScclXvHf+xk+ROSpibsojnZd2CQU8iMjs+7BOEk9
 oVhehR52LRqZG0r0nkPFLsP8ZKVcGbl+hlfUtZTKDnzbasuX2UevWjRYHIjBxmprGGlufJ
 LZWC/Bf7tcS9AfbkGdGSmQtQv1dQ6OGpg7EuC8X3s7n4fDp3R/U23HFh5SW/bw==
Date: Sat, 12 Jul 2025 02:03:30 +0200
From: Michael Niedermayer <michael@niedermayer.cc>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Message-ID: <20250712000330.GX29660@pb2>
References: <CADv15W-ojbSUc7oPLhNPBdDf_00TZPAtDqTZT2kG=gdwaB9Hdw@mail.gmail.com>
 <20250710102543.1002696-1-vpalmisano@gmail.com>
 <20250710122008.GP29660@pb2>
 <CADv15W-W3=VkGcJfnnbD7mw5JdxhB7Vn+Dr_O4+4Pt47YSnHqg@mail.gmail.com>
MIME-Version: 1.0
In-Reply-To: <CADv15W-W3=VkGcJfnnbD7mw5JdxhB7Vn+Dr_O4+4Pt47YSnHqg@mail.gmail.com>
X-GND-State: clean
X-GND-Score: -85
X-GND-Cause: gggruggvucftvghtrhhoucdtuddrgeeffedrtdefgdeggeejudcutefuodetggdotefrodftvfcurfhrohhfihhlvgemucfitefpfffkpdcuggftfghnshhusghstghrihgsvgenuceurghilhhouhhtmecufedtudenucesvcftvggtihhpihgvnhhtshculddquddttddmnegfrhhlucfvnfffucdludehmdenucfjughrpeffhffvuffkfhggtggujgesghdtreertddtvdenucfhrhhomhepofhitghhrggvlhcupfhivgguvghrmhgrhigvrhcuoehmihgthhgrvghlsehnihgvuggvrhhmrgihvghrrdgttgeqnecuggftrfgrthhtvghrnheptefggedvffeiueffvefhiedtgfefjedukeefgeetgeevgeejgeekvdevjeelveeknecuffhomhgrihhnpehgihhthhhusgdrtghomhenucfkphepgedurdeiiedrieehrddujeeinecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehinhgvthepgedurdeiiedrieehrddujeeipdhhvghloheplhhotggrlhhhohhsthdpmhgrihhlfhhrohhmpehmihgthhgrvghlsehnihgvuggvrhhmrgihvghrrdgttgdpnhgspghrtghpthhtohepuddprhgtphhtthhopehffhhmphgvghdquggvvhgvlhesfhhfmhhpvghgrdhorhhg
X-GND-Sasl: michael@niedermayer.cc
Subject: Re: [FFmpeg-devel] [PATCH] Whisper audio filter
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Content-Type: multipart/mixed; boundary="===============0948437990630988581=="
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Archived-At: <https://master.gitmailbox.com/ffmpegdev/20250712000330.GX29660@pb2/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>


--===============0948437990630988581==
Content-Type: multipart/signed; micalg=pgp-sha512;
	protocol="application/pgp-signature"; boundary="8VuCyDveSafOkxcj"
Content-Disposition: inline


--8VuCyDveSafOkxcj
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Hi Vittorio

On Fri, Jul 11, 2025 at 10:41:04AM +0200, Vittorio Palmisano wrote:
> > > +
> > > +    memcpy(wctx->audio_buffer, wctx->audio_buffer + end_pos,
> > > +           end_pos * sizeof(float));
> >
> > sizeof(*wctx->audio_buffer) is more robust than float
>=20
> But end_pos is not necessarily equal to the audio_buffer size, it
> could be lower.

you misunderstood

sizeof(*wctx->audio_buffer) =3D=3D sizeof(float)

I was just suggesting to use the "type of the array" via sizeof, not to
repeat the type in the source
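
To illustrate, a minimal standalone sketch. DemoContext and demo_shift
are made-up names for this example, they are not from the patch, and
memmove is my choice here so the sketch stays valid for overlapping
ranges. The point is only that sizeof(*c->audio_buffer) follows the
element type, so a later change of the buffer type cannot leave a stale
sizeof(float) behind:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the filter context, not the patch's struct. */
typedef struct DemoContext {
    float *audio_buffer;
} DemoContext;

/* Move `count` samples starting at `pos` to the start of the buffer.
 * The byte count is derived from the pointed-to type, so the element
 * type is written in exactly one place (the struct declaration). */
static void demo_shift(DemoContext *c, size_t pos, size_t count)
{
    memmove(c->audio_buffer, c->audio_buffer + pos,
            count * sizeof(*c->audio_buffer));
}
```

If audio_buffer were changed to int16_t or double later, demo_shift
would still copy the right number of bytes without being touched.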


>=20
> >
> > not sure how others think of this, but i would ignore the 80 char limit=
 and format this like:
> >
> > static const AVOption whisper_options[] =3D {
> >     { "model"   , "Path to the whisper.cpp model file"                 =
, OFFSET(model_path), AV_OPT_TYPE_STRING,.flags =3D FLAGS },
> >     { "language", "Language for transcription ('auto' for auto-detect)"=
, OFFSET(language)  , AV_OPT_TYPE_STRING, {.str =3D "auto"},             .f=
lags =3D FLAGS },
>=20
> I've used `indent -i4 -kr -nut` to format the code.

human formatted code looks better than what indent generates.
We are not literally using indent to format code.
the docs also say "The presentation is one inspired by 'indent -i4 -kr -nut=
'."

A human will add a space here or an empty line there, or align things,
to make everything neatly formatted and readable.
indent is not a human and not AI.

AI produces this: (I didn't verify that this is still correct, but it
should show that it's more readable)

static const AVOption whisper_options[] =3D {
    { "model",                   "Path to the whisper.cpp model file",     =
             OFFSET(model_path),             AV_OPT_TYPE_STRING, {.str =3D =
NULL},     0, 0,        FLAGS },
    { "language",                "Language for transcription ('auto' for au=
to-detect)", OFFSET(language),               AV_OPT_TYPE_STRING, {.str =3D =
"auto"},   0, 0,        FLAGS },
    { "queue",                   "Audio queue size in milliseconds",       =
             OFFSET(queue),                  AV_OPT_TYPE_INT,    {.i64 =3D =
3000},     20, INT_MAX, FLAGS },
    { "use_gpu",                 "Use GPU for processing",                 =
             OFFSET(use_gpu),                AV_OPT_TYPE_BOOL,   {.i64 =3D =
1},        0, 1,        FLAGS },
    { "gpu_device",              "GPU device to use",                      =
             OFFSET(gpu_device),             AV_OPT_TYPE_INT,    {.i64 =3D =
0},        0, INT_MAX,  FLAGS },
    { "threads",                 "Number of threads to use",               =
             OFFSET(threads),                AV_OPT_TYPE_INT,    {.i64 =3D =
4},        0, INT_MAX,  FLAGS },
    { "destination",             "Output destination",                     =
             OFFSET(destination),            AV_OPT_TYPE_STRING, {.str =3D =
""},       0, 0,        FLAGS },
    { "format",                  "Output format (text|srt|json)",          =
             OFFSET(format),                 AV_OPT_TYPE_STRING, {.str =3D =
"text"},   0, 0,        FLAGS },
    { "vad_model",               "Path to the VAD model file",             =
             OFFSET(vad_model_path),         AV_OPT_TYPE_STRING, {.str =3D =
NULL},     0, 0,        FLAGS },
    { "vad_threshold",           "VAD threshold",                          =
             OFFSET(vad_threshold),          AV_OPT_TYPE_FLOAT,  {.dbl =3D =
0.5},      0.0, 1.0,    FLAGS },
    { "vad_min_speech_duration", "Minimum speech duration in milliseconds f=
or VAD",     OFFSET(vad_min_speech_duration),AV_OPT_TYPE_INT,    {.i64 =3D =
50},       20, INT_MAX, FLAGS },
    { "vad_min_silence_duration","Minimum silence duration in milliseconds =
for VAD",    OFFSET(vad_min_silence_duration),AV_OPT_TYPE_INT,   {.i64 =3D =
500},      0, INT_MAX,  FLAGS },
    { NULL }
};


>=20
> >
> > Also it seems, this is a lot slower than whisper-cli
> >
> > time whisper-cli  matrix.wav -m ~/whisper.cpp/models/ggml-base.en.bin  =
--output-srt
> > real    0m16,283s
> > user    1m3,644s
> > sys     0m0,581s
> >
> >
> > time ./ffmpeg -v 99 -i matrix.wav -af "aformat=3Dsample_rates=3D16000:c=
hannel_layouts=3Dmono,whisper=3Dmodel=3D/home/michael/whisper.cpp/models/gg=
ml-base.en.bin:language=3Den:queue=3D3000:destination=3Doutput.srt:format=
=3Dsrt" -f null - 2> /tmp/log
> > real    1m30,827s
> > user    6m0,590s
> > sys     0m0,756s
> >
>=20
> Tested with: https://github.com/vpalmisano/webrtcperf/releases/download/v=
ideos-1.0/kt.mp4
> (and you need to increase the queue param to obtain a fair
> comparison):

This should be explained better in the documentation

it just says:

    @item queue
    The maximum size in milliseconds that will be queued into the filter be=
fore
    processing the audio with whisper
    Default value: @code{"3000"}

=46rom reading that i have no idea that its value affects speed.
I might guess it affects latency.
Please make this a bit more elaborate so the user has enough information
so she can select a queue value.
ATM she just has an example value, which seemed slow
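
As a rough sketch of the kind of relation the docs could spell out:
assuming the filter runs whisper once per filled queue (an assumption on
my side, I did not check this against the patch), the number of
inference calls grows as the queue shrinks. demo_inference_calls is a
hypothetical helper, not part of the filter:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of inference calls for a given input
 * duration, ASSUMING one call per filled queue (not verified against
 * the patch). Both arguments are in milliseconds. */
static int64_t demo_inference_calls(int64_t duration_ms, int64_t queue_ms)
{
    if (duration_ms < 1 || queue_ms < 1)
        return 0;
    /* Round up: a partially filled last queue still needs one call. */
    return (duration_ms + queue_ms - 1) / queue_ms;
}
```

Under that assumption a 10 minute input (600000 ms) needs 200 calls with
a queue of 3000 and 20 calls with a queue of 30000, which is the sort of
trade-off between latency and speed the user should be told about.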

thx

[...]

--=20
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Complexity theory is the science of finding the exact solution to an
approximation. Benchmarking OTOH is finding an approximation of the exact

--8VuCyDveSafOkxcj
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iF0EABEKAB0WIQSf8hKLFH72cwut8TNhHseHBAsPqwUCaHGmTgAKCRBhHseHBAsP
q2WDAJ0XbeQDydSJzBhda5dGvvyt6OPgQwCfe0tRncqB99WiZbNHtpmJDJpDFUU=
=71r5
-----END PGP SIGNATURE-----

--8VuCyDveSafOkxcj--

--===============0948437990630988581==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline


--===============0948437990630988581==--