From: Vittorio Palmisano
Date: Wed, 23 Jul 2025 10:43:16 +0200
To: FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [PATCH] libavfilter: Whisper audio filter

Hi,
I've applied some changes and created a pull request:
https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20022

> > > +    frames = FFMAX(0, FFMIN(frames, wctx->audio_buffer_fill_size));
> >
> > I would call it samples, sample_count or nb_samples
> >
> > why are you clipping the number of samples?
> >
> > I assume run_transcription() would be called with the correct number, or am I missing
> > something?

When using the VAD option, we want to process only a portion of the total
samples stored in the buffer (up to the detected silence).

> A bigger problem is that the input frame->pts are not passed through to the output
> srt/json timestamps.
>
> To understand why this is a problem, consider some audio input device
> which samples at 16 kHz. This hardware contains, let's say for simplicity, a 16 kHz
> crystal and samples based on that. But depending on the temperature of this
> crystal, it will really sample at, let's say, between 15990 and 16010 Hz. So
> simply counting samples alone is not enough. The frame->pts need to be
> used too if the subtitles should be perfectly in sync with the video.
>
> It's probably best to give the user the option to produce srt/json times
> based purely on sample numbers but also on pts.

Ok, let me think about using pts instead.
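
To make the two points above concrete, here is a minimal, self-contained C sketch. It is not the code from the patch or the pull request: clamp_samples(), ms_from_samples() and ms_from_pts() are hypothetical helper names, and the time base is passed as a plain numerator/denominator pair so the example builds without libavutil. In FFmpeg itself the rescaling would more likely be done with av_rescale_q().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: limit the number of samples to process to what the
 * buffer actually holds, e.g. when VAD asks for "everything up to the
 * detected silence". Negative requests are treated as zero. */
static int clamp_samples(int nb_samples, int buffer_fill)
{
    if (nb_samples < 0)
        return 0;
    return nb_samples < buffer_fill ? nb_samples : buffer_fill;
}

/* Timestamp in milliseconds derived purely from a running sample count,
 * assuming the device samples at exactly the nominal rate. */
static int64_t ms_from_samples(int64_t sample_index, int sample_rate)
{
    assert(sample_rate > 0);
    return sample_index * 1000 / sample_rate;
}

/* Timestamp in milliseconds derived from a frame pts expressed in a
 * time base of tb_num/tb_den seconds per tick. */
static int64_t ms_from_pts(int64_t pts, int tb_num, int tb_den)
{
    assert(tb_den > 0);
    return pts * 1000 * tb_num / tb_den;
}
```

As an illustration with made-up numbers: a device that is nominally 16000 Hz but really samples at 15990 Hz delivers 57564000 samples in one hour. ms_from_samples(57564000, 16000) gives 3597750 ms, while a pts of 3600000 in a 1/1000 time base gives 3600000 ms through ms_from_pts(), so subtitles timed by sample count alone would be about 2.25 s early after an hour.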