From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 22 Dec 2025 04:52:35 -0000
Message-ID: <176637915645.60.15086184433482053711@2cb04c0e5124>
X-MailFrom: code@ffmpeg.org
Precedence: list
Reply-To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH] avfilter/af_whisper: Add max_len parameter (PR #21259)
List-Id: FFmpeg development discussions and patches
From: WyattBlue via ffmpeg-devel
Cc: WyattBlue
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

PR #21259 opened by WyattBlue
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21259
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21259.patch

This closes #20333


>>From d387c186321ab5e8ebff92521c178dbd90475388 Mon Sep 17 00:00:00 2001
From: WyattBlue
Date: Sun, 21 Dec 2025 23:51:15 -0500
Subject: [PATCH] avfilter/af_whisper: Add max_len parameter

---
 libavfilter/af_whisper.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/libavfilter/af_whisper.c b/libavfilter/af_whisper.c
index 3c0eba42f0..7e1b27e21b 100644
--- a/libavfilter/af_whisper.c
+++ b/libavfilter/af_whisper.c
@@ -52,6 +52,7 @@ typedef struct WhisperContext {
     int64_t queue;
     char *destination;
     char *format;
+    int max_len;
 
     struct whisper_context *ctx_wsp;
     struct whisper_vad_context *ctx_vad;
@@ -204,6 +205,8 @@ static void run_transcription(AVFilterContext *ctx, AVFrame *frame, int samples)
     params.print_progress = 0;
     params.print_realtime = 0;
     params.print_timestamps = 0;
+    params.max_len = wctx->max_len;
+    params.token_timestamps = (wctx->max_len > 0);
 
     if (whisper_full(wctx->ctx_wsp, params, wctx->audio_buffer, samples) != 0) {
         av_log(ctx, AV_LOG_ERROR, "Failed to process audio with whisper.cpp\n");
@@ -224,6 +227,14 @@ static void run_transcription(AVFilterContext *ctx, AVFrame *frame, int samples)
             continue;
         }
 
+        // Skip segments that are parts of [BLANK_AUDIO] when max_len splits them
+        if (wctx->max_len > 0 && (strcmp(text_cleaned, "[") == 0 || strcmp(text_cleaned, "]") == 0 ||
+            strcmp(text_cleaned, "BLANK") == 0 || strcmp(text_cleaned, "_") == 0 ||
+            strcmp(text_cleaned, "AUDIO") == 0)) {
+            av_freep(&text_cleaned);
+            continue;
+        }
+
         const bool turn = whisper_full_get_segment_speaker_turn_next(wctx->ctx_wsp, i);
         const int64_t t0_ms = whisper_full_get_segment_t0(wctx->ctx_wsp, i) * 10;
         const int64_t t1_ms = whisper_full_get_segment_t1(wctx->ctx_wsp, i) * 10;
@@ -437,6 +448,7 @@ static const AVOption whisper_options[] = {
     { "gpu_device", "GPU device to use", OFFSET(gpu_device), AV_OPT_TYPE_INT, {.i64 = 0}, 0, INT_MAX, .flags = FLAGS },
     { "destination", "Output destination", OFFSET(destination), AV_OPT_TYPE_STRING, {.str = ""}, .flags = FLAGS },
     { "format", "Output format (text|srt|json)", OFFSET(format), AV_OPT_TYPE_STRING, {.str = "text"},.flags = FLAGS },
+    { "max_len", "Max segment length in characters", OFFSET(max_len), AV_OPT_TYPE_INT, {.i64 = 0}, 0, INT_MAX, .flags = FLAGS },
     { "vad_model", "Path to the VAD model file", OFFSET(vad_model_path), AV_OPT_TYPE_STRING,.flags = FLAGS },
     { "vad_threshold", "VAD threshold", OFFSET(vad_threshold), AV_OPT_TYPE_FLOAT, {.dbl = 0.5}, 0.0, 1.0, .flags = FLAGS },
     { "vad_min_speech_duration", "Minimum speech duration for VAD", OFFSET(vad_min_speech_duration), AV_OPT_TYPE_DURATION, {.i64 = 100000}, 20000, HOURS, .flags = FLAGS },
-- 
2.49.1

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org
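
Illustrative note, not part of the submitted patch: the five strcmp() comparisons in the third hunk could be collected into a small table-driven helper, which may be easier to extend if other fragments turn up. The sketch below is standalone C under that assumption. The name is_blank_audio_fragment is hypothetical and does not exist in af_whisper.c or whisper.cpp. It covers only the string match, so the caller would keep the max_len > 0 check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not in the patch): returns true when `text` is
 * exactly one of the pieces that "[BLANK_AUDIO]" is split into, using
 * the same five exact-match strings as the patch's condition. */
static bool is_blank_audio_fragment(const char *text)
{
    static const char *const fragments[] = { "[", "]", "BLANK", "_", "AUDIO" };

    for (size_t i = 0; i < sizeof(fragments) / sizeof(fragments[0]); i++) {
        if (strcmp(text, fragments[i]) == 0)
            return true;
    }
    return false;
}
```

With such a helper, the condition in run_transcription() would reduce to `wctx->max_len > 0 && is_blank_audio_fragment(text_cleaned)`, with the av_freep() and continue left unchanged.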