From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
To: ffmpeg-devel@ffmpeg.org
Date: Mon, 20 Oct 2025 05:00:53 -0000
Message-ID: <176093645488.62.5186839454505248468@bf907ddaa564>
Reply-To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PATCH] release/8.0: libavfilter: backporting bugfix for whisper (PR #20722)
List-Id: FFmpeg development discussions and patches
From: Zhao Zhili via ffmpeg-devel
Cc: Zhao Zhili
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

PR #20722 opened by Zhao Zhili (quink)
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20722
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20722.patch


>>From bb65f51fce43d970e94eeef12dd20d3beabb559e Mon Sep 17 00:00:00 2001
From: Zhao Zhili
Date: Fri, 15 Aug 2025 20:39:49 +0800
Subject: [PATCH 1/5] avfilter/af_whisper: fix broken output for multibyte
 character

text + 1 can break a multibyte character, e.g., Chinese in UTF-8.
There is no space at the beginning in this case.
(cherry picked from commit 1d06e8ddcd8c14232d0a3c2b1c21e50b232549b4)
---
 libavfilter/af_whisper.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libavfilter/af_whisper.c b/libavfilter/af_whisper.c
index 186b624504..bd145d6d1d 100644
--- a/libavfilter/af_whisper.c
+++ b/libavfilter/af_whisper.c
@@ -215,7 +215,9 @@ static void run_transcription(AVFilterContext *ctx, AVFrame *frame, int samples)
 
     for (int i = 0; i < n_segments; ++i) {
         const char *text = whisper_full_get_segment_text(wctx->ctx_wsp, i);
-        char *text_cleaned = av_strireplace(text + 1, "[BLANK_AUDIO]", "");
+        if (av_isspace(text[0]))
+            text++;
+        char *text_cleaned = av_strireplace(text, "[BLANK_AUDIO]", "");
 
         if (av_strnlen(text_cleaned, 1) == 0) {
             av_freep(&text_cleaned);
-- 
2.49.1


>>From b784c3eb6d4f989b9e78f2d93e1b4d73018a1f4f Mon Sep 17 00:00:00 2001
From: Gyan Doshi
Date: Sat, 16 Aug 2025 16:12:10 +0530
Subject: [PATCH 2/5] avfilter/whisper: correct option formatting

(cherry picked from commit 7df92712723dee0ace3596684a90282c0e12a8ef)
---
 libavfilter/af_whisper.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libavfilter/af_whisper.c b/libavfilter/af_whisper.c
index bd145d6d1d..385180b4ed 100644
--- a/libavfilter/af_whisper.c
+++ b/libavfilter/af_whisper.c
@@ -430,7 +430,8 @@ static int query_formats(const AVFilterContext *ctx,
 #define HOURS 3600000000
 
 static const AVOption whisper_options[] = {
-    { "model", "Path to the whisper.cpp model file", OFFSET(model_path), AV_OPT_TYPE_STRING,.flags = FLAGS }, { "language", "Language for transcription ('auto' for auto-detect)", OFFSET(language), AV_OPT_TYPE_STRING, {.str = "auto"}, .flags = FLAGS },
+    { "model", "Path to the whisper.cpp model file", OFFSET(model_path), AV_OPT_TYPE_STRING,.flags = FLAGS },
+    { "language", "Language for transcription ('auto' for auto-detect)", OFFSET(language), AV_OPT_TYPE_STRING, {.str = "auto"}, .flags = FLAGS },
     { "queue", "Audio queue size", OFFSET(queue), AV_OPT_TYPE_DURATION, {.i64 = 3000000}, 20000, HOURS, .flags = FLAGS },
     { "use_gpu", "Use GPU for processing", OFFSET(use_gpu), AV_OPT_TYPE_BOOL, {.i64 = 1}, 0, 1, .flags = FLAGS },
     { "gpu_device", "GPU device to use", OFFSET(gpu_device), AV_OPT_TYPE_INT, {.i64 = 0}, 0, INT_MAX, .flags = FLAGS },
-- 
2.49.1


>>From adc819773ba98db56bbff0e4da8187e6ed3d7db9 Mon Sep 17 00:00:00 2001
From: Vittorio Palmisano
Date: Fri, 29 Aug 2025 11:32:20 +0200
Subject: [PATCH 3/5] avfilter/af_whisper: fix srt file format

The SRT file format requires commas in the time string, not periods.

(cherry picked from commit 73d411c399df4abe2750b611fc8381979fcbafc6)
---
 libavfilter/af_whisper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavfilter/af_whisper.c b/libavfilter/af_whisper.c
index 385180b4ed..663fe446bb 100644
--- a/libavfilter/af_whisper.c
+++ b/libavfilter/af_whisper.c
@@ -246,7 +246,7 @@ static void run_transcription(AVFilterContext *ctx, AVFrame *frame, int samples)
             if (!av_strcasecmp(wctx->format, "srt")) {
                 buf =
                     av_asprintf
-                    ("%d\n%02ld:%02ld:%02ld.%03ld --> %02ld:%02ld:%02ld.%03ld\n%s\n\n",
+                    ("%d\n%02ld:%02ld:%02ld,%03ld --> %02ld:%02ld:%02ld,%03ld\n%s\n\n",
                      wctx->index, start_t / 3600000,
                      (start_t / 60000) % 60, (start_t / 1000) % 60,
                      start_t % 1000, end_t / 3600000, (end_t / 60000) % 60,
-- 
2.49.1


>>From d8049e01d79df68520371fc8e0a93c5dd786df06 Mon Sep 17 00:00:00 2001
From: Vittorio Palmisano
Date: Sun, 21 Sep 2025 15:19:51 +0200
Subject: [PATCH 4/5] avfilter/af_whisper: fix int64 printf format

Use PRId64 for printing int64_t values in the SRT output.
(cherry picked from commit f18b1e23890c1179d41c07f2c195c56fb90ea072)
---
 libavfilter/af_whisper.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/libavfilter/af_whisper.c b/libavfilter/af_whisper.c
index 663fe446bb..90b2fc89a0 100644
--- a/libavfilter/af_whisper.c
+++ b/libavfilter/af_whisper.c
@@ -150,7 +150,7 @@ static int init(AVFilterContext *ctx)
     }
 
     av_log(ctx, AV_LOG_INFO,
-           "Whisper filter initialized: model: %s lang: %s queue: %ld ms\n",
+           "Whisper filter initialized: model: %s lang: %s queue: %" PRId64 " ms\n",
            wctx->model_path, wctx->language, wctx->queue / 1000);
 
     return 0;
@@ -194,7 +194,7 @@ static void run_transcription(AVFilterContext *ctx, AVFrame *frame, int samples)
     const float duration = (float) samples / WHISPER_SAMPLE_RATE;
 
     av_log(ctx, AV_LOG_INFO,
-           "run transcription at %ld ms, %d/%d samples (%.2f seconds)...\n",
+           "run transcription at %" PRId64 " ms, %d/%d samples (%.2f seconds)...\n",
            timestamp_ms, samples, wctx->audio_buffer_fill_size, duration);
 
     struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
@@ -228,7 +228,7 @@ static void run_transcription(AVFilterContext *ctx, AVFrame *frame, int samples)
         const int64_t t0_ms = whisper_full_get_segment_t0(wctx->ctx_wsp, i) * 10;
         const int64_t t1_ms = whisper_full_get_segment_t1(wctx->ctx_wsp, i) * 10;
 
-        av_log(ctx, AV_LOG_DEBUG, " [%ld-%ld%s]: \"%s\"\n",
+        av_log(ctx, AV_LOG_DEBUG, " [%" PRId64 "-%" PRId64 "%s]: \"%s\"\n",
                timestamp_ms + t0_ms, timestamp_ms + t1_ms, turn ? " (turn)" : "", text_cleaned);
 
         if (segments_text) {
@@ -246,13 +246,13 @@ static void run_transcription(AVFilterContext *ctx, AVFrame *frame, int samples)
             if (!av_strcasecmp(wctx->format, "srt")) {
                 buf =
                     av_asprintf
-                    ("%d\n%02ld:%02ld:%02ld,%03ld --> %02ld:%02ld:%02ld,%03ld\n%s\n\n",
+                    ("%d\n%02" PRId64 ":%02" PRId64 ":%02" PRId64 ",%03" PRId64 " --> %02" PRId64 ":%02" PRId64 ":%02" PRId64 ",%03" PRId64 "\n%s\n\n",
                      wctx->index, start_t / 3600000,
                      (start_t / 60000) % 60, (start_t / 1000) % 60,
                      start_t % 1000, end_t / 3600000, (end_t / 60000) % 60,
                      (end_t / 1000) % 60, end_t % 1000, text_cleaned);
             } else if (!av_strcasecmp(wctx->format, "json")) {
-                buf = av_asprintf("{\"start\":%ld,\"end\":%ld,\"text\":\"%s\"}\n", start_t, end_t, text_cleaned);
+                buf = av_asprintf("{\"start\":%" PRId64 ",\"end\":%" PRId64 ",\"text\":\"%s\"}\n", start_t, end_t, text_cleaned);
             } else
                 buf = av_strdup(text_cleaned);
 
-- 
2.49.1


>>From 7cbc26267b6968eb2b8cc5f70e3603b6ee9b2101 Mon Sep 17 00:00:00 2001
From: Vittorio Palmisano
Date: Sun, 21 Sep 2025 15:33:17 +0200
Subject: [PATCH 5/5] avfilter/af_whisper: fix srt index

The srt index should be incremented for each segment.
(cherry picked from commit 9970dc32bf85628e53ed0952d87d384080d8976e)
---
 libavfilter/af_whisper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavfilter/af_whisper.c b/libavfilter/af_whisper.c
index 90b2fc89a0..3c0eba42f0 100644
--- a/libavfilter/af_whisper.c
+++ b/libavfilter/af_whisper.c
@@ -251,6 +251,8 @@ static void run_transcription(AVFilterContext *ctx, AVFrame *frame, int samples)
                      (start_t / 60000) % 60, (start_t / 1000) % 60,
                      start_t % 1000, end_t / 3600000, (end_t / 60000) % 60,
                      (end_t / 1000) % 60, end_t % 1000, text_cleaned);
+
+                wctx->index++;
             } else if (!av_strcasecmp(wctx->format, "json")) {
                 buf = av_asprintf("{\"start\":%" PRId64 ",\"end\":%" PRId64 ",\"text\":\"%s\"}\n", start_t, end_t, text_cleaned);
             } else
@@ -265,8 +267,6 @@ static void run_transcription(AVFilterContext *ctx, AVFrame *frame, int samples)
         av_freep(&text_cleaned);
     }
 
-    wctx->index++;
-
     AVDictionary **metadata = &frame->metadata;
     if (metadata && segments_text) {
         av_dict_set(metadata, "lavfi.whisper.text", segments_text, 0);
-- 
2.49.1

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org
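
For reference, the combined effect of patches 1, 3 and 4 on the SRT output
path can be sketched outside the filter. This is a minimal sketch under
stated assumptions, not the code in af_whisper.c: skip_leading_space() and
format_srt_cue() are hypothetical helpers, and the standard C isspace() and
snprintf() stand in for av_isspace() and av_asprintf(). The caller is
assumed to increment the cue index once per segment, as patch 5 does.

```c
#include <assert.h>
#include <ctype.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: drop one leading whitespace byte only if it is
 * really there. An unconditional "text + 1" would cut the first byte of a
 * multibyte UTF-8 character when the segment has no leading space. */
static const char *skip_leading_space(const char *text)
{
    /* Cast to unsigned char: isspace() on a negative char is undefined. */
    if (isspace((unsigned char)text[0]))
        text++;
    return text;
}

/* Hypothetical helper: write one SRT cue into buf.
 * Returns 0 for an empty segment, otherwise the snprintf() result. */
static int format_srt_cue(char *buf, size_t size, int index,
                          int64_t start_ms, int64_t end_ms, const char *text)
{
    const char *body = skip_leading_space(text);

    assert(start_ms >= 0 && end_ms >= start_ms);

    /* Skip empty segments so that no blank cue is written. */
    if (strlen(body) == 0) {
        if (size > 0)
            buf[0] = '\0';
        return 0;
    }

    /* PRId64 matches int64_t on every platform, unlike %ld.
     * SRT separates seconds and milliseconds with a comma. */
    return snprintf(buf, size,
                    "%d\n%02" PRId64 ":%02" PRId64 ":%02" PRId64 ",%03" PRId64
                    " --> %02" PRId64 ":%02" PRId64 ":%02" PRId64 ",%03" PRId64
                    "\n%s\n\n",
                    index,
                    start_ms / 3600000, (start_ms / 60000) % 60,
                    (start_ms / 1000) % 60, start_ms % 1000,
                    end_ms / 3600000, (end_ms / 60000) % 60,
                    (end_ms / 1000) % 60, end_ms % 1000,
                    body);
}
```

With a start of 3723004 ms and an end of 3725500 ms, the time line of the
cue reads 01:02:03,004 --> 01:02:05,500. A UTF-8 string with no leading
space is passed through unchanged instead of losing its first byte.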