From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by master.gitmailbox.com (Postfix) with ESMTPS id E6CFA4E084
	for <ffmpegdev@gitmailbox.com>; Sat,  8 Mar 2025 15:01:07 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 75C5D68F44F;
	Sat,  8 Mar 2025 17:01:04 +0200 (EET)
Received: from mail-wm1-f45.google.com (mail-wm1-f45.google.com
 [209.85.128.45])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id DE22068F444
 for <ffmpeg-devel@ffmpeg.org>; Sat,  8 Mar 2025 17:00:57 +0200 (EET)
Received: by mail-wm1-f45.google.com with SMTP id
 5b1f17b1804b1-43690d4605dso16782255e9.0
 for <ffmpeg-devel@ffmpeg.org>; Sat, 08 Mar 2025 07:00:57 -0800 (PST)
Received: from MK2 (80-108-16-220.cable.dynamic.surfer.at. [80.108.16.220])
 by smtp.gmail.com with ESMTPSA id
 5b1f17b1804b1-43bd41c7cc7sm122414415e9.0.2025.03.08.07.00.55
 for <ffmpeg-devel@ffmpeg.org>
 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Sat, 08 Mar 2025 07:00:56 -0800 (PST)
From: <m.kaindl0208@gmail.com>
To: <ffmpeg-devel@ffmpeg.org>
Date: Sat, 8 Mar 2025 16:00:59 +0100
Message-ID: <007a01db903a$e723fd40$b56bf7c0$@gmail.com>
MIME-Version: 1.0
X-Mailer: Microsoft Outlook 16.0
Thread-Index: AduQOIrrues5CAeGQCCB9cs7x1kLJA==
Content-Language: en-at
Subject: [FFmpeg-devel] [PATCH FFmpeg 8/15] libavfilter: apply missing
 temperature scaling in apply_softmax and default temperature to 1;
 refactor apply_softmax and improve its error handling
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Archived-At: <https://master.gitmailbox.com/ffmpegdev/007a01db903a$e723fd40$b56bf7c0$@gmail.com/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>

Try the new filters using my GitHub repo: https://github.com/MaximilianKaindl/DeepFFMPEGVideoClassification

Any feedback is appreciated!

Signed-off-by: MaximilianKaindl <m.kaindl0208@gmail.com>
---
 libavfilter/avf_dnn_classify.c        |  2 +-
 libavfilter/dnn/dnn_backend_torch.cpp | 66 ++++++++++++++++-----------
 2 files changed, 41 insertions(+), 27 deletions(-)
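
For reviewers, here is a minimal stand-alone sketch of what apply_softmax computes after this patch, written in plain C++ with std::vector instead of libtorch tensors so it builds without the torch backend. The name segmented_softmax is hypothetical and not part of FFmpeg. The sketch assumes a single row of logits (the patch works along dimension 1 of a batched tensor) and adds the usual max subtraction for numerical stability, which the libtorch softmax call is expected to handle internally.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch (no libtorch): per-segment softmax over one row of
// logits after dividing by a temperature T, i.e.
//   p_i = exp(x_i / T) / sum_j exp(x_j / T)   within each segment.
// Mirrors the patch: T <= 0 or T == 1 leaves the logits unscaled, an invalid
// segment length returns the original input unchanged, and columns after the
// last segment are copied through (scaled) without softmax.
std::vector<float> segmented_softmax(const std::vector<float> &logits, float temperature,
                                     const std::vector<int> &units)
{
    std::vector<float> scaled(logits);
    if (temperature > 0.0f && temperature != 1.0f)
        for (float &v : scaled)
            v /= temperature;

    // No segments given: treat the whole row as a single softmax group.
    std::vector<int> segs = units.empty() ? std::vector<int>{(int)scaled.size()} : units;

    std::vector<float> result(scaled); // any unprocessed tail keeps its scaled value
    std::size_t offset = 0;
    for (int len : segs) {
        if (len <= 0 || offset + len > scaled.size())
            return logits; // invalid units: return the original input

        // Subtract the segment maximum so exp() cannot overflow.
        float max_v = scaled[offset];
        for (int i = 1; i < len; i++)
            max_v = std::max(max_v, scaled[offset + i]);

        float sum = 0.0f;
        for (int i = 0; i < len; i++) {
            result[offset + i] = std::exp(scaled[offset + i] - max_v);
            sum += result[offset + i];
        }
        for (int i = 0; i < len; i++)
            result[offset + i] /= sum;

        offset += len;
    }
    return result;
}
```

With temperature 2, the logits {2, 0} give roughly 0.73 for the first class instead of roughly 0.88 at temperature 1: a higher temperature flattens the distribution, while the default of 1 leaves it unchanged.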

diff --git a/libavfilter/avf_dnn_classify.c b/libavfilter/avf_dnn_classify.c
index 5f294d1d9b..fa3a5ebf99 100644
--- a/libavfilter/avf_dnn_classify.c
+++ b/libavfilter/avf_dnn_classify.c
@@ -134,7 +134,7 @@ static const AVOption dnn_classify_options[] = {
 #if (CONFIG_LIBTORCH == 1)
     { "torch",          "torch backend flag",                            0,                             AV_OPT_TYPE_CONST,  { .i64 = DNN_TH },   0,       0,       FLAGS, .unit = "backend" },
     { "logit_scale",    "logit scale for similarity calculation",        OFFSET3(logit_scale),          AV_OPT_TYPE_FLOAT,  { .dbl = -1.0 },     -1.0,    100.0,   FLAGS },
-    { "temperature",    "softmax temperature",                           OFFSET3(temperature),          AV_OPT_TYPE_FLOAT,  { .dbl = 1.0 },      1,       100.0,   FLAGS },
+    { "temperature",    "softmax temperature",                           OFFSET3(temperature),          AV_OPT_TYPE_FLOAT,  { .dbl = -1.0 },     -1.0,    100.0,   FLAGS },
     { "forward_order",  "Order of forward output (0: media text, 1: text media) (CLIP/CLAP only)", OFFSET3(forward_order), AV_OPT_TYPE_BOOL,   { .i64 = -1 },     -1,      1,       FLAGS },
     { "normalize",      "Normalize the input tensor (CLIP/CLAP only)",   OFFSET3(normalize),            AV_OPT_TYPE_BOOL,   { .i64 = -1 },       -1,      1,       FLAGS },
     { "input_res",      "video processing model expected input size",    OFFSET3(input_resolution),     AV_OPT_TYPE_INT64,  { .i64 = -1 },       -1,      10000,   FLAGS },
diff --git a/libavfilter/dnn/dnn_backend_torch.cpp b/libavfilter/dnn/dnn_backend_torch.cpp
index dc68ad254f..c8804639d9 100644
--- a/libavfilter/dnn/dnn_backend_torch.cpp
+++ b/libavfilter/dnn/dnn_backend_torch.cpp
@@ -473,15 +473,12 @@ static torch::Tensor calculate_similarity(torch::Tensor &tensor1, torch::Tensor
         torch::Tensor similarity = logit_scale * torch::matmul(tensor2, tensor1.transpose(0, 1));
         return similarity.transpose(0, 1);
     } catch (const c10::Error &e) {
-        if (ctx) {
-            av_log(ctx, AV_LOG_ERROR, "Similarity computation failed: %s\n", e.what());
-        }
+        av_log(ctx, AV_LOG_ERROR, "Similarity computation failed: %s\n", e.what());
         return torch::Tensor(); // Return empty tensor properly
     }
 }
 
-static torch::Tensor apply_softmax(torch::Tensor input_tensor, const int *softmax_units, int softmax_units_count,
-                                   DnnContext *ctx)
+static torch::Tensor apply_softmax(torch::Tensor input_tensor, float temperature, const int *softmax_units, int softmax_units_count, DnnContext *ctx)
 {
     try {
         // Check for empty or invalid input tensor
@@ -490,44 +487,53 @@ static torch::Tensor apply_softmax(torch::Tensor input_tensor, const int *softma
             return input_tensor;
         }
 
+        // Apply temperature if needed
+        torch::Tensor scaled_tensor;
+        if (temperature > 0.0f && temperature != 1.0f) {
+            scaled_tensor = input_tensor / temperature;
+        } else {
+            scaled_tensor = input_tensor;
+        }
+
         // If no specific units are provided, apply softmax to the entire tensor
         if (!softmax_units || softmax_units_count <= 0) {
-            return torch::nn::functional::softmax(input_tensor, torch::nn::functional::SoftmaxFuncOptions(1));
+            return torch::nn::functional::softmax(scaled_tensor, torch::nn::functional::SoftmaxFuncOptions(1));
         }
 
-        torch::Tensor result = input_tensor.clone();
+        // Create a new output tensor with the same shape as the input
+        torch::Tensor result = torch::empty_like(scaled_tensor);
         int offset = 0;
 
         // Apply softmax to each specified segment
         for (int i = 0; i < softmax_units_count; i++) {
             int length = softmax_units[i];
-            if (length <= 0 || offset + length > input_tensor.size(1)) {
-                continue;
+            if (length <= 0 || offset + length > scaled_tensor.size(1)) {
+                av_log(ctx, AV_LOG_ERROR, "Invalid softmax units: each segment length must be positive and fit within the tensor.\n");
+                return input_tensor;
             }
 
-            // Select the segment to apply softmax
-            torch::Tensor segment = result.slice(1, offset, offset + length);
-
-            // Apply softmax along dimension 1 (across labels in segment)
-            torch::Tensor softmax_segment =
-                torch::nn::functional::softmax(segment, torch::nn::functional::SoftmaxFuncOptions(1));
-
-            // Put softmaxed segment back into result tensor
-            result.slice(1, offset, offset + length) = softmax_segment;
+            // Apply softmax to the segment and directly place it in the result tensor
+            result.slice(1, offset, offset + length) = torch::nn::functional::softmax(
+                scaled_tensor.slice(1, offset, offset + length), torch::nn::functional::SoftmaxFuncOptions(1));
 
             // Move offset forward
             offset += length;
         }
+
+        // Copy any remaining unprocessed parts if there are any
+        if (offset < scaled_tensor.size(1)) {
+            // Pass the remaining columns through unchanged and warn about them
+            result.slice(1, offset, scaled_tensor.size(1)) = scaled_tensor.slice(1, offset, scaled_tensor.size(1));
+            av_log(ctx, AV_LOG_WARNING, "Some tensor elements (%d to %" PRId64 ") were not processed by softmax\n",
+                   offset, scaled_tensor.size(1) - 1);
+        }
+
         return result;
     } catch (const c10::Error &e) {
-        if (ctx) {
-            av_log(ctx, AV_LOG_ERROR, "Error applying softmax: %s\n", e.what());
-        }
+        av_log(ctx, AV_LOG_ERROR, "Error applying softmax: %s\n", e.what());
         return input_tensor; // Return original tensor on error
     } catch (const std::exception &e) {
-        if (ctx) {
-            av_log(ctx, AV_LOG_ERROR, "Error applying softmax: %s\n", e.what());
-        }
+        av_log(ctx, AV_LOG_ERROR, "Error applying softmax: %s\n", e.what());
         return input_tensor; // Return original tensor on error
     }
 }
@@ -833,8 +839,9 @@ static int th_start_inference(void *args)
                 *infer_request->output = calculate_similarity(media_embeddings, text_embeddings,
                                                               th_model->ctx->torch_option.normalize, logit_scale, ctx);
             }
-            *infer_request->output = apply_softmax(*infer_request->output, th_model->clxp_ctx->softmax_units,
-                                                   th_model->clxp_ctx->softmax_units_count, ctx);
+            *infer_request->output =
+                apply_softmax(*infer_request->output, th_model->ctx->torch_option.temperature,
+                              th_model->clxp_ctx->softmax_units, th_model->clxp_ctx->softmax_units_count, ctx);
         }
     } else {
         avpriv_report_missing_feature(ctx, "model function type %d", th_model->model.func_type);
@@ -1071,6 +1078,13 @@ static THModel *init_model_th(DnnContext *ctx, DNNFunctionType func_type, AVFilt
         av_log(ctx, AV_LOG_INFO, "Using default logit_scale=%.4f for %s input\n", ctx->torch_option.logit_scale,
                func_type == DFT_ANALYTICS_CLAP ? "audio" : "video");
     }
+    if (ctx->torch_option.temperature <= 0) {
+        // set default value for temperature
+        ctx->torch_option.temperature = 1;
+        // Log the default value for temperature
+        av_log(ctx, AV_LOG_INFO, "Using default temperature=%.4f for %s input\n", ctx->torch_option.temperature,
+               func_type == DFT_ANALYTICS_CLAP ? "audio" : "video");
+    }
     if (ctx->torch_option.normalize < 0) {
         ctx->torch_option.normalize = func_type == DFT_ANALYTICS_CLAP ? 1 : 0;
         // Log the default value for logit_scale
-- 
2.34.1

