Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
 help / color / mirror / Atom feed
* [FFmpeg-devel] [PATCH] lavu/base64: optimize base64 encoding hot path
@ 2025-12-22  4:45 Victor Duarte Melo via ffmpeg-devel
  2025-12-25  3:54 ` [FFmpeg-devel] " Michael Niedermayer via ffmpeg-devel
  0 siblings, 1 reply; 2+ messages in thread
From: Victor Duarte Melo via ffmpeg-devel @ 2025-12-22  4:45 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Victor Duarte Melo

Reduce branches in the encoder fast path and improve
throughput while preserving bit-exact behavior and API
compatibility.

The decoder logic is unchanged.

Benchmarks and differential fuzzing were used to validate
correctness and performance.
---
 libavutil/base64.c | 147 ++++++++++++++++++++++++++++++---------------
 libavutil/base64.h |  40 +++++++++---
 2 files changed, 131 insertions(+), 56 deletions(-)

diff --git a/libavutil/base64.c b/libavutil/base64.c
index 69e11e6f5e..6d3c6c0d83 100644
--- a/libavutil/base64.c
+++ b/libavutil/base64.c
@@ -22,16 +22,40 @@
  * @file
  * @brief Base64 encode/decode
  * @author Ryan Martell <rdm4@martellventures.com> (with lots of Michael)
+ *
+ * This is a drop-in compatible implementation of FFmpeg's base64 helpers.
+ * The decode routine preserves FFmpeg's historical semantics (strict input,
+ * stops at the first invalid character, supports unpadded input).
+ *
+ * Small performance-oriented changes were made to the encoder:
+ *   - The slow "shift loop" tail handling was replaced by a constant-time
+ *     switch on the remaining 1 or 2 bytes, reducing branches and shifts.
+ *   - The main loop now packs 3 bytes into a 24-bit value directly instead of
+ *     reading an overlapping 32-bit word (avoids endian conversions and makes
+ *     the loop easier for compilers to optimize).
+ *
+ * The API and output are fully compatible with the original code.
  */
 
 #include <limits.h>
 #include <stddef.h>
+#include <stdint.h>
 
 #include "base64.h"
 #include "error.h"
 #include "intreadwrite.h"
 
-/* ---------------- private code */
+/* ---------------- private code
+ *
+ * map2[c] returns:
+ *   - 0..63  : decoded 6-bit value for valid Base64 symbols
+ *   - 0xFE   : "stop" symbol (NUL terminator and '=' padding)
+ *   - 0xFF   : invalid symbol (produces AVERROR_INVALIDDATA)
+ *
+ * The decoder uses:
+ *   - bits & 0x80 to detect "stop/invalid" quickly (both 0xFE and 0xFF have MSB set)
+ *   - bits & 1 to distinguish invalid (0xFF) from stop (0xFE)
+ */
 static const uint8_t map2[256] =
 {
     0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
@@ -72,58 +96,72 @@ static const uint8_t map2[256] =
 };
 
 #define BASE64_DEC_STEP(i) do { \
-    bits = map2[in[i]]; \
-    if (bits & 0x80) \
-        goto out ## i; \
-    v = i ? (v << 6) + bits : bits; \
-} while(0)
+    bits = map2[in[i]];         \
+    if (bits & 0x80)            \
+        goto out ## i;          \
+    v = (i) ? (v << 6) + bits : bits; \
+} while (0)
 
 int av_base64_decode(uint8_t *out, const char *in_str, int out_size)
 {
     uint8_t *dst = out;
     uint8_t *end;
-    // no sign extension
-    const uint8_t *in = in_str;
+    /* Cast to unsigned to avoid sign extension on platforms where char is signed. */
+    const uint8_t *in = (const uint8_t *)in_str;
     unsigned bits = 0xff;
     unsigned v;
 
+    /* Validation-only mode: keep FFmpeg's original behavior. */
     if (!out)
         goto validity_check;
 
     end = out + out_size;
+
+    /*
+     * Fast path: decode complete 4-char blocks while we can safely do a 32-bit store.
+     * We write 4 bytes and advance by 3 (the 4th written byte is overwritten on the next iteration).
+     */
     while (end - dst > 3) {
         BASE64_DEC_STEP(0);
         BASE64_DEC_STEP(1);
         BASE64_DEC_STEP(2);
         BASE64_DEC_STEP(3);
-        // Using AV_WB32 directly confuses compiler
+
+        /* Convert to native-endian so a native write yields correct byte order in memory. */
         v = av_be2ne32(v << 8);
         AV_WN32(dst, v);
+
         dst += 3;
-        in += 4;
+        in  += 4;
     }
+
+    /* Tail: decode at most one more block without overrunning the output buffer. */
     if (end - dst) {
         BASE64_DEC_STEP(0);
         BASE64_DEC_STEP(1);
         BASE64_DEC_STEP(2);
         BASE64_DEC_STEP(3);
+
         *dst++ = v >> 16;
         if (end - dst)
             *dst++ = v >> 8;
         if (end - dst)
             *dst++ = v;
+
         in += 4;
     }
+
 validity_check:
+    /*
+     * Strict validation: keep decoding groups of 4 until we hit the first stop/invalid.
+     * Using BASE64_DEC_STEP(0) ensures we always jump to out0 and never touch out1/out2/out3
+     * (important for the out == NULL validation-only mode).
+     */
     while (1) {
-        BASE64_DEC_STEP(0);
-        in++;
-        BASE64_DEC_STEP(0);
-        in++;
-        BASE64_DEC_STEP(0);
-        in++;
-        BASE64_DEC_STEP(0);
-        in++;
+        BASE64_DEC_STEP(0); in++;
+        BASE64_DEC_STEP(0); in++;
+        BASE64_DEC_STEP(0); in++;
+        BASE64_DEC_STEP(0); in++;
     }
 
 out3:
@@ -135,49 +173,64 @@ out2:
         *dst++ = v >> 4;
 out1:
 out0:
-    return bits & 1 ? AVERROR_INVALIDDATA : out ? dst - out : 0;
+    /* bits==0xFE => stop (NUL or '=') => success. bits==0xFF => invalid => error. */
+    return (bits & 1) ? AVERROR_INVALIDDATA : (out ? (int)(dst - out) : 0);
 }
 
 /*****************************************************************************
-* b64_encode: Stolen from VLC's http.c.
-* Simplified by Michael.
-* Fixed edge cases and made it work from data (vs. strings) by Ryan.
-*****************************************************************************/
+ * b64_encode: Stolen from VLC's http.c.
+ * Simplified by Michael.
+ * Fixed edge cases and made it work from data (vs. strings) by Ryan.
+ *
+ * Encoder micro-optimizations:
+ *   - Direct 24-bit packing (3 bytes -> 4 symbols) in the main loop.
+ *   - Branchless tail handling via a small switch for 1 or 2 remaining bytes.
+ *****************************************************************************/
 
 char *av_base64_encode(char *out, int out_size, const uint8_t *in, int in_size)
 {
     static const char b64[] =
         "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
     char *ret, *dst;
-    unsigned i_bits = 0;
-    int i_shift = 0;
-    int bytes_remaining = in_size;
 
-    if (in_size >= UINT_MAX / 4 ||
-        out_size < AV_BASE64_SIZE(in_size))
+    if (in_size >= (int)(UINT_MAX / 4) || out_size < AV_BASE64_SIZE(in_size))
         return NULL;
+
     ret = dst = out;
-    while (bytes_remaining > 3) {
-        i_bits = AV_RB32(in);
-        in += 3; bytes_remaining -= 3;
-        *dst++ = b64[ i_bits>>26        ];
-        *dst++ = b64[(i_bits>>20) & 0x3F];
-        *dst++ = b64[(i_bits>>14) & 0x3F];
-        *dst++ = b64[(i_bits>>8 ) & 0x3F];
-    }
-    i_bits = 0;
-    while (bytes_remaining) {
-        i_bits = (i_bits << 8) + *in++;
-        bytes_remaining--;
-        i_shift += 8;
+
+    /* Encode full 3-byte blocks. */
+    while (in_size >= 3) {
+        uint32_t v = ((uint32_t)in[0] << 16) |
+                     ((uint32_t)in[1] <<  8) |
+                     ((uint32_t)in[2]      );
+        in += 3;
+        in_size -= 3;
+
+        dst[0] = b64[ (v >> 18)        ];
+        dst[1] = b64[ (v >> 12) & 0x3F ];
+        dst[2] = b64[ (v >>  6) & 0x3F ];
+        dst[3] = b64[ (v      ) & 0x3F ];
+        dst += 4;
     }
-    while (i_shift > 0) {
-        *dst++ = b64[(i_bits << 6 >> i_shift) & 0x3f];
-        i_shift -= 6;
+
+    /* Encode the remaining 1 or 2 bytes (if any) and add '=' padding. */
+    if (in_size == 1) {
+        uint32_t v = (uint32_t)in[0];
+        dst[0] = b64[(v >> 2) & 0x3F];
+        dst[1] = b64[(v & 0x03) << 4];
+        dst[2] = '=';
+        dst[3] = '=';
+        dst += 4;
+    } else if (in_size == 2) {
+        uint32_t v = ((uint32_t)in[0] << 8) | (uint32_t)in[1];
+        dst[0] = b64[(v >> 10) & 0x3F];
+        dst[1] = b64[(v >>  4) & 0x3F];
+        dst[2] = b64[(v & 0x0F) << 2];
+        dst[3] = '=';
+        dst += 4;
     }
-    while ((dst - ret) & 3)
-        *dst++ = '=';
-    *dst = '\0';
 
+    /* NUL-terminate. The caller guaranteed enough space via AV_BASE64_SIZE(). */
+    *dst = '\0';
     return ret;
 }
diff --git a/libavutil/base64.h b/libavutil/base64.h
index 2954c12d42..31bd8357e3 100644
--- a/libavutil/base64.h
+++ b/libavutil/base64.h
@@ -23,6 +23,16 @@
 
 #include <stdint.h>
 
+/*
+ * NOTE: This header intentionally keeps the original FFmpeg API surface
+ * (function names, macros and semantics). The implementation shipped in
+ * base64.c is a drop-in replacement with extra tests/benchmarks around it.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @defgroup lavu_base64 Base64
  * @ingroup lavu_crypto
@@ -32,12 +42,17 @@
 /**
  * Decode a base64-encoded string.
  *
- * @param out      buffer for decoded data
+ * The input must be a NUL-terminated string. This decoder is strict:
+ * it does not ignore whitespace and it stops at the first invalid byte
+ * (including the terminating NUL). This matches FFmpeg's historical behavior.
+ *
+ * @param out      buffer for decoded data, or NULL to only validate input
  * @param in       null-terminated input string
  * @param out_size size in bytes of the out buffer, must be at
- *                 least 3/4 of the length of in, that is AV_BASE64_DECODE_SIZE(strlen(in))
- * @return         number of bytes written, or a negative value in case of
- *                 invalid input
+ *                 least 3/4 of the length of in, that is
+ *                 AV_BASE64_DECODE_SIZE(strlen(in))
+ * @return         number of bytes written, 0 for validation-only success,
+ *                 or a negative value in case of invalid input
  */
 int av_base64_decode(uint8_t *out, const char *in, int out_size);
 
@@ -50,12 +65,15 @@ int av_base64_decode(uint8_t *out, const char *in, int out_size);
 /**
  * Encode data to base64 and null-terminate.
  *
+ * The output is padded using '=' and is always NUL-terminated (if out is large
+ * enough). This matches FFmpeg's av_base64_encode behavior.
+ *
  * @param out      buffer for encoded data
  * @param out_size size in bytes of the out buffer (including the
  *                 null terminator), must be at least AV_BASE64_SIZE(in_size)
  * @param in       input buffer containing the data to encode
  * @param in_size  size in bytes of the in buffer
- * @return         out or NULL in case of error
+ * @return         out or NULL in case of error (e.g. output too small)
  */
 char *av_base64_encode(char *out, int out_size, const uint8_t *in, int in_size);
 
@@ -63,10 +81,14 @@ char *av_base64_encode(char *out, int out_size, const uint8_t *in, int in_size);
  * Calculate the output size needed to base64-encode x bytes to a
  * null-terminated string.
  */
-#define AV_BASE64_SIZE(x)  (((x)+2) / 3 * 4 + 1)
+#define AV_BASE64_SIZE(x)  (((x) + 2) / 3 * 4 + 1)
+
+/**
+ * @}
+ */
 
- /**
-  * @}
-  */
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
 
 #endif /* AVUTIL_BASE64_H */
-- 
2.51.0

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org

^ permalink raw reply	[flat|nested] 2+ messages in thread

* [FFmpeg-devel] Re: [PATCH] lavu/base64: optimize base64 encoding hot path
  2025-12-22  4:45 [FFmpeg-devel] [PATCH] lavu/base64: optimize base64 encoding hot path Victor Duarte Melo via ffmpeg-devel
@ 2025-12-25  3:54 ` Michael Niedermayer via ffmpeg-devel
  0 siblings, 0 replies; 2+ messages in thread
From: Michael Niedermayer via ffmpeg-devel @ 2025-12-25  3:54 UTC (permalink / raw)
  To: FFmpeg development discussions and patches; +Cc: Michael Niedermayer


[-- Attachment #1.1: Type: text/plain, Size: 792 bytes --]

Hi

On Mon, Dec 22, 2025 at 01:45:35AM -0300, Victor Duarte Melo via ffmpeg-devel wrote:
> Reduce branches in the encoder fast path and improve
> throughput while preserving bit-exact behavior and API
> compatibility.
> 
> The decoder logic is unchanged.
> 
> Benchmarks and differential fuzzing were used to validate
> correctness and performance.

please split this in multiple patches.
Some of this is purely cosmetic, and improves code readability

some change the code

the commit message should contain the benchmark results, if you
did run benchmarks

we generally dont add "ifdef __cplusplus" wrapers

thx

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

No snowflake in an avalanche ever feels responsible. -- Voltaire

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 163 bytes --]

_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2025-12-25  3:55 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-22  4:45 [FFmpeg-devel] [PATCH] lavu/base64: optimize base64 encoding hot path Victor Duarte Melo via ffmpeg-devel
2025-12-25  3:54 ` [FFmpeg-devel] " Michael Niedermayer via ffmpeg-devel

Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
		ffmpegdev@gitmailbox.com
	public-inbox-index ffmpegdev

Example config snippet for mirrors.


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git