From: hezuoqiang via ffmpeg-devel <ffmpeg-devel@ffmpeg.org>
To: "FFmpeg development discussions and patches" <ffmpeg-devel@ffmpeg.org>
Cc: "Rémi Denis-Courmont" <remi@remlab.net>,
hezuoqiang <hezuoqiang@foxmail.com>
Subject: [FFmpeg-devel] Reply: Re: [PATCH] libavformat/nal: add ARM NEON optimization for ff_nal_find_startcode
Date: Wed, 14 Jan 2026 01:20:13 +0800
Message-ID: <tencent_B233DB36441FC505F2576352F6D4C220AE09@qq.com>
In-Reply-To: <2154BAC0-A450-438A-AE4B-67DCD5024002@remlab.net>
Hi Rémi,
Thank you for your review. I'd like to clarify the difference between the two approaches:
**Clarification:**
My patch optimizes `ff_nal_find_startcode` in libavformat/nal.c. That is a different function from the `ff_startcode_find_candidate` hook you mentioned under libavcodec/h264dsp.c.
- `ff_startcode_find_candidate`: returns the offset of the first zero byte; the caller must still validate whether a startcode begins there
- `ff_nal_find_startcode`: returns a pointer to a complete startcode (00 00 01); used by the H.264 demuxer
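To make the two contracts concrete, here is a minimal scalar sketch. The names `find_zero_candidate` and `find_startcode_scalar` are hypothetical stand-ins written for illustration, not the FFmpeg implementations. They only mirror the return conventions described above and do not attempt to reproduce FFmpeg's exact handling of four-byte startcodes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in: offset of the first zero byte, or size if there is none.
// The caller must still check whether a startcode actually begins at that offset.
static size_t find_zero_candidate(const uint8_t *buf, size_t size) {
    for (size_t i = 0; i < size; i++)
        if (buf[i] == 0)
            return i;
    return size;
}

// Hypothetical stand-in: pointer to the first 00 00 01 sequence, or end if there is none.
// No further validation is needed by the caller.
static const uint8_t *find_startcode_scalar(const uint8_t *p, const uint8_t *end) {
    for (; end - p >= 3; p++)
        if (p[0] == 0 && p[1] == 0 && p[2] == 1)
            return p;
    return end;
}
```

For the bytes `05 07 00 09 00 00 01 65`, the first helper stops at offset 2 (a lone zero that is not a startcode), while the second returns a pointer to offset 4.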
**Test Environment:**
- Platform: Raspberry Pi 5 (ARM Cortex-A76, AArch64)
- Compiler: GCC 14.2.0 with -O3 -march=armv8-a
- Test file: 1080p H.264 video, 22.88 MB
- Total NALU startcodes found: 1,224
**Test Methodology:**
I compared two approaches:
**Method 1 (baseline):** Use `ff_startcode_find_candidate` + C validation (current FFmpeg approach)
```cpp
// Simplified sketch of the baseline
#include <cstddef>
#include <cstdint>
#include <vector>

// Prototype assumed for this sketch; the function itself lives in libavcodec.
extern "C" int ff_startcode_find_candidate(const uint8_t *buf, int size);

std::vector<size_t> find_all_startcode_positions(const uint8_t* data, size_t size) {
    std::vector<size_t> positions;
    size_t i = 0;
    while (i < size) {
        // Step 1: Fast search for a zero byte
        int offset = ff_startcode_find_candidate(data + i, (int)(size - i));
        if (offset < 0 || (size_t)offset >= size - i) break;
        i += offset;
        // Step 2: Validate whether it is a complete startcode (00 00 01 or 00 00 00 01)
        if (i + 2 < size && data[i] == 0 && data[i+1] == 0) {
            if (data[i+2] == 1) {
                positions.push_back(i);
                i += 3;
                continue;
            } else if (i + 3 < size && data[i+2] == 0 && data[i+3] == 1) {
                positions.push_back(i);
                i += 4;
                continue;
            }
        }
        i++;  // False positive: step past this zero byte
    }
    return positions;
}
```
**Method 2 (NEON optimized):** Use `ff_nal_find_startcode_neon` directly
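```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Prototype assumed for this sketch; the function is the one added by the patch.
extern "C" const uint8_t *ff_nal_find_startcode_neon(const uint8_t *p, const uint8_t *end);

std::vector<size_t> find_all_startcode_positions_neon(const uint8_t* data, size_t size) {
    std::vector<size_t> positions;
    const uint8_t* p = data;
    const uint8_t* end = data + size;
    while (p < end) {
        // Directly find a complete startcode
        const uint8_t* start = ff_nal_find_startcode_neon(p, end);
        // Skip the zero bytes of the startcode; start then points at its 01 byte
        while (start < end && *start == 0) start++;
        if (start >= end) break;
        positions.push_back(start - data);
        p = start;  // Resume the search from the 01 byte
    }
    return positions;
}
```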
**Performance Results (1000 iterations):**
- Method 1 (find zero + validate): 5,454,680 μs
- Method 2 (NEON direct search): 1,741,280 μs
- Speedup: 3.13x
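For readers who want to reproduce this kind of measurement, the following is a minimal timing sketch. It is not the harness used for the numbers above; `time_us` and `speedup` are hypothetical helpers, and the use of `std::chrono::steady_clock` is an assumption about how such totals could be collected.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: total wall-clock microseconds spent calling fn() `iterations` times.
template <typename Fn>
static int64_t time_us(Fn &&fn, int iterations) {
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; i++)
        fn();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}

// Ratio of baseline time to candidate time; both must be in the same unit.
static double speedup(double baseline_us, double candidate_us) {
    return baseline_us / candidate_us;
}
```

With the totals reported above, `speedup(5454680.0, 1741280.0)` evaluates to about 3.13, and the per-iteration averages are roughly 5.45 ms and 1.74 ms. In a real benchmark the result of each call should be consumed (for example, by summing the number of positions found) so the compiler cannot discard the work.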
**Why this optimization is effective:**
The NEON version looks for two consecutive zero bytes (the "00" pattern) rather than a single zero byte, so far fewer candidates reach the detailed check.
**Test file analysis (22.88 MB 1080p H.264):**
- Single zero bytes: 95,673 (98.1% false positive rate)
- Valid startcodes: 1,224
- With "00" pattern: Only 22.8% of 64-byte blocks need detailed checking
- 77.2% of blocks can be skipped entirely
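The kind of per-block statistic quoted above can be gathered with a scalar pass like the sketch below. This is an illustration only: `ZeroStats` and `analyze_zeros` are hypothetical names, and treating a zero in the last byte of a block as "needs checking" (because the pair could straddle the block boundary) is my assumption, not necessarily how the figures above were counted.

```cpp
#include <cstddef>
#include <cstdint>

struct ZeroStats {
    size_t zero_bytes;      // bytes equal to 0x00
    size_t blocks_total;    // number of 64-byte blocks (the last one may be partial)
    size_t blocks_to_check; // blocks with a 00 00 pair, or ending in 00 (pair may straddle)
};

static ZeroStats analyze_zeros(const uint8_t *data, size_t size) {
    const size_t kBlock = 64;
    ZeroStats s = {0, 0, 0};
    for (size_t base = 0; base < size; base += kBlock) {
        const size_t n = (size - base < kBlock) ? size - base : kBlock;
        bool needs_check = false;
        for (size_t i = 0; i < n; i++) {
            if (data[base + i] != 0)
                continue;
            s.zero_bytes++;
            // Either a zero pair inside the block, or a zero in the block's last
            // byte that could pair with the first byte of the next block.
            if (i + 1 == n || data[base + i + 1] == 0)
                needs_check = true;
        }
        s.blocks_total++;
        if (needs_check)
            s.blocks_to_check++;
    }
    return s;
}
```

The ratio `blocks_to_check / blocks_total` then gives the share of blocks that cannot be skipped, and comparing `zero_bytes` with the number of valid startcodes gives the false positive rate of a single-zero search.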
This optimization specifically improves H.264 demuxing performance on ARM platforms.
Should I modify the commit message to better clarify this distinction?
Best regards,
He Zuoqiang
Original message
From: Rémi Denis-Courmont via ffmpeg-devel <ffmpeg-devel@ffmpeg.org>
Sent: January 13, 2026 18:26
To: hezuoqiang--- via ffmpeg-devel <ffmpeg-devel@ffmpeg.org>
Cc: Zuoqiang He <hezuoqiang@foxmail.com>, Rémi Denis-Courmont <remi@remlab.net>
Subject: [FFmpeg-devel] Re: [PATCH] libavformat/nal: add ARM NEON optimization for ff_nal_find_startcode
Nihao,
There already is a hook for this purpose under h264dsp, and it's already used on some other ISAs. So there should be no need to add a new one.
It's also probably faster to just look for a nul byte in assembler and let the C code manually check for the full 32-bit start code. This is basically just `strnlen()`.
Br,
_______________________________________________
ffmpeg-devel mailing list -- ffmpeg-devel@ffmpeg.org
To unsubscribe send an email to ffmpeg-devel-leave@ffmpeg.org