* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
@ 2025-05-21 10:14 Jiawei
0 siblings, 0 replies; 28+ messages in thread
From: Jiawei @ 2025-05-21 10:14 UTC (permalink / raw)
To: ffmpeg-devel
> > -----Original Message-----
> > From: "Nicolas George" <george@nsup.org>
> > Sent: 2025-05-21 14:52:12 (Wednesday)
> > To: "FFmpeg development discussions and patches" <ffmpeg-devel@ffmpeg.org>
> > Cc:
> > Subject: Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
> >
> > Jiawei (HE12025-05-21):
> > > particularly improving
> > > performance on x86_64 (AVX), ARM64 (SVE) and RISC-V (RVV)
> > > architectures.
> >
> > Benchmark needed.
> >
> > Regards,
> >
> > --
> > Nicolas George
Hi Nicolas,
Since I am a GCC developer, I am not very familiar with the FFmpeg test
flow. Here is my test process;
if anything is incorrect, please point it out:
1. Download the video bbb_sunflower_2160p_30fps_normal.mp4.zip
<https://download.blender.org/demo/movies/BBB/bbb_sunflower_2160p_30fps_normal.mp4.zip>
from https://download.blender.org/demo/movies/BBB/ and transcode it:
```
ffmpeg -i bbb_sunflower_2160p_30fps_normal.mp4 -t 60 -vf
"scale=1920:1080" -c:v libx265 -c:a libmp3lame 1080p_hevc_mp3.mp4
```
This produces the 1080p video used as the benchmark input.
2. Build two versions of FFmpeg, one with the patch and one without
it, using the GCC 13.3 release,
and test them on an Intel(R) Core(TM) Ultra 9 285HX.
With the patch:
```
./ffmpeg -benchmark -i ~/mp/1080p_hevc_mp3.mp4 -f null -
ffmpeg version N-119636-g96518c8d8d Copyright (c) 2000-2025 the FFmpeg
developers
built with gcc 13 (Ubuntu 13.3.0-6ubuntu2~24.04)
configuration: --prefix=/home/pz9115/ffpo --disable-ffplay --arch=x64
--extra-cflags=-O3 --enable-static --target-os=linux
libavutil 60. 2.100 / 60. 2.100
libavcodec 62. 3.101 / 62. 3.101
libavformat 62. 0.102 / 62. 0.102
libavdevice 62. 0.100 / 62. 0.100
libavfilter 11. 0.100 / 11. 0.100
libswscale 9. 0.100 / 9. 0.100
libswresample 6. 0.100 / 6. 0.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
'/home/pz9115/mp/1080p_hevc_mp3.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
composer : Sacha Goedegebure
encoder : Lavf60.16.100
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
genre : Animation
Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
fps, 30 tbr, 15360 tbn (default)
Metadata:
handler_name : GPAC ISO Video Handler
vendor_id : [0][0][0][0]
encoder : Lavc60.31.102 libx265
Stream #0:1[0x2](und): Audio: mp3 (mp3float) (mp4a / 0x6134706D),
48000 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : GPAC ISO Audio Handler
vendor_id : [0][0][0][0]
Stream mapping:
Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
composer : Sacha Goedegebure
genre : Animation
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
encoder : Lavf62.0.102
Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
Metadata:
encoder : Lavc62.3.101 wrapped_avframe
handler_name : GPAC ISO Video Handler
vendor_id : [0][0][0][0]
Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
(default)
Metadata:
encoder : Lavc62.3.101 pcm_s16le
handler_name : GPAC ISO Audio Handler
vendor_id : [0][0][0][0]
[out#0/null @ 0x565233669eb0] video:731KiB audio:11250KiB subtitle:0KiB
other streams:0KiB global headers:0KiB muxing overhead: unknown
frame= 1800 fps=635 q=-0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
speed=21.2x elapsed=0:00:02.83
bench: utime=11.324s stime=0.290s rtime=2.834s
bench: maxrss=186556KiB
```
Without the patch (here I added -fno-tree-vectorize directly):
./ffmpeg -benchmark -i ~/mp/1080p_hevc_mp3.mp4 -f null -
ffmpeg version N-119636-g96518c8d8d Copyright (c) 2000-2025 the FFmpeg
developers
built with gcc 13 (Ubuntu 13.3.0-6ubuntu2~24.04)
configuration: --prefix=/home/pz9115/ffpo --disable-ffplay --arch=x64
--extra-cflags='-O3 -fno-tree-vectorize' --enable-static --target-os=linux
libavutil 60. 2.100 / 60. 2.100
libavcodec 62. 3.101 / 62. 3.101
libavformat 62. 0.102 / 62. 0.102
libavdevice 62. 0.100 / 62. 0.100
libavfilter 11. 0.100 / 11. 0.100
libswscale 9. 0.100 / 9. 0.100
libswresample 6. 0.100 / 6. 0.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
'/home/pz9115/mp/1080p_hevc_mp3.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
composer : Sacha Goedegebure
encoder : Lavf60.16.100
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
genre : Animation
Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
fps, 30 tbr, 15360 tbn (default)
Metadata:
handler_name : GPAC ISO Video Handler
vendor_id : [0][0][0][0]
encoder : Lavc60.31.102 libx265
Stream #0:1[0x2](und): Audio: mp3 (mp3float) (mp4a / 0x6134706D),
48000 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : GPAC ISO Audio Handler
vendor_id : [0][0][0][0]
Stream mapping:
Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
composer : Sacha Goedegebure
genre : Animation
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
encoder : Lavf62.0.102
Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
Metadata:
encoder : Lavc62.3.101 wrapped_avframe
handler_name : GPAC ISO Video Handler
vendor_id : [0][0][0][0]
Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
(default)
Metadata:
encoder : Lavc62.3.101 pcm_s16le
handler_name : GPAC ISO Audio Handler
vendor_id : [0][0][0][0]
[out#0/null @ 0x55eb196b7eb0] video:731KiB audio:11250KiB subtitle:0KiB
other streams:0KiB global headers:0KiB muxing overhead: unknown
frame= 1800 fps=509 q=-0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
speed= 17x elapsed=0:00:03.53
bench: utime=21.544s stime=0.349s rtime=3.536s
bench: maxrss=181580KiB
I also tested on a RISC-V development board, the MUSE Pi Pro. The
configuration and results follow:
With the patch:
root@spacemit-k1-x-MUSE-Pi-Pro-board:~# ./ffpv/bin/ffmpeg -benchmark -i
1080p_hevc_mp3.mp4 -f null -
ffmpeg version n6.1.2 Copyright (c) 2000-2024 the FFmpeg developers
built with gcc 16.0.0 (g3fc902e738b) 20250519 (experimental)
configuration: --prefix=/home/pz9115/ffpv --disable-ffplay
--arch=riscv --extra-cflags='-march=rv64gcv_zba_zbb_zbs -O3 -ffast-math'
--cross-prefix=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-
--cc=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-gcc
--cxx=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-g++ --enable-static
--enable-cross-compile --target-os=linux --disable-rvv
libavutil 58. 29.100 / 58. 29.100
libavcodec 60. 31.102 / 60. 31.102
libavformat 60. 16.100 / 60. 16.100
libavdevice 60. 3.100 / 60. 3.100
libavfilter 9. 12.100 / 9. 12.100
libswscale 7. 5.100 / 7. 5.100
libswresample 4. 12.100 / 4. 12.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1080p_hevc_mp3.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
composer : Sacha Goedegebure
encoder : Lavf60.16.100
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
genre : Animation
Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
fps, 30 tbr, 15360 tbn (default)
Metadata:
handler_name : GPAC ISO Video Handler
vendor_id : [0][0][0][0]
encoder : Lavc60.31.102 libx265
Stream #0:1[0x2](und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz,
stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : GPAC ISO Audio Handler
vendor_id : [0][0][0][0]
Stream mapping:
Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Odd rotation angle.
If you want to help, upload a sample of this file to
https://streams.videolan.org/upload/ and contact the ffmpeg-devel
mailing list. (ffmpeg-devel@ffmpeg.org)Output #0, null, to 'pipe:':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
composer : Sacha Goedegebure
genre : Animation
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
encoder : Lavf60.16.100
Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
1080x1920 [SAR 1:1 DAR 9:16], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
Metadata:
handler_name : GPAC ISO Video Handler
vendor_id : [0][0][0][0]
encoder : Lavc60.31.102 wrapped_avframe
Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
(default)
Metadata:
handler_name : GPAC ISO Audio Handler
vendor_id : [0][0][0][0]
encoder : Lavc60.31.102 pcm_s16le
[out#0/null @ 0x28a82e0] video:844kB audio:11250kB subtitle:0kB other
streams:0kB global headers:0kB muxing overhead: unknown
frame= 1800 fps= 42 q=-0.0 Lsize=N/A time=00:00:59.97 bitrate=N/A
speed=1.41x
bench: utime=207.150s stime=5.319s rtime=42.608s
bench: maxrss=162160kB
Without the patch (again with -fno-tree-vectorize added directly):
./ffp/bin/ffmpeg -benchmark -i 1080p_hevc_mp3.mp4 -f null -
ffmpeg version n6.1.2 Copyright (c) 2000-2024 the FFmpeg developers
built with gcc 16.0.0 (g38163c874a3-dirty) 20250515 (experimental)
configuration: --prefix=/home/pz9115/ffp --disable-ffplay
--arch=riscv --sysroot=/home/pz9115/rv/sysroot
--extra-cflags='-march=rv64gcv_zba_zbb_zbc_zbs_zca_zcd -mabi=lp64d -O3
-fno-tree-vectorize -static' --extra-ldflags=-static
--cross-prefix=/home/pz9115/rv/bin/riscv64-unknown-linux-gnu-
--enable-static --enable-cross-compile --target-os=linux --disable-rvv
libavutil 58. 29.100 / 58. 29.100
libavcodec 60. 31.102 / 60. 31.102
libavformat 60. 16.100 / 60. 16.100
libavdevice 60. 3.100 / 60. 3.100
libavfilter 9. 12.100 / 9. 12.100
libswscale 7. 5.100 / 7. 5.100
libswresample 4. 12.100 / 4. 12.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1080p_hevc_mp3.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
composer : Sacha Goedegebure
encoder : Lavf60.16.100
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
genre : Animation
Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
fps, 30 tbr, 15360 tbn (default)
Metadata:
handler_name : GPAC ISO Video Handler
vendor_id : [0][0][0][0]
encoder : Lavc60.31.102 libx265
Stream #0:1[0x2](und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz,
stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : GPAC ISO Audio Handler
vendor_id : [0][0][0][0]
Stream mapping:
Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
composer : Sacha Goedegebure
genre : Animation
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
encoder : Lavf60.16.100
Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
Metadata:
handler_name : GPAC ISO Video Handler
vendor_id : [0][0][0][0]
encoder : Lavc60.31.102 wrapped_avframe
Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
(default)
Metadata:
handler_name : GPAC ISO Audio Handler
vendor_id : [0][0][0][0]
encoder : Lavc60.31.102 pcm_s16le
[out#0/null @ 0x2729630] video:844kB audio:11250kB subtitle:0kB other
streams:0kB global headers:0kB muxing overhead: unknown
frame= 1800 fps= 30 q=-0.0 Lsize=N/A time=00:00:59.97 bitrate=N/A
speed= 1x
bench: utime=321.145s stime=2.475s rtime=59.960s
bench: maxrss=131532kB
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-06-03 16:14 ` Niklas Haas
@ 2025-06-04 11:13 ` Rémi Denis-Courmont
0 siblings, 0 replies; 28+ messages in thread
From: Rémi Denis-Courmont @ 2025-06-04 11:13 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On 3 June 2025 19:14:16 GMT+03:00, Niklas Haas <ffmpeg@haasn.xyz> wrote:
>We have an open bug in swscale where the use of x87 causes
>non-bitexact results on 32-bit platforms, resolved by setting -mfpmath=sse at
>build time.
>
>Maybe we should think about setting this flag globally?
Enabling SSE is a separate discussion IMO. I suppose we could enable it on x86-32, but it should already be enabled on x86-64.
The problem with min/max has to do with NaN handling. If one of the two operands is NaN, the comparison in FFMIN/FFMAX is false, so the macro returns a fixed operand (the right-hand side for FFMAX, the left-hand side for FFMIN) even when that operand is the NaN, whereas fminf/fmaxf return the number.
As long as we don't care about NaNs, we should allow whichever behaviour is most efficient on the architecture, and that differs between x86, WASM, and the IEEE-style ones.
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-30 6:58 ` Rémi Denis-Courmont
2025-05-31 13:39 ` Michael Niedermayer
@ 2025-06-03 16:14 ` Niklas Haas
2025-06-04 11:13 ` Rémi Denis-Courmont
1 sibling, 1 reply; 28+ messages in thread
From: Niklas Haas @ 2025-06-03 16:14 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Fri, 30 May 2025 09:58:48 +0300 Rémi Denis-Courmont <remi@remlab.net> wrote:
>
> That will harm performance on x87, whence fminf() and co are function calls rather than single instructions. What we actually should do is define separate macros for integer vs float vs double.
We have an open bug in swscale where the use of x87 causes
non-bitexact results on 32-bit platforms, resolved by setting -mfpmath=sse at
build time.
Maybe we should think about setting this flag globally?
>
> But there are hundreds of use sites to patch. To be bluntly honest, I don't have the motivation to carry that tedious repetitive work out in my free time.
>
> Br,
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-30 6:58 ` Rémi Denis-Courmont
@ 2025-05-31 13:39 ` Michael Niedermayer
2025-06-03 16:14 ` Niklas Haas
1 sibling, 0 replies; 28+ messages in thread
From: Michael Niedermayer @ 2025-05-31 13:39 UTC (permalink / raw)
To: FFmpeg development discussions and patches
Hi Remi
On Fri, May 30, 2025 at 09:58:48AM +0300, Rémi Denis-Courmont wrote:
>
>
> Le 30 mai 2025 03:46:05 GMT+03:00, Michael Niedermayer <michael@niedermayer.cc> a écrit :
> >On Mon, May 26, 2025 at 11:43:15AM +0300, Rémi Denis-Courmont wrote:
> >>
> >>
> >> Le 26 mai 2025 00:37:08 GMT+03:00, Michael Niedermayer <michael@niedermayer.cc> a écrit :
> >> >Hi Rémi
> >> >
> >> >On Sat, May 24, 2025 at 07:10:57PM +0300, Rémi Denis-Courmont wrote:
> >> >> Le torstaina 22. toukokuuta 2025, 9.32.18 Itä-Euroopan kesäaika Jiawei a écrit
> >> >> :
> >> >> > > The RISC-V autovectorised output looks like it has a warning "Odd
> >> >> > > rotation angle" which is not present in the non-autovectorised output.
> >> >> >
> >> >> > I found this occured when using '-ffast-math' in RISC-V, also occur in
> >> >> > -O3 -ffast-math -fno-tree-vectorize case(much slower due to the
> >> >> > -ffast-math),supplementary more comparison results here:
> >> >>
> >> >
> >> >> Unfortunately, the FFmpeg code is written with x87 semantics in mind.
> >> >
> >> >I dont remember ever writing code intentionally with x87 semantics. And i
> >> >have doubts other people did.
> >>
> >> It doesn't have to be intentional. FFmpeg was started and mostly developed with x86-32 then x86-64 in mind. It's entirely possible that this happened innocently.
> >>
> >> Specifically, FFmpeg uses open-code for minimum, maximum, absolute value and so on (see FFMIN, FFMAX, FFABS). They work nicely for integer maths. They also work nicely on x87 with the current set of FPU optimisations, but they differ from IEEE semantics because of NaNs, negative zeros and such.
> >>
> >> Because of that the compiler will *not* use the native FPU instructions on platforms with native IEEE floats.
> >
> >replace all FFMIN with fminf() / fmin() where the arguments are float or
> >double, assuming that has no ill performance effects
>
> That will harm performance on x87, whence fminf() and co are function calls rather than single instructions. What we actually should do is define separate macros for integer vs float vs double.
ok
>
> But there are hundreds of use sites to patch. To be bluntly honest, I don't have the motivation to carry that tedious repetitive work out in my free time.
ok, can you review my patch ?
thx
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-30 0:46 ` Michael Niedermayer
@ 2025-05-30 6:58 ` Rémi Denis-Courmont
2025-05-31 13:39 ` Michael Niedermayer
2025-06-03 16:14 ` Niklas Haas
0 siblings, 2 replies; 28+ messages in thread
From: Rémi Denis-Courmont @ 2025-05-30 6:58 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On 30 May 2025 03:46:05 GMT+03:00, Michael Niedermayer <michael@niedermayer.cc> wrote:
>On Mon, May 26, 2025 at 11:43:15AM +0300, Rémi Denis-Courmont wrote:
>>
>>
>> Le 26 mai 2025 00:37:08 GMT+03:00, Michael Niedermayer <michael@niedermayer.cc> a écrit :
>> >Hi Rémi
>> >
>> >On Sat, May 24, 2025 at 07:10:57PM +0300, Rémi Denis-Courmont wrote:
>> >> Le torstaina 22. toukokuuta 2025, 9.32.18 Itä-Euroopan kesäaika Jiawei a écrit
>> >> :
>> >> > > The RISC-V autovectorised output looks like it has a warning "Odd
>> >> > > rotation angle" which is not present in the non-autovectorised output.
>> >> >
>> >> > I found this occured when using '-ffast-math' in RISC-V, also occur in
>> >> > -O3 -ffast-math -fno-tree-vectorize case(much slower due to the
>> >> > -ffast-math),supplementary more comparison results here:
>> >>
>> >
>> >> Unfortunately, the FFmpeg code is written with x87 semantics in mind.
>> >
>> >I dont remember ever writing code intentionally with x87 semantics. And i
>> >have doubts other people did.
>>
>> It doesn't have to be intentional. FFmpeg was started and mostly developed with x86-32 then x86-64 in mind. It's entirely possible that this happened innocently.
>>
>> Specifically, FFmpeg uses open-code for minimum, maximum, absolute value and so on (see FFMIN, FFMAX, FFABS). They work nicely for integer maths. They also work nicely on x87 with the current set of FPU optimisations, but they differ from IEEE semantics because of NaNs, negative zeros and such.
>>
>> Because of that the compiler will *not* use the native FPU instructions on platforms with native IEEE floats.
>
>replace all FFMIN with fminf() / fmin() where the arguments are float or
>double, assuming that has no ill performance effects
That will harm performance on x87, where fminf() and co. are function calls rather than single instructions. What we should actually do is define separate macros for integer vs float vs double.
But there are hundreds of use sites to patch. To be bluntly honest, I don't have the motivation to carry out that tedious, repetitive work in my free time.
Br,
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-26 8:43 ` Rémi Denis-Courmont
@ 2025-05-30 0:46 ` Michael Niedermayer
2025-05-30 6:58 ` Rémi Denis-Courmont
0 siblings, 1 reply; 28+ messages in thread
From: Michael Niedermayer @ 2025-05-30 0:46 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Mon, May 26, 2025 at 11:43:15AM +0300, Rémi Denis-Courmont wrote:
>
>
> Le 26 mai 2025 00:37:08 GMT+03:00, Michael Niedermayer <michael@niedermayer.cc> a écrit :
> >Hi Rémi
> >
> >On Sat, May 24, 2025 at 07:10:57PM +0300, Rémi Denis-Courmont wrote:
> >> Le torstaina 22. toukokuuta 2025, 9.32.18 Itä-Euroopan kesäaika Jiawei a écrit
> >> :
> >> > > The RISC-V autovectorised output looks like it has a warning "Odd
> >> > > rotation angle" which is not present in the non-autovectorised output.
> >> >
> >> > I found this occured when using '-ffast-math' in RISC-V, also occur in
> >> > -O3 -ffast-math -fno-tree-vectorize case(much slower due to the
> >> > -ffast-math),supplementary more comparison results here:
> >>
> >
> >> Unfortunately, the FFmpeg code is written with x87 semantics in mind.
> >
> >I dont remember ever writing code intentionally with x87 semantics. And i
> >have doubts other people did.
>
> It doesn't have to be intentional. FFmpeg was started and mostly developed with x86-32 then x86-64 in mind. It's entirely possible that this happened innocently.
>
> Specifically, FFmpeg uses open-code for minimum, maximum, absolute value and so on (see FFMIN, FFMAX, FFABS). They work nicely for integer maths. They also work nicely on x87 with the current set of FPU optimisations, but they differ from IEEE semantics because of NaNs, negative zeros and such.
>
> Because of that the compiler will *not* use the native FPU instructions on platforms with native IEEE floats.
replace all FFMIN with fminf() / fmin() where the arguments are float or
double, assuming that has no ill performance effects
thx
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
If the United States is serious about tackling the national security threats
related to an insecure 5G network, it needs to rethink the extent to which it
values corporate profits and government espionage over security.-Bruce Schneier
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-25 21:37 ` Michael Niedermayer
@ 2025-05-26 8:43 ` Rémi Denis-Courmont
2025-05-30 0:46 ` Michael Niedermayer
0 siblings, 1 reply; 28+ messages in thread
From: Rémi Denis-Courmont @ 2025-05-26 8:43 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On 26 May 2025 00:37:08 GMT+03:00, Michael Niedermayer <michael@niedermayer.cc> wrote:
>Hi Rémi
>
>On Sat, May 24, 2025 at 07:10:57PM +0300, Rémi Denis-Courmont wrote:
>> Le torstaina 22. toukokuuta 2025, 9.32.18 Itä-Euroopan kesäaika Jiawei a écrit
>> :
>> > > The RISC-V autovectorised output looks like it has a warning "Odd
>> > > rotation angle" which is not present in the non-autovectorised output.
>> >
>> > I found this occured when using '-ffast-math' in RISC-V, also occur in
>> > -O3 -ffast-math -fno-tree-vectorize case(much slower due to the
>> > -ffast-math),supplementary more comparison results here:
>>
>
>> Unfortunately, the FFmpeg code is written with x87 semantics in mind.
>
>I dont remember ever writing code intentionally with x87 semantics. And i
>have doubts other people did.
It doesn't have to be intentional. FFmpeg was started and mostly developed with x86-32 then x86-64 in mind. It's entirely possible that this happened innocently.
Specifically, FFmpeg uses open-coded macros for minimum, maximum, absolute value and so on (see FFMIN, FFMAX, FFABS). They work nicely for integer maths. They also work nicely on x87 with the current set of FPU optimisations, but they differ from IEEE semantics because of NaNs, negative zeros and such.
Because of that the compiler will *not* use the native FPU instructions on platforms with native IEEE floats.
>> For
>> instance, the FFmpeg math macros work nicely on x86, but they would work much
>> better with fabs/fmax/fmin/fabsf/fmaxf/fminf on other platforms. I tried to fix
>> that with copious amount of _Generic(), but that lead to ICE...
>
>ICE as the name says, is a internal compiler error and not the fault of
>the code passed to the compiler
Obviously yes. But if it crashes every recent major version of both major compilers, then it is a given that the code will be rejected.
And even if the compilers got fixed, the code wouldn't be accepted until ten or twenty years in the future, judging by how conservative this project is with compiler versions. Lastly, I suspect it's caused by excessively complex evaluation that simply drives compilers into OOM. It is debatable whether OOM is even a compiler bug.
>> So we are stuck between a rock and a hard place where we need fast math for
>> good perfs, but we need to turn it off for correct results.
>
>--ffast-math is not one option, its many
Indeed only a few of these flags are troublesome. I mentioned it on IRC many moons ago.
>
>on the gcc here, it does this:
>+ -fassociative-math [enabled]
>+ -fcx-limited-range [enabled]
>+ -ffinite-math-only [enabled]
>+ -fmath-errno [disabled]
>+ -freciprocal-math [enabled]
>+ -fsigned-zeros [disabled]
>+ -ftrapping-math [disabled]
>+ -funsafe-math-optimizations [enabled]
>
>So maybe some of this can be globally enabled.
>
>But some things like fassociative-math are simply not "safe"
>on general nummeric code. It also violates ISO C according to
>the official gcc documentation
>
>thx
>
>[...]
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-24 16:10 ` Rémi Denis-Courmont
@ 2025-05-25 21:37 ` Michael Niedermayer
2025-05-26 8:43 ` Rémi Denis-Courmont
0 siblings, 1 reply; 28+ messages in thread
From: Michael Niedermayer @ 2025-05-25 21:37 UTC (permalink / raw)
To: FFmpeg development discussions and patches
Hi Rémi
On Sat, May 24, 2025 at 07:10:57PM +0300, Rémi Denis-Courmont wrote:
> Le torstaina 22. toukokuuta 2025, 9.32.18 Itä-Euroopan kesäaika Jiawei a écrit
> :
> > > The RISC-V autovectorised output looks like it has a warning "Odd
> > > rotation angle" which is not present in the non-autovectorised output.
> >
> > I found this occured when using '-ffast-math' in RISC-V, also occur in
> > -O3 -ffast-math -fno-tree-vectorize case(much slower due to the
> > -ffast-math),supplementary more comparison results here:
>
> Unfortunately, the FFmpeg code is written with x87 semantics in mind.
I don't remember ever intentionally writing code with x87 semantics in mind, and I
doubt other people did.
What I did do in some rare places was depend on IEEE 754 semantics
(that is, when doing so led to simpler and cleaner code).
> For
> instance, the FFmpeg math macros work nicely on x86, but they would work much
> better with fabs/fmax/fmin/fabsf/fmaxf/fminf on other platforms. I tried to fix
> that with copious amount of _Generic(), but that lead to ICE...
An ICE, as the name says, is an internal compiler error and not the fault of
the code passed to the compiler.
>
> So we are stuck between a rock and a hard place where we need fast math for
> good perfs, but we need to turn it off for correct results.
-ffast-math is not one option, it's many.
With the GCC here, it does this:
+ -fassociative-math [enabled]
+ -fcx-limited-range [enabled]
+ -ffinite-math-only [enabled]
+ -fmath-errno [disabled]
+ -freciprocal-math [enabled]
+ -fsigned-zeros [disabled]
+ -ftrapping-math [disabled]
+ -funsafe-math-optimizations [enabled]
So maybe some of this can be globally enabled.
But some things, like -fassociative-math, are simply not "safe"
for general numeric code. It also violates ISO C according to
the official GCC documentation.
thx
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Complexity theory is the science of finding the exact solution to an
approximation. Benchmarking OTOH is finding an approximation of the exact
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]
[-- Attachment #2: Type: text/plain, Size: 251 bytes --]
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-22 6:32 ` Jiawei
2025-05-24 1:46 ` Kieran Kunhya via ffmpeg-devel
@ 2025-05-24 16:10 ` Rémi Denis-Courmont
2025-05-25 21:37 ` Michael Niedermayer
1 sibling, 1 reply; 28+ messages in thread
From: Rémi Denis-Courmont @ 2025-05-24 16:10 UTC (permalink / raw)
To: ffmpeg-devel
On Thursday, 22 May 2025, 9:32:18 Eastern European Summer Time, Jiawei wrote:
> > The RISC-V autovectorised output looks like it has a warning "Odd
> > rotation angle" which is not present in the non-autovectorised output.
>
> I found this occurs when using '-ffast-math' on RISC-V; it also occurs in
> the -O3 -ffast-math -fno-tree-vectorize case (much slower due to
> -ffast-math). More comparison results follow:
Unfortunately, the FFmpeg code is written with x87 semantics in mind. For
instance, the FFmpeg math macros work nicely on x86, but they would work much
better with fabs/fmax/fmin/fabsf/fmaxf/fminf on other platforms. I tried to fix
that with copious amounts of _Generic(), but that led to an ICE...
So we are stuck between a rock and a hard place: we need fast math for
good performance, but we need to turn it off for correct results.
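For readers unfamiliar with the _Generic() approach mentioned above, here is a rough sketch of what type-generic wrappers around the libm functions can look like in C11. The macro names are hypothetical and this is not the code that was actually tried in FFmpeg; it only illustrates the dispatch technique.

```c
#include <math.h>

/* Hypothetical macros for this sketch; not FFmpeg's actual definitions.
 * The type of (a) + (b) selects the matching libm function. */
#define GEN_FMAX(a, b) _Generic((a) + (b), \
        float:       fmaxf,                \
        long double: fmaxl,                \
        default:     fmax)((a), (b))

#define GEN_FMIN(a, b) _Generic((a) + (b), \
        float:       fminf,                \
        long double: fminl,                \
        default:     fmin)((a), (b))

#define GEN_FABS(a) _Generic((a),  \
        float:       fabsf,        \
        long double: fabsl,        \
        default:     fabs)(a)

/* Ternary-style comparison macro, shown only for contrast. */
#define TERNARY_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Example use: clamp a float to [0, 1] using the generic wrappers. */
static inline float clip_unit(float v)
{
    return GEN_FMIN(GEN_FMAX(v, 0.0f), 1.0f);
}
```

With float arguments the wrappers resolve to fmaxf/fminf/fabsf, so no promotion to double takes place. Whether this maps to better instructions than the ternary form depends on the target and on the floating-point flags in use.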
--
ヅニ-クーモン・レミ
Hagalund ny stad, f.d. Finska republik Nylands
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-21 6:17 Jiawei
` (3 preceding siblings ...)
2025-05-21 10:33 ` Andreas Rheinhardt
@ 2025-05-24 12:00 ` Rémi Denis-Courmont
4 siblings, 0 replies; 28+ messages in thread
From: Rémi Denis-Courmont @ 2025-05-24 12:00 UTC (permalink / raw)
To: FFmpeg development discussions and patches
Hi,
On 21 May 2025 09:17:50 GMT+03:00, Jiawei <jiawei@iscas.ac.cn> wrote:
>This patch modifies the FFmpeg build system to remove the explicit disabling
>of GCC's auto-vectorization feature.
>
>Modern GCC versions (>= 10.0) have demonstrated stable auto-vectorization
>capabilities through extensive optimizations in loop analysis and SIMD
>code generation. The explicit -fno-tree-vectorize flag originally added
>in commit 973859f (2009) to workaround early GCC vectorization instability
>is no longer necessary.
>
>Key improvements justifying this change:
>1. Enhanced heuristics for loop vectorization cost models
>2. Mature handling of alignment and memory access patterns
>3. Robust fallback mechanisms for unsupported architectures
>
>This change allows FFmpeg to benefit from automated SIMD optimizations
>when built with -O3 optimization level, particularly improving
>performance on x86_64 (AVX), ARM64 (SVE) and RISC-V(RVV) architectures.
I don't mind the patch but this description is very misleading. Realistically, this will only enable SSE2 or so on x86-64 and nothing on RISC-V, because we can't simply assume that AVX and RVV are supported.
Call me when Debian, Ubuntu and Fedora require RVA23 for their official RISC-V ports... (I won't be holding my breath.)
Bluntly, I am concerned that this gives the wrong impression that AVX and RVV optimisations are no longer necessary, and so cuts funding off. Yet GCC (and LLVM) remain incapable of selecting optimised loops depending on runtime CPU capabilities.
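As an illustration of what runtime selection means here, the sketch below dispatches by hand between a baseline loop and a copy of the same loop compiled for AVX2. It is only a sketch under stated assumptions: the names add_fn, add_scalar, add_avx2 and select_add are invented, the x86 path relies on the GCC/Clang builtins __builtin_cpu_init() and __builtin_cpu_supports(), and it does not use FFmpeg's own CPU flag detection in libavutil.

```c
#include <stddef.h>

/* Hypothetical kernel type: adds two float arrays element-wise. */
typedef void (*add_fn)(float *dst, const float *a, const float *b, size_t n);

/* Baseline loop, built with the default target flags; always safe to run. */
static void add_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

#if defined(__GNUC__) && defined(__x86_64__)
/* Same loop, but compiled for AVX2 via the target attribute, so the
 * compiler may auto-vectorise it with wider registers. */
__attribute__((target("avx2")))
static void add_avx2(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
#endif

/* Choose an implementation once, based on what the running CPU reports. */
add_fn select_add(void)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return add_avx2;
#endif
    return add_scalar;
}
```

A caller would store the result of select_add() in a function pointer at init time and call through it. The point is that this selection has to be written explicitly; a plain -O3 build only vectorises for the baseline instruction set it was configured for.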
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-24 1:46 ` Kieran Kunhya via ffmpeg-devel
@ 2025-05-24 4:10 ` Jiawei
0 siblings, 0 replies; 28+ messages in thread
From: Jiawei @ 2025-05-24 4:10 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: kieran618
> Here is a particularly bad example of autovectorisation across many
> compilers:
>
> https://gcc.godbolt.org/z/rjEqzf1hh
>
> Kieran
Admittedly, in some cases, enabling vectorization is not the optimal
solution.
But the issue is that the limitation is only applied on the GCC side. For
LLVM clang, there is no such restriction.
Forcing the limitation in configure also overrides the user's intent: if
a user wants to enable vectorization when using GCC, it will be turned
off without their knowing it.
Only after a long and careful inspection may the user finally find that
configure has forced this feature off, and they still need to remove the
restriction manually.
BR,
Jiawei
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-22 6:32 ` Jiawei
@ 2025-05-24 1:46 ` Kieran Kunhya via ffmpeg-devel
2025-05-24 4:10 ` Jiawei
2025-05-24 16:10 ` Rémi Denis-Courmont
1 sibling, 1 reply; 28+ messages in thread
From: Kieran Kunhya via ffmpeg-devel @ 2025-05-24 1:46 UTC (permalink / raw)
To: FFmpeg development discussions and patches; +Cc: Kieran Kunhya, post
On Thu, 22 May 2025, 07:32 Jiawei, <jiawei@iscas.ac.cn> wrote:
> On 2025/5/22 2:21, Frank Plowman wrote:
> > On 21/05/2025 11:17, Jiawei wrote:
> >> On 2025/5/21 14:52, Nicolas George wrote:
> >>> Jiawei (HE12025-05-21):
> >>>> particularly improving
> >>>> performance on x86_64 (AVX), ARM64 (SVE) and RISC-V(RVV) architectures.
> >>> Benchmark needed.
> >>>
> >>> Regards,
> >>
> >> Hi Nicolas,
> >>
> >>
> >> Since I am a GCC developer, I'm not very familiar with the FFmpeg test
> >> flow. Here is my test process;
> >> if anything is incorrect, please point it out:
> >>
> >>
> >> 1. Download the video bbb_sunflower_2160p_30fps_normal.mp4.zip
> >> <https://download.blender.org/demo/movies/BBB/bbb_sunflower_2160p_30fps_normal.mp4.zip>
> >> from https://download.blender.org/demo/movies/BBB/,
> >>
> >> ```
> >>
> >> ffmpeg -i bbb_sunflower_2160p_30fps_normal.mp4 -t 60 -vf
> >> "scale=1920:1080" -c:v libx265 -c:a libmp3lame 1080p_hevc_mp3.mp4
> >> ```
> >>
> >> This gives the 1080p video used as the benchmark test video.
> >>
> >>
> >> 2. Build two versions of FFmpeg, one with the patch and one without
> >> it, using the GCC 13.3 release,
> >>
> >> verified with Intel(R) Core(TM) Ultra 9 285HX
> >>
> >>
> >> Using patch:
> >>
> >> ```
> >> ./ffmpeg -benchmark -i ~/mp/1080p_hevc_mp3.mp4 -f null -
> >> ffmpeg version N-119636-g96518c8d8d Copyright (c) 2000-2025 the FFmpeg
> >> developers
> >> built with gcc 13 (Ubuntu 13.3.0-6ubuntu2~24.04)
> >> configuration: --prefix=/home/pz9115/ffpo --disable-ffplay --arch=x64
> >> --extra-cflags=-O3 --enable-static --target-os=linux
> >> libavutil 60. 2.100 / 60. 2.100
> >> libavcodec 62. 3.101 / 62. 3.101
> >> libavformat 62. 0.102 / 62. 0.102
> >> libavdevice 62. 0.100 / 62. 0.100
> >> libavfilter 11. 0.100 / 11. 0.100
> >> libswscale 9. 0.100 / 9. 0.100
> >> libswresample 6. 0.100 / 6. 0.100
> >> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
> >> '/home/pz9115/mp/1080p_hevc_mp3.mp4':
> >> Metadata:
> >> major_brand : isom
> >> minor_version : 512
> >> compatible_brands: isomiso2mp41
> >> title : Big Buck Bunny, Sunflower version
> >> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
> >> composer : Sacha Goedegebure
> >> encoder : Lavf60.16.100
> >> comment : Creative Commons Attribution 3.0 -
> >> http://bbb3d.renderfarming.net
> >> genre : Animation
> >> Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
> >> Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
> >> yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
> >> fps, 30 tbr, 15360 tbn (default)
> >> Metadata:
> >> handler_name : GPAC ISO Video Handler
> >> vendor_id : [0][0][0][0]
> >> encoder : Lavc60.31.102 libx265
> >> Stream #0:1[0x2](und): Audio: mp3 (mp3float) (mp4a / 0x6134706D),
> >> 48000 Hz, stereo, fltp, 128 kb/s (default)
> >> Metadata:
> >> handler_name : GPAC ISO Audio Handler
> >> vendor_id : [0][0][0][0]
> >> Stream mapping:
> >> Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
> >> Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
> >> Press [q] to stop, [?] for help
> >> Output #0, null, to 'pipe:':
> >> Metadata:
> >> major_brand : isom
> >> minor_version : 512
> >> compatible_brands: isomiso2mp41
> >> title : Big Buck Bunny, Sunflower version
> >> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
> >> composer : Sacha Goedegebure
> >> genre : Animation
> >> comment : Creative Commons Attribution 3.0 -
> >> http://bbb3d.renderfarming.net
> >> encoder : Lavf62.0.102
> >> Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
> >> 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
> >> Metadata:
> >> encoder : Lavc62.3.101 wrapped_avframe
> >> handler_name : GPAC ISO Video Handler
> >> vendor_id : [0][0][0][0]
> >> Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
> >> (default)
> >> Metadata:
> >> encoder : Lavc62.3.101 pcm_s16le
> >> handler_name : GPAC ISO Audio Handler
> >> vendor_id : [0][0][0][0]
> >> [out#0/null @ 0x565233669eb0] video:731KiB audio:11250KiB subtitle:0KiB
> >> other streams:0KiB global headers:0KiB muxing overhead: unknown
> >> frame= 1800 fps=635 q=-0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
> >> speed=21.2x elapsed=0:00:02.83
> >> bench: utime=11.324s stime=0.290s rtime=2.834s
> >> bench: maxrss=186556KiB
> >> ```
> >>
> >> Without patch (here I add -fno-tree-vectorize directly):
> >>
> >> ./ffmpeg -benchmark -i ~/mp/1080p_hevc_mp3.mp4 -f null -
> >> ffmpeg version N-119636-g96518c8d8d Copyright (c) 2000-2025 the FFmpeg
> >> developers
> >> built with gcc 13 (Ubuntu 13.3.0-6ubuntu2~24.04)
> >> configuration: --prefix=/home/pz9115/ffpo --disable-ffplay --arch=x64
> >> --extra-cflags='-O3 -fno-tree-vectorize' --enable-static --target-os=linux
> >> libavutil 60. 2.100 / 60. 2.100
> >> libavcodec 62. 3.101 / 62. 3.101
> >> libavformat 62. 0.102 / 62. 0.102
> >> libavdevice 62. 0.100 / 62. 0.100
> >> libavfilter 11. 0.100 / 11. 0.100
> >> libswscale 9. 0.100 / 9. 0.100
> >> libswresample 6. 0.100 / 6. 0.100
> >> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
> >> '/home/pz9115/mp/1080p_hevc_mp3.mp4':
> >> Metadata:
> >> major_brand : isom
> >> minor_version : 512
> >> compatible_brands: isomiso2mp41
> >> title : Big Buck Bunny, Sunflower version
> >> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
> >> composer : Sacha Goedegebure
> >> encoder : Lavf60.16.100
> >> comment : Creative Commons Attribution 3.0 -
> >> http://bbb3d.renderfarming.net
> >> genre : Animation
> >> Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
> >> Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
> >> yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
> >> fps, 30 tbr, 15360 tbn (default)
> >> Metadata:
> >> handler_name : GPAC ISO Video Handler
> >> vendor_id : [0][0][0][0]
> >> encoder : Lavc60.31.102 libx265
> >> Stream #0:1[0x2](und): Audio: mp3 (mp3float) (mp4a / 0x6134706D),
> >> 48000 Hz, stereo, fltp, 128 kb/s (default)
> >> Metadata:
> >> handler_name : GPAC ISO Audio Handler
> >> vendor_id : [0][0][0][0]
> >> Stream mapping:
> >> Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
> >> Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
> >> Press [q] to stop, [?] for help
> >> Output #0, null, to 'pipe:':
> >> Metadata:
> >> major_brand : isom
> >> minor_version : 512
> >> compatible_brands: isomiso2mp41
> >> title : Big Buck Bunny, Sunflower version
> >> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
> >> composer : Sacha Goedegebure
> >> genre : Animation
> >> comment : Creative Commons Attribution 3.0 -
> >> http://bbb3d.renderfarming.net
> >> encoder : Lavf62.0.102
> >> Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
> >> 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
> >> Metadata:
> >> encoder : Lavc62.3.101 wrapped_avframe
> >> handler_name : GPAC ISO Video Handler
> >> vendor_id : [0][0][0][0]
> >> Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
> >> (default)
> >> Metadata:
> >> encoder : Lavc62.3.101 pcm_s16le
> >> handler_name : GPAC ISO Audio Handler
> >> vendor_id : [0][0][0][0]
> >> [out#0/null @ 0x55eb196b7eb0] video:731KiB audio:11250KiB subtitle:0KiB
> >> other streams:0KiB global headers:0KiB muxing overhead: unknown
> >> frame= 1800 fps=509 q=-0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
> >> speed= 17x elapsed=0:00:03.53
> >> bench: utime=21.544s stime=0.349s rtime=3.536s
> >> bench: maxrss=181580KiB
> >>
> >> I also tested on a RISC-V development board, the MUSE Pi Pro. The
> >> configuration and results follow:
> >>
> >> Using patch:
> >>
> >> root@spacemit-k1-x-MUSE-Pi-Pro-board:~# ./ffpv/bin/ffmpeg -benchmark -i
> >> 1080p_hevc_mp3.mp4 -f null -
> >> ffmpeg version n6.1.2 Copyright (c) 2000-2024 the FFmpeg developers
> >> built with gcc 16.0.0 (g3fc902e738b) 20250519 (experimental)
> >> configuration: --prefix=/home/pz9115/ffpv --disable-ffplay
> >> --arch=riscv --extra-cflags='-march=rv64gcv_zba_zbb_zbs -O3 -ffast-math'
> >> --cross-prefix=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-
> >> --cc=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-gcc
> >> --cxx=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-g++ --enable-static
> >> --enable-cross-compile --target-os=linux --disable-rvv
> >> libavutil 58. 29.100 / 58. 29.100
> >> libavcodec 60. 31.102 / 60. 31.102
> >> libavformat 60. 16.100 / 60. 16.100
> >> libavdevice 60. 3.100 / 60. 3.100
> >> libavfilter 9. 12.100 / 9. 12.100
> >> libswscale 7. 5.100 / 7. 5.100
> >> libswresample 4. 12.100 / 4. 12.100
> >> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1080p_hevc_mp3.mp4':
> >> Metadata:
> >> major_brand : isom
> >> minor_version : 512
> >> compatible_brands: isomiso2mp41
> >> title : Big Buck Bunny, Sunflower version
> >> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
> >> composer : Sacha Goedegebure
> >> encoder : Lavf60.16.100
> >> comment : Creative Commons Attribution 3.0 -
> >> http://bbb3d.renderfarming.net
> >> genre : Animation
> >> Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
> >> Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
> >> yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
> >> fps, 30 tbr, 15360 tbn (default)
> >> Metadata:
> >> handler_name : GPAC ISO Video Handler
> >> vendor_id : [0][0][0][0]
> >> encoder : Lavc60.31.102 libx265
> >> Stream #0:1[0x2](und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz,
> >> stereo, fltp, 128 kb/s (default)
> >> Metadata:
> >> handler_name : GPAC ISO Audio Handler
> >> vendor_id : [0][0][0][0]
> >> Stream mapping:
> >> Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
> >> Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
> >> Press [q] to stop, [?] for help
> >> Odd rotation angle.
> >> If you want to help, upload a sample of this file to
> >> https://streams.videolan.org/upload/ and contact the ffmpeg-devel
> >> mailing list. (ffmpeg-devel@ffmpeg.org)Output #0, null, to 'pipe:':
> >> Metadata:
> >> major_brand : isom
> >> minor_version : 512
> >> compatible_brands: isomiso2mp41
> >> title : Big Buck Bunny, Sunflower version
> >> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
> >> composer : Sacha Goedegebure
> >> genre : Animation
> >> comment : Creative Commons Attribution 3.0 -
> >> http://bbb3d.renderfarming.net
> >> encoder : Lavf60.16.100
> >> Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
> >> 1080x1920 [SAR 1:1 DAR 9:16], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
> >> Metadata:
> >> handler_name : GPAC ISO Video Handler
> >> vendor_id : [0][0][0][0]
> >> encoder : Lavc60.31.102 wrapped_avframe
> >> Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
> >> (default)
> >> Metadata:
> >> handler_name : GPAC ISO Audio Handler
> >> vendor_id : [0][0][0][0]
> >> encoder : Lavc60.31.102 pcm_s16le
> >> [out#0/null @ 0x28a82e0] video:844kB audio:11250kB subtitle:0kB other
> >> streams:0kB global headers:0kB muxing overhead: unknown
> >> frame= 1800 fps= 42 q=-0.0 Lsize=N/A time=00:00:59.97 bitrate=N/A
> >> speed=1.41x
> >> bench: utime=207.150s stime=5.319s rtime=42.608s
> >> bench: maxrss=162160kB
> >>
> >> Without patch (again adding -fno-tree-vectorize directly):
> >>
> >> ./ffp/bin/ffmpeg -benchmark -i 1080p_hevc_mp3.mp4 -f null -
> >> ffmpeg version n6.1.2 Copyright (c) 2000-2024 the FFmpeg developers
> >> built with gcc 16.0.0 (g38163c874a3-dirty) 20250515 (experimental)
> >> configuration: --prefix=/home/pz9115/ffp --disable-ffplay
> >> --arch=riscv --sysroot=/home/pz9115/rv/sysroot
> >> --extra-cflags='-march=rv64gcv_zba_zbb_zbc_zbs_zca_zcd -mabi=lp64d -O3
> >> -fno-tree-vectorize -static' --extra-ldflags=-static
> >> --cross-prefix=/home/pz9115/rv/bin/riscv64-unknown-linux-gnu-
> >> --enable-static --enable-cross-compile --target-os=linux --disable-rvv
> >> libavutil 58. 29.100 / 58. 29.100
> >> libavcodec 60. 31.102 / 60. 31.102
> >> libavformat 60. 16.100 / 60. 16.100
> >> libavdevice 60. 3.100 / 60. 3.100
> >> libavfilter 9. 12.100 / 9. 12.100
> >> libswscale 7. 5.100 / 7. 5.100
> >> libswresample 4. 12.100 / 4. 12.100
> >> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1080p_hevc_mp3.mp4':
> >> Metadata:
> >> major_brand : isom
> >> minor_version : 512
> >> compatible_brands: isomiso2mp41
> >> title : Big Buck Bunny, Sunflower version
> >> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
> >> composer : Sacha Goedegebure
> >> encoder : Lavf60.16.100
> >> comment : Creative Commons Attribution 3.0 -
> >> http://bbb3d.renderfarming.net
> >> genre : Animation
> >> Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
> >> Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
> >> yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
> >> fps, 30 tbr, 15360 tbn (default)
> >> Metadata:
> >> handler_name : GPAC ISO Video Handler
> >> vendor_id : [0][0][0][0]
> >> encoder : Lavc60.31.102 libx265
> >> Stream #0:1[0x2](und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz,
> >> stereo, fltp, 128 kb/s (default)
> >> Metadata:
> >> handler_name : GPAC ISO Audio Handler
> >> vendor_id : [0][0][0][0]
> >> Stream mapping:
> >> Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
> >> Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
> >> Press [q] to stop, [?] for help
> >> Output #0, null, to 'pipe:':
> >> Metadata:
> >> major_brand : isom
> >> minor_version : 512
> >> compatible_brands: isomiso2mp41
> >> title : Big Buck Bunny, Sunflower version
> >> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
> >> composer : Sacha Goedegebure
> >> genre : Animation
> >> comment : Creative Commons Attribution 3.0 -
> >> http://bbb3d.renderfarming.net
> >> encoder : Lavf60.16.100
> >> Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
> >> 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
> >> Metadata:
> >> handler_name : GPAC ISO Video Handler
> >> vendor_id : [0][0][0][0]
> >> encoder : Lavc60.31.102 wrapped_avframe
> >> Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
> >> (default)
> >> Metadata:
> >> handler_name : GPAC ISO Audio Handler
> >> vendor_id : [0][0][0][0]
> >> encoder : Lavc60.31.102 pcm_s16le
> >> [out#0/null @ 0x2729630] video:844kB audio:11250kB subtitle:0kB other
> >> streams:0kB global headers:0kB muxing overhead: unknown
> >> frame= 1800 fps= 30 q=-0.0 Lsize=N/A time=00:00:59.97 bitrate=N/A
> >> speed= 1x
> >> bench: utime=321.145s stime=2.475s rtime=59.960s
> >> bench: maxrss=131532kB
> > The RISC-V autovectorised output looks like it has a warning "Odd
> > rotation angle" which is not present in the non-autovectorised output.
>
> I found this occurs when using '-ffast-math' on RISC-V; it also occurs in
> the -O3 -ffast-math -fno-tree-vectorize case (much slower due to
> -ffast-math). More comparison results follow:
>
> No -ffast-math, no -fno-tree-vectorize
>
> root@spacemit-k1-x-MUSE-Pi-Pro-board:~# ./ffpv/bin/ffmpeg -benchmark -i
> 1080p_hevc_mp3.mp4 -f null -
> ffmpeg version N-119636-g96518c8d8d Copyright (c) 2000-2025 the FFmpeg
> developers
> built with gcc 16.0.0 (g3fc902e738b) 20250519 (experimental)
> configuration: --prefix=/home/pz9115/ffpv --disable-ffplay
> --arch=riscv --extra-cflags='-march=rv64gcv_zba_zbb_zbs -O3'
> --extra-ldflags=-static
> --cross-prefix=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-
> --cc=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-gcc
> --cxx=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-g++ --enable-static
> --enable-cross-compile --target-os=linux --disable-rvv
> libavutil 60. 2.100 / 60. 2.100
> libavcodec 62. 3.101 / 62. 3.101
> libavformat 62. 0.102 / 62. 0.102
> libavdevice 62. 0.100 / 62. 0.100
> libavfilter 11. 0.100 / 11. 0.100
> libswscale 9. 0.100 / 9. 0.100
> libswresample 6. 0.100 / 6. 0.100
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1080p_hevc_mp3.mp4':
> Metadata:
> major_brand : isom
> minor_version : 512
> compatible_brands: isomiso2mp41
> title : Big Buck Bunny, Sunflower version
> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
> composer : Sacha Goedegebure
> encoder : Lavf60.16.100
> comment : Creative Commons Attribution 3.0 -
> http://bbb3d.renderfarming.net
> genre : Animation
> Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
> Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
> yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
> fps, 30 tbr, 15360 tbn (default)
> Metadata:
> handler_name : GPAC ISO Video Handler
> vendor_id : [0][0][0][0]
> encoder : Lavc60.31.102 libx265
> Stream #0:1[0x2](und): Audio: mp3 (mp3float) (mp4a / 0x6134706D),
> 48000 Hz, stereo, fltp, 128 kb/s (default)
> Metadata:
> handler_name : GPAC ISO Audio Handler
> vendor_id : [0][0][0][0]
> Stream mapping:
> Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
> Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
> Press [q] to stop, [?] for help
> Output #0, null, to 'pipe:':
> Metadata:
> major_brand : isom
> minor_version : 512
> compatible_brands: isomiso2mp41
> title : Big Buck Bunny, Sunflower version
> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
> composer : Sacha Goedegebure
> genre : Animation
> comment : Creative Commons Attribution 3.0 -
> http://bbb3d.renderfarming.net
> encoder : Lavf62.0.102
> Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
> 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
> Metadata:
> encoder : Lavc62.3.101 wrapped_avframe
> handler_name : GPAC ISO Video Handler
> vendor_id : [0][0][0][0]
> Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
> (default)
> Metadata:
> encoder : Lavc62.3.101 pcm_s16le
> handler_name : GPAC ISO Audio Handler
> vendor_id : [0][0][0][0]
> [out#0/null @ 0x2b66150] video:731KiB audio:11250KiB subtitle:0KiB other
> streams:0KiB global headers:0KiB muxing overhead: unknown
> frame= 1800 fps= 48 q=-0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A speed=
> 1.6x elapsed=0:00:37.39
> bench: utime=165.301s stime=3.171s rtime=37.400s
> bench: maxrss=130208KiB
>
>
> ================================================================================================================================================================
>
> Using -ffast-math with -fno-tree-vectorize:
>
> root@spacemit-k1-x-MUSE-Pi-Pro-board:~# ./ffp/bin/ffmpeg -benchmark -i
> 1080p_hevc_mp3.mp4 -f null -
> ffmpeg version N-119636-g96518c8d8d Copyright (c) 2000-2025 the FFmpeg
> developers
> built with gcc 16.0.0 (g3fc902e738b) 20250519 (experimental)
> configuration: --prefix=/home/pz9115/ffpv --disable-ffplay
> --arch=riscv --extra-cflags='-march=rv64gcv_zba_zbb_zbs -O3 -ffast-math
> -fno-tree-vectorize' --extra-ldflags=-static
> --cross-prefix=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-
> --cc=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-gcc
> --cxx=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-g++ --enable-static
> --enable-cross-compile --target-os=linux --disable-rvv
> libavutil 60. 2.100 / 60. 2.100
> libavcodec 62. 3.101 / 62. 3.101
> libavformat 62. 0.102 / 62. 0.102
> libavdevice 62. 0.100 / 62. 0.100
> libavfilter 11. 0.100 / 11. 0.100
> libswscale 9. 0.100 / 9. 0.100
> libswresample 6. 0.100 / 6. 0.100
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1080p_hevc_mp3.mp4':
> Metadata:
> major_brand : isom
> minor_version : 512
> compatible_brands: isomiso2mp41
> title : Big Buck Bunny, Sunflower version
> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
> composer : Sacha Goedegebure
> encoder : Lavf60.16.100
> comment : Creative Commons Attribution 3.0 -
> http://bbb3d.renderfarming.net
> genre : Animation
> Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
> Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
> yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
> fps, 30 tbr, 15360 tbn (default)
> Metadata:
> handler_name : GPAC ISO Video Handler
> vendor_id : [0][0][0][0]
> encoder : Lavc60.31.102 libx265
> Stream #0:1[0x2](und): Audio: mp3 (mp3float) (mp4a / 0x6134706D),
> 48000 Hz, stereo, fltp, 128 kb/s (default)
> Metadata:
> handler_name : GPAC ISO Audio Handler
> vendor_id : [0][0][0][0]
> Stream mapping:
> Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
> Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
> Press [q] to stop, [?] for help
> Odd rotation angle.
> If you want to help, upload a sample of this file to
> https://streams.videolan.org/upload/ and contact the ffmpeg-devel
> mailing list. (ffmpeg-devel@ffmpeg.org)Output #0, null, to 'pipe:':
> Metadata:
> major_brand : isom
> minor_version : 512
> compatible_brands: isomiso2mp41
> title : Big Buck Bunny, Sunflower version
> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
> composer : Sacha Goedegebure
> genre : Animation
> comment : Creative Commons Attribution 3.0 -
> http://bbb3d.renderfarming.net
> encoder : Lavf62.0.102
> Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
> 1080x1920 [SAR 1:1 DAR 9:16], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
> Metadata:
> encoder : Lavc62.3.101 wrapped_avframe
> handler_name : GPAC ISO Video Handler
> vendor_id : [0][0][0][0]
> Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
> (default)
> Metadata:
> encoder : Lavc62.3.101 pcm_s16le
> handler_name : GPAC ISO Audio Handler
> vendor_id : [0][0][0][0]
> [out#0/null @ 0x2915150] video:731KiB audio:11250KiB subtitle:0KiB other
> streams:0KiB global headers:0KiB muxing overhead: unknown
> frame= 1800 fps= 12 q=-0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
> speed=0.391x elapsed=0:02:33.38
> bench: utime=445.912s stime=8.033s rtime=153.385s
> bench: maxrss=2212516KiB
>
>
> >
>
>
Here is a particularly bad example of autovectorisation across many
compilers:
https://gcc.godbolt.org/z/rjEqzf1hh
Kieran
>
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-21 18:21 ` Frank Plowman
@ 2025-05-22 6:32 ` Jiawei
2025-05-24 1:46 ` Kieran Kunhya via ffmpeg-devel
2025-05-24 16:10 ` Rémi Denis-Courmont
0 siblings, 2 replies; 28+ messages in thread
From: Jiawei @ 2025-05-22 6:32 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: post
On 2025/5/22 2:21, Frank Plowman wrote:
> On 21/05/2025 11:17, Jiawei wrote:
>> On 2025/5/21 14:52, Nicolas George wrote:
>>> Jiawei (HE12025-05-21):
>>>> particularly improving
>>>> performance on x86_64 (AVX), ARM64 (SVE) and RISC-V(RVV) architectures.
>>> Benchmark needed.
>>>
>>> Regards,
>>
>> Hi Nicolas,
>>
>>
>> Since I am a GCC developer, I'm not very familiar with the FFmpeg test
>> flow. Here is my test process;
>> if anything is incorrect, please point it out:
>>
>>
>> 1. Download the video bbb_sunflower_2160p_30fps_normal.mp4.zip
>> <https://download.blender.org/demo/movies/BBB/bbb_sunflower_2160p_30fps_normal.mp4.zip>
>> from https://download.blender.org/demo/movies/BBB/,
>>
>> ```
>>
>> ffmpeg -i bbb_sunflower_2160p_30fps_normal.mp4 -t 60 -vf
>> "scale=1920:1080" -c:v libx265 -c:a libmp3lame 1080p_hevc_mp3.mp4
>> ```
>>
>> This gives the 1080p video used as the benchmark test video.
>>
>>
>> 2. Build two versions of FFmpeg, one with the patch and one without
>> it, using the GCC 13.3 release,
>>
>> verified with Intel(R) Core(TM) Ultra 9 285HX
>>
>>
>> Using patch:
>>
>> ```
>> ./ffmpeg -benchmark -i ~/mp/1080p_hevc_mp3.mp4 -f null -
>> ffmpeg version N-119636-g96518c8d8d Copyright (c) 2000-2025 the FFmpeg
>> developers
>> built with gcc 13 (Ubuntu 13.3.0-6ubuntu2~24.04)
>> configuration: --prefix=/home/pz9115/ffpo --disable-ffplay --arch=x64
>> --extra-cflags=-O3 --enable-static --target-os=linux
>> libavutil 60. 2.100 / 60. 2.100
>> libavcodec 62. 3.101 / 62. 3.101
>> libavformat 62. 0.102 / 62. 0.102
>> libavdevice 62. 0.100 / 62. 0.100
>> libavfilter 11. 0.100 / 11. 0.100
>> libswscale 9. 0.100 / 9. 0.100
>> libswresample 6. 0.100 / 6. 0.100
>> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
>> '/home/pz9115/mp/1080p_hevc_mp3.mp4':
>> Metadata:
>> major_brand : isom
>> minor_version : 512
>> compatible_brands: isomiso2mp41
>> title : Big Buck Bunny, Sunflower version
>> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
>> composer : Sacha Goedegebure
>> encoder : Lavf60.16.100
>> comment : Creative Commons Attribution 3.0 -
>> http://bbb3d.renderfarming.net
>> genre : Animation
>> Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
>> Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
>> yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
>> fps, 30 tbr, 15360 tbn (default)
>> Metadata:
>> handler_name : GPAC ISO Video Handler
>> vendor_id : [0][0][0][0]
>> encoder : Lavc60.31.102 libx265
>> Stream #0:1[0x2](und): Audio: mp3 (mp3float) (mp4a / 0x6134706D),
>> 48000 Hz, stereo, fltp, 128 kb/s (default)
>> Metadata:
>> handler_name : GPAC ISO Audio Handler
>> vendor_id : [0][0][0][0]
>> Stream mapping:
>> Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
>> Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
>> Press [q] to stop, [?] for help
>> Output #0, null, to 'pipe:':
>> Metadata:
>> major_brand : isom
>> minor_version : 512
>> compatible_brands: isomiso2mp41
>> title : Big Buck Bunny, Sunflower version
>> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
>> composer : Sacha Goedegebure
>> genre : Animation
>> comment : Creative Commons Attribution 3.0 -
>> http://bbb3d.renderfarming.net
>> encoder : Lavf62.0.102
>> Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
>> 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
>> Metadata:
>> encoder : Lavc62.3.101 wrapped_avframe
>> handler_name : GPAC ISO Video Handler
>> vendor_id : [0][0][0][0]
>> Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
>> (default)
>> Metadata:
>> encoder : Lavc62.3.101 pcm_s16le
>> handler_name : GPAC ISO Audio Handler
>> vendor_id : [0][0][0][0]
>> [out#0/null @ 0x565233669eb0] video:731KiB audio:11250KiB subtitle:0KiB
>> other streams:0KiB global headers:0KiB muxing overhead: unknown
>> frame= 1800 fps=635 q=-0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
>> speed=21.2x elapsed=0:00:02.83
>> bench: utime=11.324s stime=0.290s rtime=2.834s
>> bench: maxrss=186556KiB
>> ```
>>
>> Without patch (here I added -fno-tree-vectorize directly):
>>
>> ./ffmpeg -benchmark -i ~/mp/1080p_hevc_mp3.mp4 -f null -
>> ffmpeg version N-119636-g96518c8d8d Copyright (c) 2000-2025 the FFmpeg
>> developers
>> built with gcc 13 (Ubuntu 13.3.0-6ubuntu2~24.04)
>> configuration: --prefix=/home/pz9115/ffpo --disable-ffplay --arch=x64
>> --extra-cflags='-O3 -fno-tree-vectorize' --enable-static --target-os=linux
>> libavutil 60. 2.100 / 60. 2.100
>> libavcodec 62. 3.101 / 62. 3.101
>> libavformat 62. 0.102 / 62. 0.102
>> libavdevice 62. 0.100 / 62. 0.100
>> libavfilter 11. 0.100 / 11. 0.100
>> libswscale 9. 0.100 / 9. 0.100
>> libswresample 6. 0.100 / 6. 0.100
>> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
>> '/home/pz9115/mp/1080p_hevc_mp3.mp4':
>> Metadata:
>> major_brand : isom
>> minor_version : 512
>> compatible_brands: isomiso2mp41
>> title : Big Buck Bunny, Sunflower version
>> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
>> composer : Sacha Goedegebure
>> encoder : Lavf60.16.100
>> comment : Creative Commons Attribution 3.0 -
>> http://bbb3d.renderfarming.net
>> genre : Animation
>> Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
>> Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
>> yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
>> fps, 30 tbr, 15360 tbn (default)
>> Metadata:
>> handler_name : GPAC ISO Video Handler
>> vendor_id : [0][0][0][0]
>> encoder : Lavc60.31.102 libx265
>> Stream #0:1[0x2](und): Audio: mp3 (mp3float) (mp4a / 0x6134706D),
>> 48000 Hz, stereo, fltp, 128 kb/s (default)
>> Metadata:
>> handler_name : GPAC ISO Audio Handler
>> vendor_id : [0][0][0][0]
>> Stream mapping:
>> Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
>> Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
>> Press [q] to stop, [?] for help
>> Output #0, null, to 'pipe:':
>> Metadata:
>> major_brand : isom
>> minor_version : 512
>> compatible_brands: isomiso2mp41
>> title : Big Buck Bunny, Sunflower version
>> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
>> composer : Sacha Goedegebure
>> genre : Animation
>> comment : Creative Commons Attribution 3.0 -
>> http://bbb3d.renderfarming.net
>> encoder : Lavf62.0.102
>> Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
>> 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
>> Metadata:
>> encoder : Lavc62.3.101 wrapped_avframe
>> handler_name : GPAC ISO Video Handler
>> vendor_id : [0][0][0][0]
>> Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
>> (default)
>> Metadata:
>> encoder : Lavc62.3.101 pcm_s16le
>> handler_name : GPAC ISO Audio Handler
>> vendor_id : [0][0][0][0]
>> [out#0/null @ 0x55eb196b7eb0] video:731KiB audio:11250KiB subtitle:0KiB
>> other streams:0KiB global headers:0KiB muxing overhead: unknown
>> frame= 1800 fps=509 q=-0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
>> speed= 17x elapsed=0:00:03.53
>> bench: utime=21.544s stime=0.349s rtime=3.536s
>> bench: maxrss=181580KiB
>>
>> I also tested on a RISC-V development board, the MUSE Pi Pro. The
>> configuration and results follow:
>>
>> Using patch:
>>
>> root@spacemit-k1-x-MUSE-Pi-Pro-board:~# ./ffpv/bin/ffmpeg -benchmark -i
>> 1080p_hevc_mp3.mp4 -f null -
>> ffmpeg version n6.1.2 Copyright (c) 2000-2024 the FFmpeg developers
>> built with gcc 16.0.0 (g3fc902e738b) 20250519 (experimental)
>> configuration: --prefix=/home/pz9115/ffpv --disable-ffplay
>> --arch=riscv --extra-cflags='-march=rv64gcv_zba_zbb_zbs -O3 -ffast-math'
>> --cross-prefix=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-
>> --cc=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-gcc
>> --cxx=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-g++ --enable-static
>> --enable-cross-compile --target-os=linux --disable-rvv
>> libavutil 58. 29.100 / 58. 29.100
>> libavcodec 60. 31.102 / 60. 31.102
>> libavformat 60. 16.100 / 60. 16.100
>> libavdevice 60. 3.100 / 60. 3.100
>> libavfilter 9. 12.100 / 9. 12.100
>> libswscale 7. 5.100 / 7. 5.100
>> libswresample 4. 12.100 / 4. 12.100
>> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1080p_hevc_mp3.mp4':
>> Metadata:
>> major_brand : isom
>> minor_version : 512
>> compatible_brands: isomiso2mp41
>> title : Big Buck Bunny, Sunflower version
>> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
>> composer : Sacha Goedegebure
>> encoder : Lavf60.16.100
>> comment : Creative Commons Attribution 3.0 -
>> http://bbb3d.renderfarming.net
>> genre : Animation
>> Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
>> Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
>> yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
>> fps, 30 tbr, 15360 tbn (default)
>> Metadata:
>> handler_name : GPAC ISO Video Handler
>> vendor_id : [0][0][0][0]
>> encoder : Lavc60.31.102 libx265
>> Stream #0:1[0x2](und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz,
>> stereo, fltp, 128 kb/s (default)
>> Metadata:
>> handler_name : GPAC ISO Audio Handler
>> vendor_id : [0][0][0][0]
>> Stream mapping:
>> Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
>> Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
>> Press [q] to stop, [?] for help
>> Odd rotation angle.
>> If you want to help, upload a sample of this file to
>> https://streams.videolan.org/upload/ and contact the ffmpeg-devel
>> mailing list. (ffmpeg-devel@ffmpeg.org)Output #0, null, to 'pipe:':
>> Metadata:
>> major_brand : isom
>> minor_version : 512
>> compatible_brands: isomiso2mp41
>> title : Big Buck Bunny, Sunflower version
>> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
>> composer : Sacha Goedegebure
>> genre : Animation
>> comment : Creative Commons Attribution 3.0 -
>> http://bbb3d.renderfarming.net
>> encoder : Lavf60.16.100
>> Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
>> 1080x1920 [SAR 1:1 DAR 9:16], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
>> Metadata:
>> handler_name : GPAC ISO Video Handler
>> vendor_id : [0][0][0][0]
>> encoder : Lavc60.31.102 wrapped_avframe
>> Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
>> (default)
>> Metadata:
>> handler_name : GPAC ISO Audio Handler
>> vendor_id : [0][0][0][0]
>> encoder : Lavc60.31.102 pcm_s16le
>> [out#0/null @ 0x28a82e0] video:844kB audio:11250kB subtitle:0kB other
>> streams:0kB global headers:0kB muxing overhead: unknown
>> frame= 1800 fps= 42 q=-0.0 Lsize=N/A time=00:00:59.97 bitrate=N/A
>> speed=1.41x
>> bench: utime=207.150s stime=5.319s rtime=42.608s
>> bench: maxrss=162160kB
>>
>> Without patch (again adding -fno-tree-vectorize directly):
>>
>> ./ffp/bin/ffmpeg -benchmark -i 1080p_hevc_mp3.mp4 -f null -
>> ffmpeg version n6.1.2 Copyright (c) 2000-2024 the FFmpeg developers
>> built with gcc 16.0.0 (g38163c874a3-dirty) 20250515 (experimental)
>> configuration: --prefix=/home/pz9115/ffp --disable-ffplay
>> --arch=riscv --sysroot=/home/pz9115/rv/sysroot
>> --extra-cflags='-march=rv64gcv_zba_zbb_zbc_zbs_zca_zcd -mabi=lp64d -O3
>> -fno-tree-vectorize -static' --extra-ldflags=-static
>> --cross-prefix=/home/pz9115/rv/bin/riscv64-unknown-linux-gnu-
>> --enable-static --enable-cross-compile --target-os=linux --disable-rvv
>> libavutil 58. 29.100 / 58. 29.100
>> libavcodec 60. 31.102 / 60. 31.102
>> libavformat 60. 16.100 / 60. 16.100
>> libavdevice 60. 3.100 / 60. 3.100
>> libavfilter 9. 12.100 / 9. 12.100
>> libswscale 7. 5.100 / 7. 5.100
>> libswresample 4. 12.100 / 4. 12.100
>> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1080p_hevc_mp3.mp4':
>> Metadata:
>> major_brand : isom
>> minor_version : 512
>> compatible_brands: isomiso2mp41
>> title : Big Buck Bunny, Sunflower version
>> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
>> composer : Sacha Goedegebure
>> encoder : Lavf60.16.100
>> comment : Creative Commons Attribution 3.0 -
>> http://bbb3d.renderfarming.net
>> genre : Animation
>> Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
>> Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
>> yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
>> fps, 30 tbr, 15360 tbn (default)
>> Metadata:
>> handler_name : GPAC ISO Video Handler
>> vendor_id : [0][0][0][0]
>> encoder : Lavc60.31.102 libx265
>> Stream #0:1[0x2](und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz,
>> stereo, fltp, 128 kb/s (default)
>> Metadata:
>> handler_name : GPAC ISO Audio Handler
>> vendor_id : [0][0][0][0]
>> Stream mapping:
>> Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
>> Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
>> Press [q] to stop, [?] for help
>> Output #0, null, to 'pipe:':
>> Metadata:
>> major_brand : isom
>> minor_version : 512
>> compatible_brands: isomiso2mp41
>> title : Big Buck Bunny, Sunflower version
>> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
>> composer : Sacha Goedegebure
>> genre : Animation
>> comment : Creative Commons Attribution 3.0 -
>> http://bbb3d.renderfarming.net
>> encoder : Lavf60.16.100
>> Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
>> 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
>> Metadata:
>> handler_name : GPAC ISO Video Handler
>> vendor_id : [0][0][0][0]
>> encoder : Lavc60.31.102 wrapped_avframe
>> Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
>> (default)
>> Metadata:
>> handler_name : GPAC ISO Audio Handler
>> vendor_id : [0][0][0][0]
>> encoder : Lavc60.31.102 pcm_s16le
>> [out#0/null @ 0x2729630] video:844kB audio:11250kB subtitle:0kB other
>> streams:0kB global headers:0kB muxing overhead: unknown
>> frame= 1800 fps= 30 q=-0.0 Lsize=N/A time=00:00:59.97 bitrate=N/A
>> speed= 1x
>> bench: utime=321.145s stime=2.475s rtime=59.960s
>> bench: maxrss=131532kB
> The RISC-V autovectorised output looks like it has a warning "Odd
> rotation angle" which is not present in the non-autovectorised output.
I found that this occurs when building with '-ffast-math' on RISC-V. It also
occurs in the -O3 -ffast-math -fno-tree-vectorize case (which is much slower
because of -ffast-math). Additional comparison results follow:
Without -ffast-math and without -fno-tree-vectorize (auto-vectorization enabled):
root@spacemit-k1-x-MUSE-Pi-Pro-board:~# ./ffpv/bin/ffmpeg -benchmark -i
1080p_hevc_mp3.mp4 -f null -
ffmpeg version N-119636-g96518c8d8d Copyright (c) 2000-2025 the FFmpeg
developers
built with gcc 16.0.0 (g3fc902e738b) 20250519 (experimental)
configuration: --prefix=/home/pz9115/ffpv --disable-ffplay
--arch=riscv --extra-cflags='-march=rv64gcv_zba_zbb_zbs -O3'
--extra-ldflags=-static
--cross-prefix=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-
--cc=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-gcc
--cxx=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-g++ --enable-static
--enable-cross-compile --target-os=linux --disable-rvv
libavutil 60. 2.100 / 60. 2.100
libavcodec 62. 3.101 / 62. 3.101
libavformat 62. 0.102 / 62. 0.102
libavdevice 62. 0.100 / 62. 0.100
libavfilter 11. 0.100 / 11. 0.100
libswscale 9. 0.100 / 9. 0.100
libswresample 6. 0.100 / 6. 0.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1080p_hevc_mp3.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
composer : Sacha Goedegebure
encoder : Lavf60.16.100
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
genre : Animation
Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
fps, 30 tbr, 15360 tbn (default)
Metadata:
handler_name : GPAC ISO Video Handler
vendor_id : [0][0][0][0]
encoder : Lavc60.31.102 libx265
Stream #0:1[0x2](und): Audio: mp3 (mp3float) (mp4a / 0x6134706D),
48000 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : GPAC ISO Audio Handler
vendor_id : [0][0][0][0]
Stream mapping:
Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
composer : Sacha Goedegebure
genre : Animation
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
encoder : Lavf62.0.102
Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
Metadata:
encoder : Lavc62.3.101 wrapped_avframe
handler_name : GPAC ISO Video Handler
vendor_id : [0][0][0][0]
Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
(default)
Metadata:
encoder : Lavc62.3.101 pcm_s16le
handler_name : GPAC ISO Audio Handler
vendor_id : [0][0][0][0]
[out#0/null @ 0x2b66150] video:731KiB audio:11250KiB subtitle:0KiB other
streams:0KiB global headers:0KiB muxing overhead: unknown
frame= 1800 fps= 48 q=-0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
speed=1.6x elapsed=0:00:37.39
bench: utime=165.301s stime=3.171s rtime=37.400s
bench: maxrss=130208KiB
================================================================================================================================================================
Using -ffast-math with -fno-tree-vectorize:
root@spacemit-k1-x-MUSE-Pi-Pro-board:~# ./ffp/bin/ffmpeg -benchmark -i
1080p_hevc_mp3.mp4 -f null -
ffmpeg version N-119636-g96518c8d8d Copyright (c) 2000-2025 the FFmpeg
developers
built with gcc 16.0.0 (g3fc902e738b) 20250519 (experimental)
configuration: --prefix=/home/pz9115/ffpv --disable-ffplay
--arch=riscv --extra-cflags='-march=rv64gcv_zba_zbb_zbs -O3 -ffast-math
-fno-tree-vectorize' --extra-ldflags=-static
--cross-prefix=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-
--cc=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-gcc
--cxx=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-g++ --enable-static
--enable-cross-compile --target-os=linux --disable-rvv
libavutil 60. 2.100 / 60. 2.100
libavcodec 62. 3.101 / 62. 3.101
libavformat 62. 0.102 / 62. 0.102
libavdevice 62. 0.100 / 62. 0.100
libavfilter 11. 0.100 / 11. 0.100
libswscale 9. 0.100 / 9. 0.100
libswresample 6. 0.100 / 6. 0.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1080p_hevc_mp3.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
composer : Sacha Goedegebure
encoder : Lavf60.16.100
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
genre : Animation
Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
fps, 30 tbr, 15360 tbn (default)
Metadata:
handler_name : GPAC ISO Video Handler
vendor_id : [0][0][0][0]
encoder : Lavc60.31.102 libx265
Stream #0:1[0x2](und): Audio: mp3 (mp3float) (mp4a / 0x6134706D),
48000 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : GPAC ISO Audio Handler
vendor_id : [0][0][0][0]
Stream mapping:
Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Odd rotation angle.
If you want to help, upload a sample of this file to
https://streams.videolan.org/upload/ and contact the ffmpeg-devel
mailing list. (ffmpeg-devel@ffmpeg.org)Output #0, null, to 'pipe:':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
composer : Sacha Goedegebure
genre : Animation
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
encoder : Lavf62.0.102
Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
1080x1920 [SAR 1:1 DAR 9:16], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
Metadata:
encoder : Lavc62.3.101 wrapped_avframe
handler_name : GPAC ISO Video Handler
vendor_id : [0][0][0][0]
Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
(default)
Metadata:
encoder : Lavc62.3.101 pcm_s16le
handler_name : GPAC ISO Audio Handler
vendor_id : [0][0][0][0]
[out#0/null @ 0x2915150] video:731KiB audio:11250KiB subtitle:0KiB other
streams:0KiB global headers:0KiB muxing overhead: unknown
frame= 1800 fps= 12 q=-0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
speed=0.391x elapsed=0:02:33.38
bench: utime=445.912s stime=8.033s rtime=153.385s
bench: maxrss=2212516KiB
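
For easier comparison, the timings above can be reduced to ratios. The
following is a minimal sketch, not part of the patch: it parses the `bench:`
lines printed by `ffmpeg -benchmark` and computes wall-clock and CPU-time
ratios. The timing strings are copied from the logs in this thread; the helper
names (`parse_bench`, `speedup`) and the constant names are illustrative
assumptions only.

```python
import re

# Matches the timing line printed by `ffmpeg -benchmark`, for example:
#   bench: utime=11.324s stime=0.290s rtime=2.834s
BENCH_RE = re.compile(r"bench: utime=([\d.]+)s stime=([\d.]+)s rtime=([\d.]+)s")


def parse_bench(line):
    """Return (utime, stime, rtime) in seconds from a `bench:` line."""
    match = BENCH_RE.search(line)
    if match is None:
        raise ValueError(f"no bench timings found in {line!r}")
    utime, stime, rtime = (float(value) for value in match.groups())
    return utime, stime, rtime


def speedup(baseline_line, candidate_line):
    """Return (wall_ratio, cpu_ratio); values above 1 mean the candidate is faster."""
    b_user, b_sys, b_real = parse_bench(baseline_line)
    c_user, c_sys, c_real = parse_bench(candidate_line)
    return b_real / c_real, (b_user + b_sys) / (c_user + c_sys)


# Timing lines copied verbatim from the logs in this thread.
X86_VECTORIZED = "bench: utime=11.324s stime=0.290s rtime=2.834s"
X86_NO_VECTORIZE = "bench: utime=21.544s stime=0.349s rtime=3.536s"
RV64_O3 = "bench: utime=165.301s stime=3.171s rtime=37.400s"
RV64_FAST_MATH_NO_VECTORIZE = "bench: utime=445.912s stime=8.033s rtime=153.385s"

if __name__ == "__main__":
    for label, baseline, candidate in [
        ("x86_64", X86_NO_VECTORIZE, X86_VECTORIZED),
        ("rv64", RV64_FAST_MATH_NO_VECTORIZE, RV64_O3),
    ]:
        wall, cpu = speedup(baseline, candidate)
        print(f"{label}: wall-clock {wall:.2f}x, CPU time {cpu:.2f}x")
```

Only the x86_64 pair isolates the vectorization flag. The two rv64 runs
compared here differ in both -ffast-math and -fno-tree-vectorize, so that
ratio should be read as indicative rather than as a measurement of
auto-vectorization alone.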
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-21 10:17 ` Jiawei
@ 2025-05-21 18:21 ` Frank Plowman
2025-05-22 6:32 ` Jiawei
0 siblings, 1 reply; 28+ messages in thread
From: Frank Plowman @ 2025-05-21 18:21 UTC (permalink / raw)
To: ffmpeg-devel
On 21/05/2025 11:17, Jiawei wrote:
>
> On 2025/5/21 14:52, Nicolas George wrote:
>> Jiawei (HE12025-05-21):
>>> particularly improving
>>> performance on x86_64 (AVX), ARM64 (SVE) and RISC-V(RVV) architectures.
>> Benchmark needed.
>>
>> Regards,
>
>
> Hi Nicolas,
>
>
> Since I am a GCC developer, I'm not very familiar with the FFmpeg test
> flow. Here is my test process;
> if anything is incorrect, please point it out:
>
>
> 1. Download the video bbb_sunflower_2160p_30fps_normal.mp4.zip
> <https://download.blender.org/demo/movies/BBB/bbb_sunflower_2160p_30fps_normal.mp4.zip>
> from https://download.blender.org/demo/movies/BBB/,
>
> ```
>
> ffmpeg -i bbb_sunflower_2160p_30fps_normal.mp4 -t 60 -vf
> "scale=1920:1080" -c:v libx265 -c:a libmp3lame 1080p_hevc_mp3.mp4
> ```
>
> This produces the 1080p video used as the benchmark input.
>
>
> 2. Build two versions of FFmpeg, one with the patch and one without
> it, using the GCC 13.3 release,
>
> verified with Intel(R) Core(TM) Ultra 9 285HX
>
>
> Using patch:
>
> ```
> ./ffmpeg -benchmark -i ~/mp/1080p_hevc_mp3.mp4 -f null -
> ffmpeg version N-119636-g96518c8d8d Copyright (c) 2000-2025 the FFmpeg
> developers
> built with gcc 13 (Ubuntu 13.3.0-6ubuntu2~24.04)
> configuration: --prefix=/home/pz9115/ffpo --disable-ffplay --arch=x64
> --extra-cflags=-O3 --enable-static --target-os=linux
> libavutil 60. 2.100 / 60. 2.100
> libavcodec 62. 3.101 / 62. 3.101
> libavformat 62. 0.102 / 62. 0.102
> libavdevice 62. 0.100 / 62. 0.100
> libavfilter 11. 0.100 / 11. 0.100
> libswscale 9. 0.100 / 9. 0.100
> libswresample 6. 0.100 / 6. 0.100
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
> '/home/pz9115/mp/1080p_hevc_mp3.mp4':
> Metadata:
> major_brand : isom
> minor_version : 512
> compatible_brands: isomiso2mp41
> title : Big Buck Bunny, Sunflower version
> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
> composer : Sacha Goedegebure
> encoder : Lavf60.16.100
> comment : Creative Commons Attribution 3.0 -
> http://bbb3d.renderfarming.net
> genre : Animation
> Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
> Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
> yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
> fps, 30 tbr, 15360 tbn (default)
> Metadata:
> handler_name : GPAC ISO Video Handler
> vendor_id : [0][0][0][0]
> encoder : Lavc60.31.102 libx265
> Stream #0:1[0x2](und): Audio: mp3 (mp3float) (mp4a / 0x6134706D),
> 48000 Hz, stereo, fltp, 128 kb/s (default)
> Metadata:
> handler_name : GPAC ISO Audio Handler
> vendor_id : [0][0][0][0]
> Stream mapping:
> Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
> Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
> Press [q] to stop, [?] for help
> Output #0, null, to 'pipe:':
> Metadata:
> major_brand : isom
> minor_version : 512
> compatible_brands: isomiso2mp41
> title : Big Buck Bunny, Sunflower version
> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
> composer : Sacha Goedegebure
> genre : Animation
> comment : Creative Commons Attribution 3.0 -
> http://bbb3d.renderfarming.net
> encoder : Lavf62.0.102
> Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
> 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
> Metadata:
> encoder : Lavc62.3.101 wrapped_avframe
> handler_name : GPAC ISO Video Handler
> vendor_id : [0][0][0][0]
> Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
> (default)
> Metadata:
> encoder : Lavc62.3.101 pcm_s16le
> handler_name : GPAC ISO Audio Handler
> vendor_id : [0][0][0][0]
> [out#0/null @ 0x565233669eb0] video:731KiB audio:11250KiB subtitle:0KiB
> other streams:0KiB global headers:0KiB muxing overhead: unknown
> frame= 1800 fps=635 q=-0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
> speed=21.2x elapsed=0:00:02.83
> bench: utime=11.324s stime=0.290s rtime=2.834s
> bench: maxrss=186556KiB
> ```
>
> Without patch (here I added -fno-tree-vectorize directly):
>
> ./ffmpeg -benchmark -i ~/mp/1080p_hevc_mp3.mp4 -f null -
> ffmpeg version N-119636-g96518c8d8d Copyright (c) 2000-2025 the FFmpeg
> developers
> built with gcc 13 (Ubuntu 13.3.0-6ubuntu2~24.04)
> configuration: --prefix=/home/pz9115/ffpo --disable-ffplay --arch=x64
> --extra-cflags='-O3 -fno-tree-vectorize' --enable-static --target-os=linux
> libavutil 60. 2.100 / 60. 2.100
> libavcodec 62. 3.101 / 62. 3.101
> libavformat 62. 0.102 / 62. 0.102
> libavdevice 62. 0.100 / 62. 0.100
> libavfilter 11. 0.100 / 11. 0.100
> libswscale 9. 0.100 / 9. 0.100
> libswresample 6. 0.100 / 6. 0.100
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
> '/home/pz9115/mp/1080p_hevc_mp3.mp4':
> Metadata:
> major_brand : isom
> minor_version : 512
> compatible_brands: isomiso2mp41
> title : Big Buck Bunny, Sunflower version
> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
> composer : Sacha Goedegebure
> encoder : Lavf60.16.100
> comment : Creative Commons Attribution 3.0 -
> http://bbb3d.renderfarming.net
> genre : Animation
> Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
> Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
> yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
> fps, 30 tbr, 15360 tbn (default)
> Metadata:
> handler_name : GPAC ISO Video Handler
> vendor_id : [0][0][0][0]
> encoder : Lavc60.31.102 libx265
> Stream #0:1[0x2](und): Audio: mp3 (mp3float) (mp4a / 0x6134706D),
> 48000 Hz, stereo, fltp, 128 kb/s (default)
> Metadata:
> handler_name : GPAC ISO Audio Handler
> vendor_id : [0][0][0][0]
> Stream mapping:
> Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
> Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
> Press [q] to stop, [?] for help
> Output #0, null, to 'pipe:':
> Metadata:
> major_brand : isom
> minor_version : 512
> compatible_brands: isomiso2mp41
> title : Big Buck Bunny, Sunflower version
> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
> composer : Sacha Goedegebure
> genre : Animation
> comment : Creative Commons Attribution 3.0 -
> http://bbb3d.renderfarming.net
> encoder : Lavf62.0.102
> Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
> 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
> Metadata:
> encoder : Lavc62.3.101 wrapped_avframe
> handler_name : GPAC ISO Video Handler
> vendor_id : [0][0][0][0]
> Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
> (default)
> Metadata:
> encoder : Lavc62.3.101 pcm_s16le
> handler_name : GPAC ISO Audio Handler
> vendor_id : [0][0][0][0]
> [out#0/null @ 0x55eb196b7eb0] video:731KiB audio:11250KiB subtitle:0KiB
> other streams:0KiB global headers:0KiB muxing overhead: unknown
> frame= 1800 fps=509 q=-0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
> speed= 17x elapsed=0:00:03.53
> bench: utime=21.544s stime=0.349s rtime=3.536s
> bench: maxrss=181580KiB
>
> I also tested on a RISC-V development board, the MUSE Pi Pro. The
> configuration and results follow:
>
> Using patch:
>
> root@spacemit-k1-x-MUSE-Pi-Pro-board:~# ./ffpv/bin/ffmpeg -benchmark -i
> 1080p_hevc_mp3.mp4 -f null -
> ffmpeg version n6.1.2 Copyright (c) 2000-2024 the FFmpeg developers
> built with gcc 16.0.0 (g3fc902e738b) 20250519 (experimental)
> configuration: --prefix=/home/pz9115/ffpv --disable-ffplay
> --arch=riscv --extra-cflags='-march=rv64gcv_zba_zbb_zbs -O3 -ffast-math'
> --cross-prefix=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-
> --cc=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-gcc
> --cxx=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-g++ --enable-static
> --enable-cross-compile --target-os=linux --disable-rvv
> libavutil 58. 29.100 / 58. 29.100
> libavcodec 60. 31.102 / 60. 31.102
> libavformat 60. 16.100 / 60. 16.100
> libavdevice 60. 3.100 / 60. 3.100
> libavfilter 9. 12.100 / 9. 12.100
> libswscale 7. 5.100 / 7. 5.100
> libswresample 4. 12.100 / 4. 12.100
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1080p_hevc_mp3.mp4':
> Metadata:
> major_brand : isom
> minor_version : 512
> compatible_brands: isomiso2mp41
> title : Big Buck Bunny, Sunflower version
> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
> composer : Sacha Goedegebure
> encoder : Lavf60.16.100
> comment : Creative Commons Attribution 3.0 -
> http://bbb3d.renderfarming.net
> genre : Animation
> Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
> Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
> yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
> fps, 30 tbr, 15360 tbn (default)
> Metadata:
> handler_name : GPAC ISO Video Handler
> vendor_id : [0][0][0][0]
> encoder : Lavc60.31.102 libx265
> Stream #0:1[0x2](und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz,
> stereo, fltp, 128 kb/s (default)
> Metadata:
> handler_name : GPAC ISO Audio Handler
> vendor_id : [0][0][0][0]
> Stream mapping:
> Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
> Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
> Press [q] to stop, [?] for help
> Odd rotation angle.
> If you want to help, upload a sample of this file to
> https://streams.videolan.org/upload/ and contact the ffmpeg-devel
> mailing list. (ffmpeg-devel@ffmpeg.org)Output #0, null, to 'pipe:':
> Metadata:
> major_brand : isom
> minor_version : 512
> compatible_brands: isomiso2mp41
> title : Big Buck Bunny, Sunflower version
> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
> composer : Sacha Goedegebure
> genre : Animation
> comment : Creative Commons Attribution 3.0 -
> http://bbb3d.renderfarming.net
> encoder : Lavf60.16.100
> Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
> 1080x1920 [SAR 1:1 DAR 9:16], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
> Metadata:
> handler_name : GPAC ISO Video Handler
> vendor_id : [0][0][0][0]
> encoder : Lavc60.31.102 wrapped_avframe
> Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
> (default)
> Metadata:
> handler_name : GPAC ISO Audio Handler
> vendor_id : [0][0][0][0]
> encoder : Lavc60.31.102 pcm_s16le
> [out#0/null @ 0x28a82e0] video:844kB audio:11250kB subtitle:0kB other
> streams:0kB global headers:0kB muxing overhead: unknown
> frame= 1800 fps= 42 q=-0.0 Lsize=N/A time=00:00:59.97 bitrate=N/A
> speed=1.41x
> bench: utime=207.150s stime=5.319s rtime=42.608s
> bench: maxrss=162160kB
>
> Without patch (again adding -fno-tree-vectorize directly):
>
> ./ffp/bin/ffmpeg -benchmark -i 1080p_hevc_mp3.mp4 -f null -
> ffmpeg version n6.1.2 Copyright (c) 2000-2024 the FFmpeg developers
> built with gcc 16.0.0 (g38163c874a3-dirty) 20250515 (experimental)
> configuration: --prefix=/home/pz9115/ffp --disable-ffplay
> --arch=riscv --sysroot=/home/pz9115/rv/sysroot
> --extra-cflags='-march=rv64gcv_zba_zbb_zbc_zbs_zca_zcd -mabi=lp64d -O3
> -fno-tree-vectorize -static' --extra-ldflags=-static
> --cross-prefix=/home/pz9115/rv/bin/riscv64-unknown-linux-gnu-
> --enable-static --enable-cross-compile --target-os=linux --disable-rvv
> libavutil 58. 29.100 / 58. 29.100
> libavcodec 60. 31.102 / 60. 31.102
> libavformat 60. 16.100 / 60. 16.100
> libavdevice 60. 3.100 / 60. 3.100
> libavfilter 9. 12.100 / 9. 12.100
> libswscale 7. 5.100 / 7. 5.100
> libswresample 4. 12.100 / 4. 12.100
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1080p_hevc_mp3.mp4':
> Metadata:
> major_brand : isom
> minor_version : 512
> compatible_brands: isomiso2mp41
> title : Big Buck Bunny, Sunflower version
> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
> composer : Sacha Goedegebure
> encoder : Lavf60.16.100
> comment : Creative Commons Attribution 3.0 -
> http://bbb3d.renderfarming.net
> genre : Animation
> Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
> Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
> yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
> fps, 30 tbr, 15360 tbn (default)
> Metadata:
> handler_name : GPAC ISO Video Handler
> vendor_id : [0][0][0][0]
> encoder : Lavc60.31.102 libx265
> Stream #0:1[0x2](und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz,
> stereo, fltp, 128 kb/s (default)
> Metadata:
> handler_name : GPAC ISO Audio Handler
> vendor_id : [0][0][0][0]
> Stream mapping:
> Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
> Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
> Press [q] to stop, [?] for help
> Output #0, null, to 'pipe:':
> Metadata:
> major_brand : isom
> minor_version : 512
> compatible_brands: isomiso2mp41
> title : Big Buck Bunny, Sunflower version
> artist : Blender Foundation 2008, Janus Bager Kristensen 2013
> composer : Sacha Goedegebure
> genre : Animation
> comment : Creative Commons Attribution 3.0 -
> http://bbb3d.renderfarming.net
> encoder : Lavf60.16.100
> Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
> 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
> Metadata:
> handler_name : GPAC ISO Video Handler
> vendor_id : [0][0][0][0]
> encoder : Lavc60.31.102 wrapped_avframe
> Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
> (default)
> Metadata:
> handler_name : GPAC ISO Audio Handler
> vendor_id : [0][0][0][0]
> encoder : Lavc60.31.102 pcm_s16le
> [out#0/null @ 0x2729630] video:844kB audio:11250kB subtitle:0kB other
> streams:0kB global headers:0kB muxing overhead: unknown
> frame= 1800 fps= 30 q=-0.0 Lsize=N/A time=00:00:59.97 bitrate=N/A
> speed= 1x
> bench: utime=321.145s stime=2.475s rtime=59.960s
> bench: maxrss=131532kB
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
The RISC-V autovectorised output looks like it has a warning "Odd
rotation angle" which is not present in the non-autovectorised output.
--
Frank
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-21 12:22 ` Martin Storsjö
@ 2025-05-21 18:12 ` softworkz .
0 siblings, 0 replies; 28+ messages in thread
From: softworkz . @ 2025-05-21 18:12 UTC (permalink / raw)
To: FFmpeg development discussions and patches
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces@ffmpeg.org> On Behalf Of Martin
> Storsjö
> Sent: Wednesday, 21 May 2025 14:22
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
>
> On Wed, 21 May 2025, Andreas Rheinhardt wrote:
>
> > Martin Storsjö:
> >> On Wed, 21 May 2025, Andreas Rheinhardt wrote:
> >>
> >>> Jiawei:
> >>>> This patch modifies the FFmpeg build system to remove the explicit
> >>>> disabling
> >>>> of GCC's auto-vectorization feature.
> >>>>
> >>>> Modern GCC versions (>= 10.0) have demonstrated stable auto-
> >>>> vectorization
> >>>> capabilities through extensive optimizations in loop analysis and SIMD
> >>>> code generation. The explicit -fno-tree-vectorize flag originally added
> >>>> in commit 973859f (2009) to workaround early GCC vectorization
> >>>> instability
> >>>> is no longer necessary.
> >>>>
> >>>> Key improvements justifying this change:
> >>>> 1. Enhanced heuristics for loop vectorization cost models
> >>>> 2. Mature handling of alignment and memory access patterns
> >>>> 3. Robust fallback mechanisms for unsupported architectures
> >>>>
> >>>> This change allows FFmpeg to benefit from automated SIMD optimizations
> >>>> when built with -O3 optimization level, particularly improving
> >>>> performance on x86_64 (AVX), ARM64 (SVE) and RISC-V(RVV) architectures.
> >>>>
> >>>> [1] https://git.ffmpeg.org/gitweb/ffmpeg.git/
> >>>> commit/973859f5230e77beea7bb59dc081870689d6d191
> >>>>
> >>>> ---
> >>>> configure | 1 -
> >>>> 1 file changed, 1 deletion(-)
> >>>>
> >>>> diff --git a/configure b/configure
> >>>> index 3730b0524c..b9e95ce4ec 100755
> >>>> --- a/configure
> >>>> +++ b/configure
> >>>> @@ -7656,7 +7656,6 @@ if enabled icc; then
> >>>> disable aligned_stack
> >>>> fi
> >>>> elif enabled gcc; then
> >>>> - check_optflags -fno-tree-vectorize
> >>>> check_cflags -Werror=format-security
> >>>> check_cflags -Werror=implicit-function-declaration
> >>>> check_cflags -Werror=missing-prototypes
> >>>
> >>> FYI: The last discussion about auto-vectorization is here:
> >>> https://ffmpeg.org/pipermail/ffmpeg-devel/2022-July/299405.html
> >>> It contains a report about a failing build with vectorization enabled:
> >>> https://ffmpeg.org/pipermail/ffmpeg-devel/2022-July/299421.html
> >>> I don't know whether this is still reproducible with the latest GCC.
> >>
> >> The issue which was reported last time, when compiling for i686 mingw32
> >> with --cpu=haswell, seems to have gone away in
> >> 182663a58a7a099e02e76da3b0f96d63e5c26a6d, where we made the whole
> >> problematic x86 inline cabac assembly noinline on i386. (That whole
> >> inline assembly block has been problematic in a large number of cases
> >> anyway.)
> >>
> >
> > So there are currently no known miscompilations due to vectorization
> > with GCC?
>
> I'm not aware of any, but I haven't tested widely. It certainly is worth
> evaluating.
>
> (From dav1d, I can anecdotally add that autovectorization does seem to
> help, somewhat, especially when there's not 100% assembly coverage for the
> use case. For some cases it makes things slower than without
> autovectorization, but generally the net result is positive.)
>
> // Martin
Hi,
a few years ago, I spent days on that subject. Intel has some great
tools which allow precise analysis of how the compiler applies
vectorization and loop optimizations, and they also work on code
compiled with gcc, which is what I was investigating. The focus was
the code in the vf_tonemap filter; later I briefly confirmed my findings
by looking at some other examples. The platform was x86_64 only.
The outcome was that enabling tree-vectorize is beneficial, but combining
it with -O3 has adverse effects. Since then, we have been using -O2 with
tree-vectorization enabled on all platforms.
For CPU tone mapping, I still ended up doing a SIMD implementation using
Intel intrinsics 😊
Best
sw
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-21 12:14 ` Andreas Rheinhardt
@ 2025-05-21 12:22 ` Martin Storsjö
2025-05-21 18:12 ` softworkz .
0 siblings, 1 reply; 28+ messages in thread
From: Martin Storsjö @ 2025-05-21 12:22 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Wed, 21 May 2025, Andreas Rheinhardt wrote:
> Martin Storsjö:
>> On Wed, 21 May 2025, Andreas Rheinhardt wrote:
>>
>>> Jiawei:
>>>> This patch modifies the FFmpeg build system to remove the explicit
>>>> disabling
>>>> of GCC's auto-vectorization feature.
>>>>
>>>> Modern GCC versions (>= 10.0) have demonstrated stable auto-
>>>> vectorization
>>>> capabilities through extensive optimizations in loop analysis and SIMD
>>>> code generation. The explicit -fno-tree-vectorize flag originally added
>>>> in commit 973859f (2009) to workaround early GCC vectorization
>>>> instability
>>>> is no longer necessary.
>>>>
>>>> Key improvements justifying this change:
>>>> 1. Enhanced heuristics for loop vectorization cost models
>>>> 2. Mature handling of alignment and memory access patterns
>>>> 3. Robust fallback mechanisms for unsupported architectures
>>>>
>>>> This change allows FFmpeg to benefit from automated SIMD optimizations
>>>> when built with -O3 optimization level, particularly improving
>>>> performance on x86_64 (AVX), ARM64 (SVE) and RISC-V(RVV) architectures.
>>>>
>>>> [1] https://git.ffmpeg.org/gitweb/ffmpeg.git/
>>>> commit/973859f5230e77beea7bb59dc081870689d6d191
>>>>
>>>> ---
>>>> configure | 1 -
>>>> 1 file changed, 1 deletion(-)
>>>>
>>>> diff --git a/configure b/configure
>>>> index 3730b0524c..b9e95ce4ec 100755
>>>> --- a/configure
>>>> +++ b/configure
>>>> @@ -7656,7 +7656,6 @@ if enabled icc; then
>>>> disable aligned_stack
>>>> fi
>>>> elif enabled gcc; then
>>>> - check_optflags -fno-tree-vectorize
>>>> check_cflags -Werror=format-security
>>>> check_cflags -Werror=implicit-function-declaration
>>>> check_cflags -Werror=missing-prototypes
>>>
>>> FYI: The last discussion about auto-vectorization is here:
>>> https://ffmpeg.org/pipermail/ffmpeg-devel/2022-July/299405.html
>>> It contains a report about a failing build with vectorization enabled:
>>> https://ffmpeg.org/pipermail/ffmpeg-devel/2022-July/299421.html
>>> I don't know whether this is still reproducible with the latest GCC.
>>
>> The issue which was reported last time, when compiling for i686 mingw32
>> with --cpu=haswell, seems to have gone away in
>> 182663a58a7a099e02e76da3b0f96d63e5c26a6d, where we made the whole
>> problematic x86 inline cabac assembly noinline on i386. (That whole
>> inline assembly block has been problematic in a large number of cases
>> anyway.)
>>
>
> So there are currently no known miscompilations due to vectorization
> with GCC?
I'm not aware of any, but I haven't tested widely. It certainly is worth
evaluating.
(From dav1d, I can anecdotally add that autovectorization does seem to
help, somewhat, especially when there's not 100% assembly coverage for the
use case. For some cases it makes things slower than without
autovectorization, but generally the net result is positive.)
// Martin
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-21 12:09 ` Martin Storsjö
@ 2025-05-21 12:14 ` Andreas Rheinhardt
2025-05-21 12:22 ` Martin Storsjö
0 siblings, 1 reply; 28+ messages in thread
From: Andreas Rheinhardt @ 2025-05-21 12:14 UTC (permalink / raw)
To: ffmpeg-devel
Martin Storsjö:
> On Wed, 21 May 2025, Andreas Rheinhardt wrote:
>
>> Jiawei:
>>> This patch modifies the FFmpeg build system to remove the explicit
>>> disabling
>>> of GCC's auto-vectorization feature.
>>>
>>> Modern GCC versions (>= 10.0) have demonstrated stable auto-
>>> vectorization
>>> capabilities through extensive optimizations in loop analysis and SIMD
>>> code generation. The explicit -fno-tree-vectorize flag originally added
>>> in commit 973859f (2009) to workaround early GCC vectorization
>>> instability
>>> is no longer necessary.
>>>
>>> Key improvements justifying this change:
>>> 1. Enhanced heuristics for loop vectorization cost models
>>> 2. Mature handling of alignment and memory access patterns
>>> 3. Robust fallback mechanisms for unsupported architectures
>>>
>>> This change allows FFmpeg to benefit from automated SIMD optimizations
>>> when built with -O3 optimization level, particularly improving
>>> performance on x86_64 (AVX), ARM64 (SVE) and RISC-V(RVV) architectures.
>>>
>>> [1] https://git.ffmpeg.org/gitweb/ffmpeg.git/
>>> commit/973859f5230e77beea7bb59dc081870689d6d191
>>>
>>> ---
>>> configure | 1 -
>>> 1 file changed, 1 deletion(-)
>>>
>>> diff --git a/configure b/configure
>>> index 3730b0524c..b9e95ce4ec 100755
>>> --- a/configure
>>> +++ b/configure
>>> @@ -7656,7 +7656,6 @@ if enabled icc; then
>>> disable aligned_stack
>>> fi
>>> elif enabled gcc; then
>>> - check_optflags -fno-tree-vectorize
>>> check_cflags -Werror=format-security
>>> check_cflags -Werror=implicit-function-declaration
>>> check_cflags -Werror=missing-prototypes
>>
>> FYI: The last discussion about auto-vectorization is here:
>> https://ffmpeg.org/pipermail/ffmpeg-devel/2022-July/299405.html
>> It contains a report about a failing build with vectorization enabled:
>> https://ffmpeg.org/pipermail/ffmpeg-devel/2022-July/299421.html
>> I don't know whether this is still reproducible with the latest GCC.
>
> The issue which was reported last time, when compiling for i686 mingw32
> with --cpu=haswell, seems to have gone away in
> 182663a58a7a099e02e76da3b0f96d63e5c26a6d, where we made the whole
> problematic x86 inline cabac assembly noinline on i386. (That whole
> inline assembly block has been problematic in a large number of cases
> anyway.)
>
So there are currently no known miscompilations due to vectorization
with GCC?
- Andreas
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-21 10:33 ` Andreas Rheinhardt
@ 2025-05-21 12:09 ` Martin Storsjö
2025-05-21 12:14 ` Andreas Rheinhardt
0 siblings, 1 reply; 28+ messages in thread
From: Martin Storsjö @ 2025-05-21 12:09 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Wed, 21 May 2025, Andreas Rheinhardt wrote:
> Jiawei:
>> This patch modifies the FFmpeg build system to remove the explicit disabling
>> of GCC's auto-vectorization feature.
>>
>> Modern GCC versions (>= 10.0) have demonstrated stable auto-vectorization
>> capabilities through extensive optimizations in loop analysis and SIMD
>> code generation. The explicit -fno-tree-vectorize flag originally added
>> in commit 973859f (2009) to workaround early GCC vectorization instability
>> is no longer necessary.
>>
>> Key improvements justifying this change:
>> 1. Enhanced heuristics for loop vectorization cost models
>> 2. Mature handling of alignment and memory access patterns
>> 3. Robust fallback mechanisms for unsupported architectures
>>
>> This change allows FFmpeg to benefit from automated SIMD optimizations
>> when built with -O3 optimization level, particularly improving
>> performance on x86_64 (AVX), ARM64 (SVE) and RISC-V(RVV) architectures.
>>
>> [1] https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/973859f5230e77beea7bb59dc081870689d6d191
>>
>> ---
>> configure | 1 -
>> 1 file changed, 1 deletion(-)
>>
>> diff --git a/configure b/configure
>> index 3730b0524c..b9e95ce4ec 100755
>> --- a/configure
>> +++ b/configure
>> @@ -7656,7 +7656,6 @@ if enabled icc; then
>> disable aligned_stack
>> fi
>> elif enabled gcc; then
>> - check_optflags -fno-tree-vectorize
>> check_cflags -Werror=format-security
>> check_cflags -Werror=implicit-function-declaration
>> check_cflags -Werror=missing-prototypes
>
> FYI: The last discussion about auto-vectorization is here:
> https://ffmpeg.org/pipermail/ffmpeg-devel/2022-July/299405.html
> It contains a report about a failing build with vectorization enabled:
> https://ffmpeg.org/pipermail/ffmpeg-devel/2022-July/299421.html
> I don't know whether this is still reproducible with the latest GCC.
The issue which was reported last time, when compiling for i686 mingw32
with --cpu=haswell, seems to have gone away in
182663a58a7a099e02e76da3b0f96d63e5c26a6d, where we made the whole
problematic x86 inline cabac assembly noinline on i386. (That whole inline
assembly block has been problematic in a large number of cases anyway.)
// Martin
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-21 10:32 ` Jiawei
@ 2025-05-21 11:09 ` Michael Niedermayer
0 siblings, 0 replies; 28+ messages in thread
From: Michael Niedermayer @ 2025-05-21 11:09 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Wed, May 21, 2025 at 06:32:49PM +0800, Jiawei wrote:
>
> On 2025/5/21 15:46, Michael Niedermayer wrote:
> > On Wed, May 21, 2025 at 02:17:50PM +0800, Jiawei wrote:
> > > This patch modifies the FFmpeg build system to remove the explicit disabling
> > > of GCC's auto-vectorization feature.
> > >
> > > Modern GCC versions (>= 10.0) have demonstrated stable auto-vectorization
> > > capabilities through extensive optimizations in loop analysis and SIMD
> > > code generation. The explicit -fno-tree-vectorize flag originally added
> > > in commit 973859f (2009) to workaround early GCC vectorization instability
> > > is no longer necessary.
> > >
> > > Key improvements justifying this change:
> > > 1. Enhanced heuristics for loop vectorization cost models
> > > 2. Mature handling of alignment and memory access patterns
> > > 3. Robust fallback mechanisms for unsupported architectures
> > >
> > > This change allows FFmpeg to benefit from automated SIMD optimizations
> > > when built with -O3 optimization level, particularly improving
> > > performance on x86_64 (AVX), ARM64 (SVE) and RISC-V(RVV) architectures.
> > >
> > > [1] https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/973859f5230e77beea7bb59dc081870689d6d191
> > >
> > > ---
> > > configure | 1 -
> > > 1 file changed, 1 deletion(-)
> > >
> > > diff --git a/configure b/configure
> > > index 3730b0524c..b9e95ce4ec 100755
> > > --- a/configure
> > > +++ b/configure
> > > @@ -7656,7 +7656,6 @@ if enabled icc; then
> > > disable aligned_stack
> > > fi
> > > elif enabled gcc; then
> > > - check_optflags -fno-tree-vectorize
> > > check_cflags -Werror=format-security
> > > check_cflags -Werror=implicit-function-declaration
> > > check_cflags -Werror=missing-prototypes
> > > Your text speaks about this change being ok in a gcc version dependent
> > > way
> > >
> > > Your patch has no gcc version dependency
> > >
> > > If you claim that all issues were solved, please show the issues happening
> > > in version v and no longer happening in w>v . Then it makes sense to
> > > change the flags for version w
> >
> > Thx
> > [...]
>
>
> Sorry, I forgot about that; thanks for reminding me. There are still many
> users of old GCC versions,
>
> and I am not sure how this will affect them.
>
> Maybe checking for a recent GCC version would be good, e.g. GCC 13-15. What
> do you think about it?
I cannot speak about gcc versions; I know little more about them than I know
the numbers from a dice throw.
But if we can turn on optimizations and make the code faster without breaking
anything, I am in favor of that. It's just that I cannot answer the question of
what checks, what exact version or other special limitation may be needed.
You would have to verify that the issues people encountered previously
no longer affect version XY and then put an XY check in the patch.
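A version gate of that kind could be sketched roughly as below. This is only an illustration in plain POSIX shell: the helper names `gcc_major` and `vectorize_optflag` are hypothetical and are not part of FFmpeg's configure, and the threshold 13 is an assumption standing in for whatever version "XY" testing would actually justify.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not taken from FFmpeg's configure.

# Print the major component of a version string such as "13.3.0".
gcc_major() {
    echo "${1%%.*}"
}

# Print the flag to keep for a given GCC version ($1) and a minimum
# trusted major version ($2). Older compilers keep -fno-tree-vectorize;
# newer ones get an empty string, i.e. no extra flag.
vectorize_optflag() {
    if [ "$(gcc_major "$1")" -ge "$2" ]; then
        echo ""
    else
        echo "-fno-tree-vectorize"
    fi
}

# The threshold 13 is an assumption for illustration only.
vectorize_optflag "13.3.0" 13   # prints an empty line
vectorize_optflag "9.4.0" 13    # prints -fno-tree-vectorize
```

The version string could come from `gcc -dumpversion`; how the result would be wired into the existing `check_optflags` call is left open here.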
thx
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
The greatest way to live with honor in this world is to be what we pretend
to be. -- Socrates
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-21 6:17 Jiawei
` (2 preceding siblings ...)
2025-05-21 9:04 ` Zhao Zhili
@ 2025-05-21 10:33 ` Andreas Rheinhardt
2025-05-21 12:09 ` Martin Storsjö
2025-05-24 12:00 ` Rémi Denis-Courmont
4 siblings, 1 reply; 28+ messages in thread
From: Andreas Rheinhardt @ 2025-05-21 10:33 UTC (permalink / raw)
To: ffmpeg-devel
Jiawei:
> This patch modifies the FFmpeg build system to remove the explicit disabling
> of GCC's auto-vectorization feature.
>
> Modern GCC versions (>= 10.0) have demonstrated stable auto-vectorization
> capabilities through extensive optimizations in loop analysis and SIMD
> code generation. The explicit -fno-tree-vectorize flag originally added
> in commit 973859f (2009) to workaround early GCC vectorization instability
> is no longer necessary.
>
> Key improvements justifying this change:
> 1. Enhanced heuristics for loop vectorization cost models
> 2. Mature handling of alignment and memory access patterns
> 3. Robust fallback mechanisms for unsupported architectures
>
> This change allows FFmpeg to benefit from automated SIMD optimizations
> when built with -O3 optimization level, particularly improving
> performance on x86_64 (AVX), ARM64 (SVE) and RISC-V(RVV) architectures.
>
> [1] https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/973859f5230e77beea7bb59dc081870689d6d191
>
> ---
> configure | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/configure b/configure
> index 3730b0524c..b9e95ce4ec 100755
> --- a/configure
> +++ b/configure
> @@ -7656,7 +7656,6 @@ if enabled icc; then
> disable aligned_stack
> fi
> elif enabled gcc; then
> - check_optflags -fno-tree-vectorize
> check_cflags -Werror=format-security
> check_cflags -Werror=implicit-function-declaration
> check_cflags -Werror=missing-prototypes
FYI: The last discussion about auto-vectorization is here:
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-July/299405.html
It contains a report about a failing build with vectorization enabled:
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-July/299421.html
I don't know whether this is still reproducible with the latest GCC.
- Andreas
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-21 7:46 ` Michael Niedermayer
@ 2025-05-21 10:32 ` Jiawei
2025-05-21 11:09 ` Michael Niedermayer
0 siblings, 1 reply; 28+ messages in thread
From: Jiawei @ 2025-05-21 10:32 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: michael
On 2025/5/21 15:46, Michael Niedermayer wrote:
> On Wed, May 21, 2025 at 02:17:50PM +0800, Jiawei wrote:
>> This patch modifies the FFmpeg build system to remove the explicit disabling
>> of GCC's auto-vectorization feature.
>>
>> Modern GCC versions (>= 10.0) have demonstrated stable auto-vectorization
>> capabilities through extensive optimizations in loop analysis and SIMD
>> code generation. The explicit -fno-tree-vectorize flag originally added
>> in commit 973859f (2009) to workaround early GCC vectorization instability
>> is no longer necessary.
>>
>> Key improvements justifying this change:
>> 1. Enhanced heuristics for loop vectorization cost models
>> 2. Mature handling of alignment and memory access patterns
>> 3. Robust fallback mechanisms for unsupported architectures
>>
>> This change allows FFmpeg to benefit from automated SIMD optimizations
>> when built with -O3 optimization level, particularly improving
>> performance on x86_64 (AVX), ARM64 (SVE) and RISC-V(RVV) architectures.
>>
>> [1] https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/973859f5230e77beea7bb59dc081870689d6d191
>>
>> ---
>> configure | 1 -
>> 1 file changed, 1 deletion(-)
>>
>> diff --git a/configure b/configure
>> index 3730b0524c..b9e95ce4ec 100755
>> --- a/configure
>> +++ b/configure
>> @@ -7656,7 +7656,6 @@ if enabled icc; then
>> disable aligned_stack
>> fi
>> elif enabled gcc; then
>> - check_optflags -fno-tree-vectorize
>> check_cflags -Werror=format-security
>> check_cflags -Werror=implicit-function-declaration
>> check_cflags -Werror=missing-prototypes
> Your text speaks about this change being ok in a gcc version dependent
> way
>
> Your patch has no gcc version dependency
>
> If you claim that all issues were solved, please show the issues happening
> in version v and no longer happening in w>v . Then it makes sense to
> change the flags for version w
>
> Thx
> [...]
Sorry, I forgot about that; thanks for reminding me. There are still
many users of old GCC versions,
and I am not sure how this will affect them.
Maybe checking for a recent GCC version would be good, e.g. GCC 13-15. What
do you think about it?
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-21 9:04 ` Zhao Zhili
@ 2025-05-21 10:26 ` Jiawei
0 siblings, 0 replies; 28+ messages in thread
From: Jiawei @ 2025-05-21 10:26 UTC (permalink / raw)
To: Zhao Zhili, FFmpeg development discussions and patches
On 2025/5/21 17:04, Zhao Zhili wrote:
>
>> On May 21, 2025, at 14:17, Jiawei <jiawei@iscas.ac.cn> wrote:
>>
>> This patch modifies the FFmpeg build system to remove the explicit disabling
>> of GCC's auto-vectorization feature.
>>
>> Modern GCC versions (>= 10.0) have demonstrated stable auto-vectorization
>> capabilities through extensive optimizations in loop analysis and SIMD
>> code generation. The explicit -fno-tree-vectorize flag originally added
>> in commit 973859f (2009) to workaround early GCC vectorization instability
>> is no longer necessary.
> This isn’t the whole story.
>
> The flag was added by 973859f in 2009.
> Then it was reverted by cb8646af in 2016.
> Shortly after that, the revert was reverted again by fd6dbc5 in 2016.
>
>> Key improvements justifying this change:
>> 1. Enhanced heuristics for loop vectorization cost models
>> 2. Mature handling of alignment and memory access patterns
>> 3. Robust fallback mechanisms for unsupported architectures
>>
>> This change allows FFmpeg to benefit from automated SIMD optimizations
>> when built with -O3 optimization level, particularly improving
>> performance on x86_64 (AVX), ARM64 (SVE) and RISC-V(RVV) architectures.
> Those flags can only be enabled in tightly controlled environments (e.g., built and run on the same
> machine), while FFmpeg has hand written assembly, runtime cpu probe and dynamic binding/dispatch.
>
> Those auto-vectorization and ARCH flags can be enabled manually, but be careful.
Thank you for pointing this out. I am using x64 AVX2 and RISC-V RVV. When
I enable the vector extensions
with -O3 -mavx (-march=rv64gcv for RISC-V), configure still adds the
`-fno-tree-vectorize` option automatically.
The resulting code still contains vector load/store instructions, but
no vector arithmetic operations.
GCC provides this explicit option to control whether vectorized
instructions are generated, so it is possible to use -O3
without auto-vectorization.
>
>> [1] https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/973859f5230e77beea7bb59dc081870689d6d191
>>
>> ---
>> configure | 1 -
>> 1 file changed, 1 deletion(-)
>>
>> diff --git a/configure b/configure
>> index 3730b0524c..b9e95ce4ec 100755
>> --- a/configure
>> +++ b/configure
>> @@ -7656,7 +7656,6 @@ if enabled icc; then
>> disable aligned_stack
>> fi
>> elif enabled gcc; then
>> - check_optflags -fno-tree-vectorize
>> check_cflags -Werror=format-security
>> check_cflags -Werror=implicit-function-declaration
>> check_cflags -Werror=missing-prototypes
>> --
>> 2.43.0
>>
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-21 6:52 ` Nicolas George
@ 2025-05-21 10:17 ` Jiawei
2025-05-21 18:21 ` Frank Plowman
0 siblings, 1 reply; 28+ messages in thread
From: Jiawei @ 2025-05-21 10:17 UTC (permalink / raw)
To: ffmpeg-devel
On 2025/5/21 14:52, Nicolas George wrote:
> Jiawei (HE12025-05-21):
>> particularly improving
>> performance on x86_64 (AVX), ARM64 (SVE) and RISC-V(RVV) architectures.
> Benchmark needed.
>
> Regards,
Hi Nicolas,
As I am a GCC developer, I'm not very familiar with the FFmpeg test
flow. Here is my test process;
if anything is incorrect, please point it out:
1. Download the video bbb_sunflower_2160p_30fps_normal.mp4.zip
<https://download.blender.org/demo/movies/BBB/bbb_sunflower_2160p_30fps_normal.mp4.zip>
from https://download.blender.org/demo/movies/BBB/, then run
```
ffmpeg -i bbb_sunflower_2160p_30fps_normal.mp4 -t 60 -vf \
    "scale=1920:1080" -c:v libx265 -c:a libmp3lame 1080p_hevc_mp3.mp4
```
to get a 60-second 1080p video to use as the benchmark input.
2. Build two versions of FFmpeg, one with the patch and one without
it, using the GCC 13.3 release,
and run both on an Intel(R) Core(TM) Ultra 9 285HX.
With the patch:
```
./ffmpeg -benchmark -i ~/mp/1080p_hevc_mp3.mp4 -f null -
ffmpeg version N-119636-g96518c8d8d Copyright (c) 2000-2025 the FFmpeg
developers
built with gcc 13 (Ubuntu 13.3.0-6ubuntu2~24.04)
configuration: --prefix=/home/pz9115/ffpo --disable-ffplay --arch=x64
--extra-cflags=-O3 --enable-static --target-os=linux
libavutil 60. 2.100 / 60. 2.100
libavcodec 62. 3.101 / 62. 3.101
libavformat 62. 0.102 / 62. 0.102
libavdevice 62. 0.100 / 62. 0.100
libavfilter 11. 0.100 / 11. 0.100
libswscale 9. 0.100 / 9. 0.100
libswresample 6. 0.100 / 6. 0.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
'/home/pz9115/mp/1080p_hevc_mp3.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
composer : Sacha Goedegebure
encoder : Lavf60.16.100
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
genre : Animation
Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
fps, 30 tbr, 15360 tbn (default)
Metadata:
handler_name : GPAC ISO Video Handler
vendor_id : [0][0][0][0]
encoder : Lavc60.31.102 libx265
Stream #0:1[0x2](und): Audio: mp3 (mp3float) (mp4a / 0x6134706D),
48000 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : GPAC ISO Audio Handler
vendor_id : [0][0][0][0]
Stream mapping:
Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
composer : Sacha Goedegebure
genre : Animation
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
encoder : Lavf62.0.102
Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
Metadata:
encoder : Lavc62.3.101 wrapped_avframe
handler_name : GPAC ISO Video Handler
vendor_id : [0][0][0][0]
Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
(default)
Metadata:
encoder : Lavc62.3.101 pcm_s16le
handler_name : GPAC ISO Audio Handler
vendor_id : [0][0][0][0]
[out#0/null @ 0x565233669eb0] video:731KiB audio:11250KiB subtitle:0KiB
other streams:0KiB global headers:0KiB muxing overhead: unknown
frame= 1800 fps=635 q=-0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
speed=21.2x elapsed=0:00:02.83
bench: utime=11.324s stime=0.290s rtime=2.834s
bench: maxrss=186556KiB
```
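To compare this run with the baseline run, the `bench:` line can be pulled out of each log. The following is a small sketch assuming POSIX sh with sed and awk; the function names `bench_field` and `speedup` are made up for illustration and are not part of FFmpeg.

```shell
#!/bin/sh
# Hypothetical helpers for reading `ffmpeg -benchmark` output.

# Extract one timing field ($1: utime, stime or rtime), in seconds, from a
# line ($2) such as "bench: utime=11.324s stime=0.290s rtime=2.834s".
bench_field() {
    printf '%s\n' "$2" | sed -n "s/.*$1=\([0-9.]*\)s.*/\1/p"
}

# Print baseline/patched as a ratio with two decimals; a value above 1
# means the patched build needed less time.
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}

line='bench: utime=11.324s stime=0.290s rtime=2.834s'
bench_field rtime "$line"   # prints 2.834
bench_field utime "$line"   # prints 11.324
```

With the two `rtime` (or `utime`) values in hand, `speedup "$baseline" "$patched"` gives the ratio. A single run is noisy, so repeating each measurement a few times before drawing conclusions would be advisable.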
Without the patch (here I add -fno-tree-vectorize directly):
./ffmpeg -benchmark -i ~/mp/1080p_hevc_mp3.mp4 -f null -
ffmpeg version N-119636-g96518c8d8d Copyright (c) 2000-2025 the FFmpeg
developers
built with gcc 13 (Ubuntu 13.3.0-6ubuntu2~24.04)
configuration: --prefix=/home/pz9115/ffpo --disable-ffplay --arch=x64
--extra-cflags='-O3 -fno-tree-vectorize' --enable-static --target-os=linux
libavutil 60. 2.100 / 60. 2.100
libavcodec 62. 3.101 / 62. 3.101
libavformat 62. 0.102 / 62. 0.102
libavdevice 62. 0.100 / 62. 0.100
libavfilter 11. 0.100 / 11. 0.100
libswscale 9. 0.100 / 9. 0.100
libswresample 6. 0.100 / 6. 0.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
'/home/pz9115/mp/1080p_hevc_mp3.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
composer : Sacha Goedegebure
encoder : Lavf60.16.100
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
genre : Animation
Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
fps, 30 tbr, 15360 tbn (default)
Metadata:
handler_name : GPAC ISO Video Handler
vendor_id : [0][0][0][0]
encoder : Lavc60.31.102 libx265
Stream #0:1[0x2](und): Audio: mp3 (mp3float) (mp4a / 0x6134706D),
48000 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : GPAC ISO Audio Handler
vendor_id : [0][0][0][0]
Stream mapping:
Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
composer : Sacha Goedegebure
genre : Animation
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
encoder : Lavf62.0.102
Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
Metadata:
encoder : Lavc62.3.101 wrapped_avframe
handler_name : GPAC ISO Video Handler
vendor_id : [0][0][0][0]
Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
(default)
Metadata:
encoder : Lavc62.3.101 pcm_s16le
handler_name : GPAC ISO Audio Handler
vendor_id : [0][0][0][0]
[out#0/null @ 0x55eb196b7eb0] video:731KiB audio:11250KiB subtitle:0KiB
other streams:0KiB global headers:0KiB muxing overhead: unknown
frame= 1800 fps=509 q=-0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A
speed= 17x elapsed=0:00:03.53
bench: utime=21.544s stime=0.349s rtime=3.536s
bench: maxrss=181580KiB
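To make the comparison concrete, the two x86 `bench:` lines above can be turned into ratios. This is only a sketch: the `speedup` helper is a hypothetical name introduced here, and the inputs are the single-run timings copied from the logs, so run-to-run noise is not accounted for.

```shell
#!/bin/sh
# Ratio of two timings, printed with three decimals.
# Usage: speedup BASELINE_SECONDS PATCHED_SECONDS
# (hypothetical helper, not part of FFmpeg or its test tools)
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.3f\n", base / new }'
}

# Timings copied from the two x86 "bench:" lines above:
# baseline = built with -fno-tree-vectorize, patched = built without it.
echo "wall-clock (rtime) ratio: $(speedup 3.536 2.834)"
echo "user CPU (utime) ratio:   $(speedup 21.544 11.324)"
```

A ratio above 1 means the build without -fno-tree-vectorize finished faster in that single run.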
I also tested on a RISC-V development board, the MUSE Pi Pro. The
configuration and results follow:
Using patch:
root@spacemit-k1-x-MUSE-Pi-Pro-board:~# ./ffpv/bin/ffmpeg -benchmark -i
1080p_hevc_mp3.mp4 -f null -
ffmpeg version n6.1.2 Copyright (c) 2000-2024 the FFmpeg developers
built with gcc 16.0.0 (g3fc902e738b) 20250519 (experimental)
configuration: --prefix=/home/pz9115/ffpv --disable-ffplay
--arch=riscv --extra-cflags='-march=rv64gcv_zba_zbb_zbs -O3 -ffast-math'
--cross-prefix=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-
--cc=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-gcc
--cxx=/home/pz9115/rvv/bin/riscv64-unknown-linux-gnu-g++ --enable-static
--enable-cross-compile --target-os=linux --disable-rvv
libavutil 58. 29.100 / 58. 29.100
libavcodec 60. 31.102 / 60. 31.102
libavformat 60. 16.100 / 60. 16.100
libavdevice 60. 3.100 / 60. 3.100
libavfilter 9. 12.100 / 9. 12.100
libswscale 7. 5.100 / 7. 5.100
libswresample 4. 12.100 / 4. 12.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1080p_hevc_mp3.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
composer : Sacha Goedegebure
encoder : Lavf60.16.100
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
genre : Animation
Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
fps, 30 tbr, 15360 tbn (default)
Metadata:
handler_name : GPAC ISO Video Handler
vendor_id : [0][0][0][0]
encoder : Lavc60.31.102 libx265
Stream #0:1[0x2](und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz,
stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : GPAC ISO Audio Handler
vendor_id : [0][0][0][0]
Stream mapping:
Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Odd rotation angle.
If you want to help, upload a sample of this file to
https://streams.videolan.org/upload/ and contact the ffmpeg-devel
mailing list. (ffmpeg-devel@ffmpeg.org)
Output #0, null, to 'pipe:':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
composer : Sacha Goedegebure
genre : Animation
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
encoder : Lavf60.16.100
Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
1080x1920 [SAR 1:1 DAR 9:16], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
Metadata:
handler_name : GPAC ISO Video Handler
vendor_id : [0][0][0][0]
encoder : Lavc60.31.102 wrapped_avframe
Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
(default)
Metadata:
handler_name : GPAC ISO Audio Handler
vendor_id : [0][0][0][0]
encoder : Lavc60.31.102 pcm_s16le
[out#0/null @ 0x28a82e0] video:844kB audio:11250kB subtitle:0kB other
streams:0kB global headers:0kB muxing overhead: unknown
frame= 1800 fps= 42 q=-0.0 Lsize=N/A time=00:00:59.97 bitrate=N/A
speed=1.41x
bench: utime=207.150s stime=5.319s rtime=42.608s
bench: maxrss=162160kB
Without patch (again adding -fno-tree-vectorize directly):
./ffp/bin/ffmpeg -benchmark -i 1080p_hevc_mp3.mp4 -f null -
ffmpeg version n6.1.2 Copyright (c) 2000-2024 the FFmpeg developers
built with gcc 16.0.0 (g38163c874a3-dirty) 20250515 (experimental)
configuration: --prefix=/home/pz9115/ffp --disable-ffplay
--arch=riscv --sysroot=/home/pz9115/rv/sysroot
--extra-cflags='-march=rv64gcv_zba_zbb_zbc_zbs_zca_zcd -mabi=lp64d -O3
-fno-tree-vectorize -static' --extra-ldflags=-static
--cross-prefix=/home/pz9115/rv/bin/riscv64-unknown-linux-gnu-
--enable-static --enable-cross-compile --target-os=linux --disable-rvv
libavutil 58. 29.100 / 58. 29.100
libavcodec 60. 31.102 / 60. 31.102
libavformat 60. 16.100 / 60. 16.100
libavdevice 60. 3.100 / 60. 3.100
libavfilter 9. 12.100 / 9. 12.100
libswscale 7. 5.100 / 7. 5.100
libswresample 4. 12.100 / 4. 12.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1080p_hevc_mp3.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
composer : Sacha Goedegebure
encoder : Lavf60.16.100
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
genre : Animation
Duration: 00:01:00.00, start: 0.000000, bitrate: 1564 kb/s
Stream #0:0[0x1](und): Video: hevc (Main) (hev1 / 0x31766568),
yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1429 kb/s, 30
fps, 30 tbr, 15360 tbn (default)
Metadata:
handler_name : GPAC ISO Video Handler
vendor_id : [0][0][0][0]
encoder : Lavc60.31.102 libx265
Stream #0:1[0x2](und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz,
stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : GPAC ISO Audio Handler
vendor_id : [0][0][0][0]
Stream mapping:
Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
Stream #0:1 -> #0:1 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
composer : Sacha Goedegebure
genre : Animation
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
encoder : Lavf60.16.100
Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, progressive),
1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
Metadata:
handler_name : GPAC ISO Video Handler
vendor_id : [0][0][0][0]
encoder : Lavc60.31.102 wrapped_avframe
Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
(default)
Metadata:
handler_name : GPAC ISO Audio Handler
vendor_id : [0][0][0][0]
encoder : Lavc60.31.102 pcm_s16le
[out#0/null @ 0x2729630] video:844kB audio:11250kB subtitle:0kB other
streams:0kB global headers:0kB muxing overhead: unknown
frame= 1800 fps= 30 q=-0.0 Lsize=N/A time=00:00:59.97 bitrate=N/A
speed= 1x
bench: utime=321.145s stime=2.475s rtime=59.960s
bench: maxrss=131532kB
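The same comparison can be scripted for the RISC-V logs by pulling the timing out of the `bench:` line. This is a rough sketch; `bench_field` is a hypothetical helper name and the two input lines are copied from the logs above. The two RISC-V builds differ in more than -fno-tree-vectorize (different GCC snapshots, different -march strings, -ffast-math on one side and -static on the other, and an "Odd rotation angle" message with 1080x1920 output in the patched run), so the ratio cannot be attributed to vectorization alone.

```shell
#!/bin/sh
# Print one timing field (utime, stime or rtime) from an `ffmpeg -benchmark`
# log read on standard input. Lines without that field print nothing.
# (hypothetical helper, not part of FFmpeg)
bench_field() {
    sed -n "s/^bench: .*$1=\([0-9.]*\)s.*/\1/p"
}

# "bench:" lines copied from the two RISC-V runs above.
patched=$(printf '%s\n' 'bench: utime=207.150s stime=5.319s rtime=42.608s' | bench_field rtime)
baseline=$(printf '%s\n' 'bench: utime=321.145s stime=2.475s rtime=59.960s' | bench_field rtime)

awk -v b="$baseline" -v p="$patched" 'BEGIN { printf "rtime ratio: %.3f\n", b / p }'
```

In practice the helper would be fed a saved log file, for example `bench_field rtime < run.log`, where `run.log` is a placeholder name.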
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-21 6:17 Jiawei
2025-05-21 6:52 ` Nicolas George
2025-05-21 7:46 ` Michael Niedermayer
@ 2025-05-21 9:04 ` Zhao Zhili
2025-05-21 10:26 ` Jiawei
2025-05-21 10:33 ` Andreas Rheinhardt
2025-05-24 12:00 ` Rémi Denis-Courmont
4 siblings, 1 reply; 28+ messages in thread
From: Zhao Zhili @ 2025-05-21 9:04 UTC (permalink / raw)
To: FFmpeg development discussions and patches; +Cc: Jiawei
> On May 21, 2025, at 14:17, Jiawei <jiawei@iscas.ac.cn> wrote:
>
> This patch modifies the FFmpeg build system to remove the explicit disabling
> of GCC's auto-vectorization feature.
>
> Modern GCC versions (>= 10.0) have demonstrated stable auto-vectorization
> capabilities through extensive optimizations in loop analysis and SIMD
> code generation. The explicit -fno-tree-vectorize flag, originally added
> in commit 973859f (2009) to work around early GCC vectorization instability,
> is no longer necessary.
This isn’t the whole story.
The flag was added by 973859f in 2009.
Then it was reverted by cb8646af in 2016.
Shortly after that, the revert was reverted again by fd6dbc5 in 2016.
>
> Key improvements justifying this change:
> 1. Enhanced heuristics for loop vectorization cost models
> 2. Mature handling of alignment and memory access patterns
> 3. Robust fallback mechanisms for unsupported architectures
>
> This change allows FFmpeg to benefit from automated SIMD optimizations
> when built with -O3 optimization level, particularly improving
> performance on x86_64 (AVX), ARM64 (SVE) and RISC-V(RVV) architectures.
Those flags can only be enabled in tightly controlled environments (e.g., built and run on the same
machine), whereas FFmpeg relies on hand-written assembly, runtime CPU detection, and dynamic dispatch.
The auto-vectorization and ARCH flags can be enabled manually, but only with care.
>
> [1] https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/973859f5230e77beea7bb59dc081870689d6d191
>
> ---
> configure | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/configure b/configure
> index 3730b0524c..b9e95ce4ec 100755
> --- a/configure
> +++ b/configure
> @@ -7656,7 +7656,6 @@ if enabled icc; then
> disable aligned_stack
> fi
> elif enabled gcc; then
> - check_optflags -fno-tree-vectorize
> check_cflags -Werror=format-security
> check_cflags -Werror=implicit-function-declaration
> check_cflags -Werror=missing-prototypes
> --
> 2.43.0
>
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-21 6:17 Jiawei
2025-05-21 6:52 ` Nicolas George
@ 2025-05-21 7:46 ` Michael Niedermayer
2025-05-21 10:32 ` Jiawei
2025-05-21 9:04 ` Zhao Zhili
` (2 subsequent siblings)
4 siblings, 1 reply; 28+ messages in thread
From: Michael Niedermayer @ 2025-05-21 7:46 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Wed, May 21, 2025 at 02:17:50PM +0800, Jiawei wrote:
> This patch modifies the FFmpeg build system to remove the explicit disabling
> of GCC's auto-vectorization feature.
>
> Modern GCC versions (>= 10.0) have demonstrated stable auto-vectorization
> capabilities through extensive optimizations in loop analysis and SIMD
> code generation. The explicit -fno-tree-vectorize flag, originally added
> in commit 973859f (2009) to work around early GCC vectorization instability,
> is no longer necessary.
>
> Key improvements justifying this change:
> 1. Enhanced heuristics for loop vectorization cost models
> 2. Mature handling of alignment and memory access patterns
> 3. Robust fallback mechanisms for unsupported architectures
>
> This change allows FFmpeg to benefit from automated SIMD optimizations
> when built with -O3 optimization level, particularly improving
> performance on x86_64 (AVX), ARM64 (SVE) and RISC-V(RVV) architectures.
>
> [1] https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/973859f5230e77beea7bb59dc081870689d6d191
>
> ---
> configure | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/configure b/configure
> index 3730b0524c..b9e95ce4ec 100755
> --- a/configure
> +++ b/configure
> @@ -7656,7 +7656,6 @@ if enabled icc; then
> disable aligned_stack
> fi
> elif enabled gcc; then
> - check_optflags -fno-tree-vectorize
> check_cflags -Werror=format-security
> check_cflags -Werror=implicit-function-declaration
> check_cflags -Werror=missing-prototypes
Your text describes this change as being OK in a GCC-version-dependent
way.
Your patch has no GCC version dependency.
If you claim that all issues were solved, please show the issues happening
in version v and no longer happening in w>v. Then it makes sense to
change the flags for version w.
Thx
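A version-dependent form of the change could look roughly like the sketch below. Everything here is an assumption for illustration: `gcc_older_than` is a hypothetical helper, and the cutoff of 10 is simply the ">= 10.0" figure from the patch description, which has not been demonstrated in this thread. Only `check_optflags` comes from the quoted diff.

```shell
#!/bin/sh
# Succeeds (exit status 0) when the compiler's major version is below the cutoff.
# $1: version string as printed by `gcc -dumpversion`, e.g. "13" or "9.4.0"
# $2: first major version for which the flag would be dropped (hypothetical cutoff)
gcc_older_than() {
    major=${1%%.*}          # strip everything from the first dot onwards
    [ "$major" -lt "$2" ]
}

# Hypothetical use inside the gcc branch of configure; check_optflags is the
# helper already used by the line the patch removes, the rest is assumed:
#   if gcc_older_than "$($cc -dumpversion)" 10; then
#       check_optflags -fno-tree-vectorize
#   fi

# Stand-alone demonstration with sample version strings.
for v in 9.4.0 10.2.1 13; do
    if gcc_older_than "$v" 10; then
        echo "gcc $v: keep -fno-tree-vectorize"
    else
        echo "gcc $v: leave vectorization at the optimization-level default"
    fi
done
```

The cutoff would have to be backed by the kind of evidence requested above, that is, an issue reproducible with version v and gone with w>v.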
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
When the tyrant has disposed of foreign enemies by conquest or treaty, and
there is nothing more to fear from them, then he is always stirring up
some war or other, in order that the people may require a leader. -- Plato
* Re: [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
2025-05-21 6:17 Jiawei
@ 2025-05-21 6:52 ` Nicolas George
2025-05-21 10:17 ` Jiawei
2025-05-21 7:46 ` Michael Niedermayer
` (3 subsequent siblings)
4 siblings, 1 reply; 28+ messages in thread
From: Nicolas George @ 2025-05-21 6:52 UTC (permalink / raw)
To: FFmpeg development discussions and patches
Jiawei (HE12025-05-21):
> particularly improving
> performance on x86_64 (AVX), ARM64 (SVE) and RISC-V(RVV) architectures.
Benchmark needed.
Regards,
--
Nicolas George
* [FFmpeg-devel] gcc: Remove auto-vectorization limitation.
@ 2025-05-21 6:17 Jiawei
2025-05-21 6:52 ` Nicolas George
` (4 more replies)
0 siblings, 5 replies; 28+ messages in thread
From: Jiawei @ 2025-05-21 6:17 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Jiawei
This patch modifies the FFmpeg build system to remove the explicit disabling
of GCC's auto-vectorization feature.
Modern GCC versions (>= 10.0) have demonstrated stable auto-vectorization
capabilities through extensive optimizations in loop analysis and SIMD
code generation. The explicit -fno-tree-vectorize flag, originally added
in commit 973859f (2009) to work around early GCC vectorization instability,
is no longer necessary.
Key improvements justifying this change:
1. Enhanced heuristics for loop vectorization cost models
2. Mature handling of alignment and memory access patterns
3. Robust fallback mechanisms for unsupported architectures
This change allows FFmpeg to benefit from automated SIMD optimizations
when built with -O3 optimization level, particularly improving
performance on x86_64 (AVX), ARM64 (SVE) and RISC-V(RVV) architectures.
[1] https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/973859f5230e77beea7bb59dc081870689d6d191
---
configure | 1 -
1 file changed, 1 deletion(-)
diff --git a/configure b/configure
index 3730b0524c..b9e95ce4ec 100755
--- a/configure
+++ b/configure
@@ -7656,7 +7656,6 @@ if enabled icc; then
disable aligned_stack
fi
elif enabled gcc; then
- check_optflags -fno-tree-vectorize
check_cflags -Werror=format-security
check_cflags -Werror=implicit-function-declaration
check_cflags -Werror=missing-prototypes
--
2.43.0
end of thread, other threads:[~2025-06-04 11:13 UTC | newest]
Thread overview: 28+ messages
2025-05-21 10:14 [FFmpeg-devel] gcc: Remove auto-vectorization limitation Jiawei
-- strict thread matches above, loose matches on Subject: below --
2025-05-21 10:08 Jiawei
2025-05-21 6:17 Jiawei
2025-05-21 6:52 ` Nicolas George
2025-05-21 10:17 ` Jiawei
2025-05-21 18:21 ` Frank Plowman
2025-05-22 6:32 ` Jiawei
2025-05-24 1:46 ` Kieran Kunhya via ffmpeg-devel
2025-05-24 4:10 ` Jiawei
2025-05-24 16:10 ` Rémi Denis-Courmont
2025-05-25 21:37 ` Michael Niedermayer
2025-05-26 8:43 ` Rémi Denis-Courmont
2025-05-30 0:46 ` Michael Niedermayer
2025-05-30 6:58 ` Rémi Denis-Courmont
2025-05-31 13:39 ` Michael Niedermayer
2025-06-03 16:14 ` Niklas Haas
2025-06-04 11:13 ` Rémi Denis-Courmont
2025-05-21 7:46 ` Michael Niedermayer
2025-05-21 10:32 ` Jiawei
2025-05-21 11:09 ` Michael Niedermayer
2025-05-21 9:04 ` Zhao Zhili
2025-05-21 10:26 ` Jiawei
2025-05-21 10:33 ` Andreas Rheinhardt
2025-05-21 12:09 ` Martin Storsjö
2025-05-21 12:14 ` Andreas Rheinhardt
2025-05-21 12:22 ` Martin Storsjö
2025-05-21 18:12 ` softworkz .
2025-05-24 12:00 ` Rémi Denis-Courmont