> On Feb 8, 2025, at 00:05, Ronald S. Bultje wrote:
>
> Hi,
>
> On Fri, Feb 7, 2025 at 8:44 AM Zhao Zhili <
> quinkblack-at-foxmail.com@ffmpeg.org> wrote:
>
>>> On Feb 7, 2025, at 21:26, Ronald S. Bultje wrote:
>>>
>>> On Fri, Feb 7, 2025 at 6:22 AM Andreas Rheinhardt <
>>> andreas.rheinhardt@outlook.com> wrote:
>>>> Ronald S. Bultje:
>>>>> Fixes #11456.
>>>>> ---
>>>>>  libavcodec/threadprogress.c | 3 +--
>>>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/libavcodec/threadprogress.c b/libavcodec/threadprogress.c
>>>>> index 62c4fd898b..aa72ff80e7 100644
>>>>> --- a/libavcodec/threadprogress.c
>>>>> +++ b/libavcodec/threadprogress.c
>>>>> @@ -55,9 +55,8 @@ void ff_thread_progress_report(ThreadProgress *pro, int n)
>>>>>      if (atomic_load_explicit(&pro->progress, memory_order_relaxed) >= n)
>>>>>          return;
>>>>>
>>>>> -    atomic_store_explicit(&pro->progress, n, memory_order_release);
>>>>> -
>>>>>      ff_mutex_lock(&pro->progress_mutex);
>>>>> +    atomic_store_explicit(&pro->progress, n, memory_order_release);
>>>>>      ff_cond_broadcast(&pro->progress_cond);
>>>>>      ff_mutex_unlock(&pro->progress_mutex);
>>>>>  }
>>>>
>>>> I don't really understand why this is supposed to fix a race; after all,
>>>> the synchronisation of ff_thread_progress_(report|await) is not supposed
>>>> to be provided by the mutex (which is avoided altogether in the fast
>>>> path in ff_thread_progress_await()), but by storing and loading the
>>>> progress variable.
>>>> That's also the reason why I moved this outside of the mutex (compared
>>>> to ff_thread_report_progress()). (This way it is possible for a consumer
>>>> thread to see the new progress value earlier and possibly avoid the
>>>> mutex altogether.)
>>>
>>> The consumer thread already checks the value without the lock, so the
>>> significance of that last point seems minor to me. This would be
>>> different if the wait() counterpart had no lockless path. Or am I
>>> missing something?
>>
>> What Andreas says is that doing the atomic_store before mutex_lock gives
>> the first atomic_load in progress_wait a higher chance to succeed. The
>> earlier progress is set, the higher the chance that progress_wait takes
>> the fast path.
>
> I understand that is true in theory - but I have doubts on whether this is
> in any way significant in practice if wait() already has behaviour to
> pre-empt the lock locklessly.
>
> I measured this in the most unscientific way possible by decoding
> gizmo.webm (from Firefox's bug report) 10x before and after my patch,
> taking the average and standard deviation, and comparing these with each
> other. I repeated this a couple of times. The values (before vs. after,
> avg +/- stddev) are obviously never exactly the same, but they swarm
> around each other like a random noise generator. Or to say it differently:
> in my highly unscientific test, I see no performance difference.
>
> So ... Is this really worth it?

I did another test by measuring fast_path / (fast_path + slow_path) on macOS
for HEVC decoding with 10 threads:

1. Before the patch, it's 99.741%.
2. With the patch, it's 99.743%.
3. With while (atomic_load_explicit(&pro->progress, memory_order_acquire) < n),
   it's 99.741%.

So it doesn't matter for performance. The current patch is the most elegant
solution in my opinion.
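For reference, the shape of what I counted looks roughly like this. It is a
minimal standalone sketch, not FFmpeg's actual code: it uses plain C11 atomics
and pthreads instead of the ff_mutex/ff_cond wrappers, and the fast_path /
slow_path counters are hypothetical instrumentation added for illustration.

#include <pthread.h>
#include <stdatomic.h>

typedef struct ThreadProgress {
    atomic_int      progress;
    pthread_mutex_t progress_mutex;
    pthread_cond_t  progress_cond;
} ThreadProgress;

/* Illustrative counters for measuring the fast-path hit rate. */
static atomic_ulong fast_path, slow_path;

static void progress_await(ThreadProgress *pro, int n)
{
    /* Lockless fast path: if the reporter's release store is already
     * visible, this acquire load synchronizes with it and the mutex is
     * never touched. */
    if (atomic_load_explicit(&pro->progress, memory_order_acquire) >= n) {
        atomic_fetch_add_explicit(&fast_path, 1, memory_order_relaxed);
        return;
    }

    /* Slow path: re-check under the mutex and sleep on the condition
     * variable until the reporter broadcasts. */
    atomic_fetch_add_explicit(&slow_path, 1, memory_order_relaxed);
    pthread_mutex_lock(&pro->progress_mutex);
    while (atomic_load_explicit(&pro->progress, memory_order_relaxed) < n)
        pthread_cond_wait(&pro->progress_cond, &pro->progress_mutex);
    pthread_mutex_unlock(&pro->progress_mutex);
}

/* Reporter side with the patch applied: the store happens while the
 * mutex is held, so a slow-path waiter, which re-checks the value under
 * the same mutex, either sees the new value or is already blocked in
 * cond_wait by the time the broadcast fires. */
static void progress_report(ThreadProgress *pro, int n)
{
    if (atomic_load_explicit(&pro->progress, memory_order_relaxed) >= n)
        return;

    pthread_mutex_lock(&pro->progress_mutex);
    atomic_store_explicit(&pro->progress, n, memory_order_release);
    pthread_cond_broadcast(&pro->progress_cond);
    pthread_mutex_unlock(&pro->progress_mutex);
}

Since the acquire load at the top succeeds ~99.7% of the time either way,
where exactly the release store sits relative to the lock on the report side
doesn't show up in the numbers.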