> On Feb 8, 2025, at 00:05, Ronald S. Bultje wrote:
>
> Hi,
>
> On Fri, Feb 7, 2025 at 8:44 AM Zhao Zhili <
> quinkblack-at-foxmail.com@ffmpeg.org> wrote:
>
>>> On Feb 7, 2025, at 21:26, Ronald S. Bultje wrote:
>>>
>>> On Fri, Feb 7, 2025 at 6:22 AM Andreas Rheinhardt <
>>> andreas.rheinhardt@outlook.com> wrote:
>>>> Ronald S. Bultje:
>>>>> Fixes #11456.
>>>>> ---
>>>>>  libavcodec/threadprogress.c | 3 +--
>>>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/libavcodec/threadprogress.c b/libavcodec/threadprogress.c
>>>>> index 62c4fd898b..aa72ff80e7 100644
>>>>> --- a/libavcodec/threadprogress.c
>>>>> +++ b/libavcodec/threadprogress.c
>>>>> @@ -55,9 +55,8 @@ void ff_thread_progress_report(ThreadProgress *pro, int n)
>>>>>      if (atomic_load_explicit(&pro->progress, memory_order_relaxed) >= n)
>>>>>          return;
>>>>>
>>>>> -    atomic_store_explicit(&pro->progress, n, memory_order_release);
>>>>> -
>>>>>      ff_mutex_lock(&pro->progress_mutex);
>>>>> +    atomic_store_explicit(&pro->progress, n, memory_order_release);
>>>>>      ff_cond_broadcast(&pro->progress_cond);
>>>>>      ff_mutex_unlock(&pro->progress_mutex);
>>>>>  }
>>>>
>>>> I don't really understand why this is supposed to fix a race; after all,
>>>> the synchronisation of ff_thread_progress_(report|await) is not supposed
>>>> to be provided by the mutex (which is avoided altogether in the fast
>>>> path in ff_thread_progress_await()), but by storing and loading the
>>>> progress variable.
>>>> That's also the reason why I moved this outside of the mutex (compared
>>>> to ff_thread_report_progress()). (This way it is possible for a consumer
>>>> thread to see the new progress value earlier and possibly avoid the
>>>> mutex altogether.)
>>>
>>> The consumer thread already checks the value without the lock, so the
>>> significance of that last point seems minor to me. This would be
>>> different if the wait() counterpart had no lockless path. Or am I
>>> missing something?
>>
>> What Andreas says is that doing the atomic_store before mutex_lock gives
>> the first atomic_load in progress_wait a higher chance to succeed. The
>> earlier progress is set, the higher the chance that progress_wait takes
>> the fast path.
>
> I understand that is true in theory - but I have doubts on whether this is
> in any way significant in practice if wait() already has behaviour to
> pre-empt the lock locklessly.
>
> I measured this in the most unscientific way possible by decoding
> gizmo.webm (from Firefox's bug report) 10x before and after my patch,
> taking the average and standard deviation, and comparing these with each
> other. I repeated this a couple of times. The values (before vs. after,
> avg +/- stddev) are obviously never exactly the same, but they swarm
> around each other like a random noise generator. Or to say it differently:
> in my highly unscientific test, I see no performance difference.
>
> So ... Is this really worth it?

I did another test by measuring fast_path / (fast_path + slow_path) on macOS
for HEVC decoding with 10 threads:

1. Before the patch, it's 99.741%.
2. With the patch, it's 99.743%.
3. With while (atomic_load_explicit(&pro->progress, memory_order_acquire) < n),
   it's 99.741%.

So it doesn't matter for performance. The current patch is the most elegant
solution in my opinion.
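For reference, the shape of what I counted looks roughly like this. It is a
minimal standalone sketch, not FFmpeg's actual code: it uses plain C11 atomics
and pthreads instead of the ff_mutex/ff_cond wrappers, and the fast_path /
slow_path counters are hypothetical instrumentation added for illustration.

#include <pthread.h>
#include <stdatomic.h>

typedef struct ThreadProgress {
    atomic_int      progress;
    pthread_mutex_t progress_mutex;
    pthread_cond_t  progress_cond;
} ThreadProgress;

/* Illustrative counters for measuring the fast-path hit rate. */
static atomic_ulong fast_path, slow_path;

static void progress_await(ThreadProgress *pro, int n)
{
    /* Lockless fast path: if the reporter's release store is already
     * visible, this acquire load synchronizes with it and the mutex is
     * never touched. */
    if (atomic_load_explicit(&pro->progress, memory_order_acquire) >= n) {
        atomic_fetch_add_explicit(&fast_path, 1, memory_order_relaxed);
        return;
    }

    /* Slow path: re-check under the mutex and sleep on the condition
     * variable until the reporter broadcasts. */
    atomic_fetch_add_explicit(&slow_path, 1, memory_order_relaxed);
    pthread_mutex_lock(&pro->progress_mutex);
    while (atomic_load_explicit(&pro->progress, memory_order_relaxed) < n)
        pthread_cond_wait(&pro->progress_cond, &pro->progress_mutex);
    pthread_mutex_unlock(&pro->progress_mutex);
}

/* Reporter side with the patch applied: the store happens while the
 * mutex is held, so a slow-path waiter, which re-checks the value under
 * the same mutex, either sees the new value or is already blocked in
 * cond_wait by the time the broadcast fires. */
static void progress_report(ThreadProgress *pro, int n)
{
    if (atomic_load_explicit(&pro->progress, memory_order_relaxed) >= n)
        return;

    pthread_mutex_lock(&pro->progress_mutex);
    atomic_store_explicit(&pro->progress, n, memory_order_release);
    pthread_cond_broadcast(&pro->progress_cond);
    pthread_mutex_unlock(&pro->progress_mutex);
}

Since the acquire load at the top succeeds ~99.7% of the time either way,
where exactly the release store sits relative to the lock on the report side
doesn't show up in the numbers.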