Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
* [FFmpeg-devel] [PATCH] avcodec/amfenc: Fix for windows imprecise sleep
@ 2023-10-16  9:13 Evgeny Pavlov
  2023-10-16 21:24 ` Mark Thompson
  2023-11-13 14:37 ` [FFmpeg-devel] [PATCH v2] avcodec/amfenc: increase precision of Sleep() on Windows Evgeny Pavlov
  0 siblings, 2 replies; 14+ messages in thread
From: Evgeny Pavlov @ 2023-10-16  9:13 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Evgeny Pavlov

This commit reduces the sleep time on Windows to improve AMF encoding
performance on low resolution input videos.
This fix is for Windows only, because sleep() function isn't
very accurate on Windows OS.

Fix for issue #10622

Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com>
---
 libavcodec/amfenc.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
index 061859f85c..0c95465d6e 100644
--- a/libavcodec/amfenc.c
+++ b/libavcodec/amfenc.c
@@ -770,7 +770,11 @@ int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket *avpkt)
         if (query_output_data_flag == 0) {
             if (res_resubmit == AMF_INPUT_FULL || ctx->delayed_drain || (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue >= ctx->hwsurfaces_in_queue_max)) {
                 block_and_wait = 1;
+#ifdef _WIN32
+                av_usleep(0); //Sleep() is not precise on Windows OS.
+#else
                 av_usleep(1000);
+#endif
             }
         }
     } while (block_and_wait);
-- 
2.41.0

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".


* Re: [FFmpeg-devel] [PATCH] avcodec/amfenc: Fix for windows imprecise sleep
  2023-10-16  9:13 [FFmpeg-devel] [PATCH] avcodec/amfenc: Fix for windows imprecise sleep Evgeny Pavlov
@ 2023-10-16 21:24 ` Mark Thompson
  2023-10-17  1:25   ` Zhao Zhili
  2023-11-13 14:37 ` [FFmpeg-devel] [PATCH v2] avcodec/amfenc: increase precision of Sleep() on Windows Evgeny Pavlov
  1 sibling, 1 reply; 14+ messages in thread
From: Mark Thompson @ 2023-10-16 21:24 UTC (permalink / raw)
  To: ffmpeg-devel

On 16/10/2023 10:13, Evgeny Pavlov wrote:
> This commit reduces the sleep time on Windows to improve AMF encoding
> performance on low resolution input videos.
> This fix is for Windows only, because sleep() function isn't
> very accurate on Windows OS.
> 
> Fix for issue #10622
> 
> Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com>
> ---
>   libavcodec/amfenc.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
> index 061859f85c..0c95465d6e 100644
> --- a/libavcodec/amfenc.c
> +++ b/libavcodec/amfenc.c
> @@ -770,7 +770,11 @@ int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket *avpkt)
>           if (query_output_data_flag == 0) {
>               if (res_resubmit == AMF_INPUT_FULL || ctx->delayed_drain || (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue >= ctx->hwsurfaces_in_queue_max)) {
>                   block_and_wait = 1;
> +#ifdef _WIN32
> +                av_usleep(0); //Sleep() is not precise on Windows OS.
> +#else
>                   av_usleep(1000);
> +#endif
>               }
>           }
>       } while (block_and_wait);

Wasting lots of power by spinning on a CPU core does not seem like a good answer to this problem.  (I mean, presumably that is why Windows isn't honouring your request for a short sleep, because it wants timers to have larger gaps to avoid wasting power.)

Why is there a sleep here at all, anyway?  An API for hardware encoding should be providing a way for the caller to wait for an outstanding operation to complete.

Thanks,

- Mark

* Re: [FFmpeg-devel] [PATCH] avcodec/amfenc: Fix for windows imprecise sleep
  2023-10-16 21:24 ` Mark Thompson
@ 2023-10-17  1:25   ` Zhao Zhili
  2023-10-17 17:11     ` Evgeny Pavlov
  0 siblings, 1 reply; 14+ messages in thread
From: Zhao Zhili @ 2023-10-17  1:25 UTC (permalink / raw)
  To: FFmpeg development discussions and patches


> On 17 Oct 2023, at 5:24 AM, Mark Thompson <sw@jkqxz.net> wrote:
> 
> On 16/10/2023 10:13, Evgeny Pavlov wrote:
>> This commit reduces the sleep time on Windows to improve AMF encoding
>> performance on low resolution input videos.
>> This fix is for Windows only, because sleep() function isn't
>> very accurate on Windows OS.
>> Fix for issue #10622
>> Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com>
>> ---
>>  libavcodec/amfenc.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>> diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
>> index 061859f85c..0c95465d6e 100644
>> --- a/libavcodec/amfenc.c
>> +++ b/libavcodec/amfenc.c
>> @@ -770,7 +770,11 @@ int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket *avpkt)
>>          if (query_output_data_flag == 0) {
>>              if (res_resubmit == AMF_INPUT_FULL || ctx->delayed_drain || (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue >= ctx->hwsurfaces_in_queue_max)) {
>>                  block_and_wait = 1;
>> +#ifdef _WIN32
>> +                av_usleep(0); //Sleep() is not precise on Windows OS.
>> +#else
>>                  av_usleep(1000);
>> +#endif
>>              }
>>          }
>>      } while (block_and_wait);
> 
> Wasting lots of power by spinning on a CPU core does not seem like a good answer to this problem.  (I mean, presumably that is why Windows isn't honouring your request for a short sleep, because it wants timers to have larger gaps to avoid wasting power.)

If av_usleep() is implemented via Sleep(), as it is in this case, a sleep of 0 yields the current thread, so it is not normally a busy wait (although it can become one).

av_usleep(500) may look better and does the same job, because it relies on integer division: 500/1000 = 0.

I agree that using sleep without real async looks like a bug.
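
The truncation referred to above can be sketched as follows. This is a minimal illustration, assuming av_usleep() falls back to Sleep(usec / 1000) on the build in question; the helper name usec_to_sleep_ms is hypothetical and not part of FFmpeg.

```c
/* Hypothetical helper (not an FFmpeg function): mirrors the
 * microsecond-to-millisecond conversion that a Sleep()-based
 * av_usleep() fallback is assumed to perform. */
static unsigned usec_to_sleep_ms(unsigned usec)
{
    /* Integer division truncates, so anything below 1000 us becomes 0 ms. */
    return usec / 1000;
}
```

With this conversion, usec_to_sleep_ms(500) and usec_to_sleep_ms(0) both evaluate to 0, whereas usec_to_sleep_ms(1000) evaluates to 1.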

> 
> Why is there a sleep here at all, anyway?  An API for hardware encoding should be providing a way for the caller to wait for an outstanding operation to complete.
> 
> Thanks,
> 
> - Mark

* Re: [FFmpeg-devel] [PATCH] avcodec/amfenc: Fix for windows imprecise sleep
  2023-10-17  1:25   ` Zhao Zhili
@ 2023-10-17 17:11     ` Evgeny Pavlov
  2023-10-17 19:45       ` Kacper Michajlow
  2023-10-18 20:36       ` [FFmpeg-devel] [PATCH] amfenc: Use a blocking call instead of sleeping and polling Mark Thompson
  0 siblings, 2 replies; 14+ messages in thread
From: Evgeny Pavlov @ 2023-10-17 17:11 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

The reason for using av_usleep() here is that the AMF API doesn't provide an
explicit wait call. There are two ways to get output from the encoder:

1. Polling, with some sleep to avoid CPU thrashing. This is what FFmpeg
currently does.

2. Setting the timeout parameter on the AMF encoder, so that the QueryOutput
call blocks until output is available or the timeout expires.

#2 is the preferable way, but it is designed to be used with a separate
polling thread. With the single-threaded approach in FFmpeg, the use of a
timeout can block input submission, making things slower. This is even more
pronounced when B-frames are enabled and several inputs are needed to produce
the first output.

This sleep is reached only in special cases (primarily when the AMF input
queue is full), not in the core loop. In my experiments CPU usage increased
by about 2-4%, not in a burst.

For low-resolution encoding, these changes bring a significant performance
improvement (about 15%). They bring no improvement for high resolutions
such as 4K.
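
As a rough illustration of mode 1, the sketch below polls a mock encoder and sleeps between attempts. MockEncoder, mock_query_output and poll_with_sleep are hypothetical names made up for this example; they are not AMF or FFmpeg APIs, and the sleep is passed in as a callback so the loop can be exercised without real delays.

```c
#include <stddef.h>

/* Hypothetical stand-in for an encoder whose output becomes
 * available only after a number of unsuccessful queries. */
typedef struct MockEncoder {
    int polls_until_ready;
} MockEncoder;

/* Returns 1 if output is available, 0 if the caller must try again. */
static int mock_query_output(MockEncoder *enc)
{
    if (enc->polls_until_ready > 0) {
        enc->polls_until_ready--;
        return 0;
    }
    return 1;
}

/* Mode 1: poll, sleeping between attempts to avoid spinning.
 * Returns how many times the loop had to sleep. */
static int poll_with_sleep(MockEncoder *enc,
                           void (*sleep_fn)(unsigned usec),
                           unsigned usec)
{
    int sleeps = 0;

    while (!mock_query_output(enc)) {
        if (sleep_fn)
            sleep_fn(usec);
        sleeps++;
    }
    return sleeps;
}
```

The total wait is the number of sleeps multiplied by how long each sleep really lasts, so a nominal 1 ms sleep that overshoots adds latency on every iteration.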


Thanks,

Evgeny


* Re: [FFmpeg-devel] [PATCH] avcodec/amfenc: Fix for windows imprecise sleep
  2023-10-17 17:11     ` Evgeny Pavlov
@ 2023-10-17 19:45       ` Kacper Michajlow
  2023-10-18 10:32         ` Evgeny Pavlov
  2023-10-18 20:36       ` [FFmpeg-devel] [PATCH] amfenc: Use a blocking call instead of sleeping and polling Mark Thompson
  1 sibling, 1 reply; 14+ messages in thread
From: Kacper Michajlow @ 2023-10-17 19:45 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

On Tue, 17 Oct 2023 at 19:34, Evgeny Pavlov <lucenticus@gmail.com> wrote:
>
> The reason for using av_usleep() here is that AMF API doesn’t provide an
> API for explicit wait. There are two modes to get output from encoder:
>
> 1. Polling with some sleep to avoid CPU thrashing – currently used in FFmpeg
>
> 2. Set timeout parameter on AMF encoder and QueryOutput call will block
> till output is available or the timeout happens.
>
> #2 is the preferable way but it is designed more to be used with a separate
> polling thread. With a single-thread approach in FFmpeg, the use of timeout
> can block input submission making things slower.  This is even more
> pronounced when B-frames are enabled and several inputs are needed to produce
> the first output.
>
> The condition of this sleep is in special events (primarily when amf input
> queue is full), not the core loop part. During the experiments the cpu
> increasing is about 2-4% or so, not a burst.
>
> For low resolution encoding,  these changes bring significant performance
> improvement (about 15%). It will not bring improvement for high resolution
> such as 4K.
>
>
> Thanks,
>
> Evgeny
>

Please don't top-post. I'll bottom-post now and no one will know how
to read this email.

If you need a more precise sleep on Windows, your application should use
the timeBeginPeriod/timeEndPeriod API; see
https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod

This sleep shouldn't be there to begin with, and removing it only for
Windows seems like a hacky workaround.

Sleep on Windows is accurate when you request a timer resolution
appropriate for your application. You probably don't do that, and so get
unexpectedly long sleeps, but that is not because they are "not
accurate"; it is because you don't ask for it.

Side note: with `Sleep()` you can request a sleep no shorter than 1 ms, but
with waitable timers
https://learn.microsoft.com/en-us/windows/win32/sync/waitable-timer-objects
you can go down to 0.5 ms, which currently seems to be the lowest
interval at which the Windows kernel will wake anything up in practice.
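
A minimal sketch of the timeBeginPeriod/timeEndPeriod pattern, assuming a 1 ms resolution request and linking against winmm on Windows. The helper name sleep_ms_high_res is hypothetical; the non-Windows branch exists only so the example builds elsewhere. An application would normally raise the resolution once around its timing-sensitive phase rather than around every sleep, as done here for brevity.

```c
#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>   /* timeBeginPeriod/timeEndPeriod; link with winmm */
#else
#define _POSIX_C_SOURCE 199309L
#include <time.h>       /* nanosleep */
#endif

/* Hypothetical helper, not part of FFmpeg or AMF.
 * Sleeps for roughly `ms` milliseconds; returns 0 on success. */
static int sleep_ms_high_res(unsigned ms)
{
#ifdef _WIN32
    /* Ask the system for 1 ms timer resolution for the duration of the sleep. */
    if (timeBeginPeriod(1) != TIMERR_NOERROR)
        return -1;
    Sleep(ms);
    /* Every timeBeginPeriod() call must be matched with the same value. */
    timeEndPeriod(1);
    return 0;
#else
    struct timespec ts;

    ts.tv_sec  = ms / 1000;
    ts.tv_nsec = (long)(ms % 1000) * 1000000L;
    return nanosleep(&ts, NULL);
#endif
}
```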

- Kacper

* Re: [FFmpeg-devel] [PATCH] avcodec/amfenc: Fix for windows imprecise sleep
  2023-10-17 19:45       ` Kacper Michajlow
@ 2023-10-18 10:32         ` Evgeny Pavlov
  0 siblings, 0 replies; 14+ messages in thread
From: Evgeny Pavlov @ 2023-10-18 10:32 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

On Tue, Oct 17, 2023 at 9:45 PM Kacper Michajlow <kasper93@gmail.com> wrote:

> Please don't top-post. I'll bottom-post now and no one will know how
> to read this email.
>
> If you need more precise sleep on Windows, your application should use
> timeBeginPeriod/timeEndPeriod API, see
>
> https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod
>
> This sleep shouldn't be there to begin with and removing it only for
> Windows, seems like a hacky workaround.
>
> Sleep on Windows is accurate, when you request a timer resolution
> appropriate for your application. You probably don't do that, and have
> unexpectedly long sleeps, but it is not because they are "not
> accurate", it is because you don't ask for it.
>
> Side note: with `Sleep()` you can request a sleep no shorter than 1 ms, but
> with waitable timers
> https://learn.microsoft.com/en-us/windows/win32/sync/waitable-timer-objects
> you can go down to 0.5 ms, which currently seems to be the lowest
> interval at which the Windows kernel will wake anything up in practice.
>
> - Kacper

I can use code similar to AMF's, with the timeBeginPeriod/timeEndPeriod
API (please ignore the commented-out code):
AMF/amf/public/common/Windows/ThreadWindows.cpp
at master · GPUOpen-LibrariesAndSDKs/AMF · GitHub
<https://github.com/GPUOpen-LibrariesAndSDKs/AMF/blob/master/amf/public/common/Windows/ThreadWindows.cpp#L303>
But in my opinion, Zhao Zhili's alternative suggestion to use
av_usleep(500) is more suitable for FFmpeg,
because I found similar code for the QSV components in FFmpeg.

* [FFmpeg-devel] [PATCH] amfenc: Use a blocking call instead of sleeping and polling
  2023-10-17 17:11     ` Evgeny Pavlov
  2023-10-17 19:45       ` Kacper Michajlow
@ 2023-10-18 20:36       ` Mark Thompson
  2023-10-19 16:13         ` Evgeny Pavlov
  1 sibling, 1 reply; 14+ messages in thread
From: Mark Thompson @ 2023-10-18 20:36 UTC (permalink / raw)
  To: ffmpeg-devel

---
On 17/10/2023 18:11, Evgeny Pavlov wrote:
> The reason for using av_usleep() here is that AMF API doesn’t provide an
> API for explicit wait. There are two modes to get output from encoder:
> 
> 1. Polling with some sleep to avoid CPU thrashing – currently used in FFmpeg
> 
> 2. Set timeout parameter on AMF encoder and QueryOutput call will block
> till output is available or the timeout happens.
> 
> #2 is the preferable way but it is designed more to be used with a separate
> polling thread. With a single-thread approach in FFmpeg, the use of timeout
> can block input submission making things slower.  This is even more
> pronounced when B-frames are enabled and several inputs are needed to produce
> the first output.

This approach seems like it should work here?  Run non-blocking until the queue is full, then switch to blocking when you need to wait for some output.

I tried the enclosed patch (H.264 only; different properties are needed for other codecs), but it doesn't seem to work: the test assert always hits immediately, and timing shows that QueryOutput didn't block even though the timeout should be set. I'm probably doing something incorrect; maybe you would know how to fix it.

> The condition of this sleep is in special events (primarily when amf input
> queue is full), not the core loop part. During the experiments the cpu
> increasing is about 2-4% or so, not a burst.

What cases are you experimenting with?

The most problematic case I can think of is multiple encodes running simultaneously sharing the same instance so that each one has to wait for others to complete and therefore all queues fill up.

The busy wait will end up being the only place where it can block (since everything else runs asynchronously), so you will peg one CPU at close to 100% per encode running.
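
The switching idea (non-blocking until the queue is full, blocking only when output must be waited for) can be sketched in isolation. MockCtx and select_query_timeout are hypothetical names for this illustration, not FFmpeg or AMF symbols; the values 0 and 100 are the ones used in the patch below, and no unit is assumed for them here.

```c
#include <stdint.h>

/* Hypothetical context: remembers the timeout last applied and
 * counts how often the (mock) encoder property had to be set. */
typedef struct MockCtx {
    int64_t current_timeout;
    int     property_sets;
} MockCtx;

/* Pick a blocking timeout only when the caller has to wait for output,
 * and touch the encoder property only when the value actually changes. */
static void select_query_timeout(MockCtx *ctx, int must_wait)
{
    int64_t wanted = must_wait ? 100 : 0;

    if (wanted != ctx->current_timeout) {
        ctx->current_timeout = wanted;  /* stands in for setting the property */
        ctx->property_sets++;
    }
}
```

Caching the last value keeps the common non-blocking path free of repeated property updates.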

Thanks,

- Mark

  libavcodec/amfenc.c | 22 +++++++++++++++++++---
  libavcodec/amfenc.h |  1 +
  2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
index 061859f85c..db7ddbb083 100644
--- a/libavcodec/amfenc.c
+++ b/libavcodec/amfenc.c
@@ -713,13 +713,22 @@ int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket *avpkt)
          }
      }

-
+    block_and_wait = 0;
      do {
-        block_and_wait = 0;
          // poll data
          if (!avpkt->data && !avpkt->buf) {
+            int64_t timeout = block_and_wait ? 100 : 0;
+            if (timeout != ctx->output_query_timeout) {
+                av_log(avctx, AV_LOG_INFO, "Set output query timeout to %"PRId64"\n", timeout);
+                AMF_ASSIGN_PROPERTY_INT64(res, ctx->encoder, AMF_VIDEO_ENCODER_QUERY_TIMEOUT, timeout);
+                AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN, "Failed to set output query timeout\n");
+                ctx->output_query_timeout = timeout;
+            }
+
              res_query = ctx->encoder->pVtbl->QueryOutput(ctx->encoder, &data);
              if (data) {
+                av_log(avctx, AV_LOG_INFO, "QueryOutput returned with data\n");
+
                  // copy data to packet
                  AMFBuffer *buffer;
                  AMFGuid guid = IID_AMFBuffer();
@@ -740,7 +749,13 @@ int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket *avpkt)
                  data->pVtbl->Release(data);

                  AMF_RETURN_IF_FALSE(ctx, ret >= 0, ret, "amf_copy_buffer() failed with error %d\n", ret);
+            } else {
+                av_log(avctx, AV_LOG_INFO, "QueryOutput returned with nothing (%d)\n", res_query);
+                // For testing, shouldn't hit this unless machine is otherwise very loaded.
+                av_assert0(!block_and_wait);
              }
+
+            block_and_wait = 0;
          }
          res_resubmit = AMF_OK;
          if (ctx->delayed_surface != NULL) { // try to resubmit frame
@@ -769,8 +784,9 @@ int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket *avpkt)

          if (query_output_data_flag == 0) {
              if (res_resubmit == AMF_INPUT_FULL || ctx->delayed_drain || (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue >= ctx->hwsurfaces_in_queue_max)) {
+                av_log(avctx, AV_LOG_INFO, "Need to wait for output\n");
                  block_and_wait = 1;
-                av_usleep(1000);
+                //av_usleep(1000);
              }
          }
      } while (block_and_wait);
diff --git a/libavcodec/amfenc.h b/libavcodec/amfenc.h
index 2dbd378ef8..64c77115b6 100644
--- a/libavcodec/amfenc.h
+++ b/libavcodec/amfenc.h
@@ -72,6 +72,7 @@ typedef struct AmfContext {
      int                 delayed_drain;
      AMFSurface         *delayed_surface;
      AVFrame            *delayed_frame;
+    int64_t             output_query_timeout;

      // shift dts back by max_b_frames in timing
      AVFifo             *timestamp_list;
-- 
2.39.2

* Re: [FFmpeg-devel] [PATCH] amfenc: Use a blocking call instead of sleeping and polling
  2023-10-18 20:36       ` [FFmpeg-devel] [PATCH] amfenc: Use a blocking call instead of sleeping and polling Mark Thompson
@ 2023-10-19 16:13         ` Evgeny Pavlov
  2023-10-22 14:30           ` Mark Thompson
  0 siblings, 1 reply; 14+ messages in thread
From: Evgeny Pavlov @ 2023-10-19 16:13 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

On Wed, Oct 18, 2023 at 10:36 PM Mark Thompson <sw@jkqxz.net> wrote:

> ---
> On 17/10/2023 18:11, Evgeny Pavlov wrote:
> > The reason for using av_usleep() here is that AMF API doesn’t provide an
> > API for explicit wait. There are two modes to get output from encoder:
> >
> > 1. Polling with some sleep to avoid CPU thrashing – currently used in
> FFmpeg
> >
> > 2. Set timeout parameter on AMF encoder and QueryOutput call will block
> > till output is available or the timeout happens.
> >
> > #2 is the preferable way but it is designed more to be used with a
> separate
> > polling thread. With a single-thread approach in FFmpeg, the use of
> timeout
> > can block input submission making things slower.  This is even more
> > pronounced when B-frames are enabled and several inputs are needed to
> produce
> > the first output.
>
> This approach seems like it should work here?  Run non-blocking until the
> queue is full, then switch to blocking when you need to wait for some
> output.
>
> I tried the enclosed patch (H.264 only; different properties are needed for
> other codecs), but it doesn't seem to work: the test assert always hits
> immediately, and timing shows that QueryOutput didn't block even though the
> timeout should be set. I'm probably doing something incorrect; maybe you
> would know how to fix it.
>
> > The condition of this sleep is in special events (primarily when amf
> input
> > queue is full), not the core loop part. During the experiments the cpu
> > increasing is about 2-4% or so, not a burst.
>
> What cases are you experimenting with?
>
> The most problematic case I can think of is multiple encodes running
> simultaneously sharing the same instance so that each one has to wait for
> others to complete and therefore all queues fill up.
>
> The busy wait will end up being the only place where it can block (since
> everything else runs asynchronously), so you will peg one CPU at close to
> 100% per encode running.
>
> Thanks,
>
> - Mark
>
>   libavcodec/amfenc.c | 22 +++++++++++++++++++---
>   libavcodec/amfenc.h |  1 +
>   2 files changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
> index 061859f85c..db7ddbb083 100644
> --- a/libavcodec/amfenc.c
> +++ b/libavcodec/amfenc.c
> @@ -713,13 +713,22 @@ int ff_amf_receive_packet(AVCodecContext *avctx,
> AVPacket *avpkt)
>           }
>       }
>
> -
> +    block_and_wait = 0;
>       do {
> -        block_and_wait = 0;
>           // poll data
>           if (!avpkt->data && !avpkt->buf) {
> +            int64_t timeout = block_and_wait ? 100 : 0;
> +            if (timeout != ctx->output_query_timeout) {
> +                av_log(avctx, AV_LOG_INFO, "Set output query timeout to
> %"PRId64"\n", timeout);
> +                AMF_ASSIGN_PROPERTY_INT64(res, ctx->encoder,
> AMF_VIDEO_ENCODER_QUERY_TIMEOUT, timeout);
> +                AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN,
> "Failed to set output query timeout\n");
> +                ctx->output_query_timeout = timeout;
> +            }
> +
>               res_query = ctx->encoder->pVtbl->QueryOutput(ctx->encoder,
> &data);
>               if (data) {
> +                av_log(avctx, AV_LOG_INFO, "QueryOutput returned with
> data\n");
> +
>                   // copy data to packet
>                   AMFBuffer *buffer;
>                   AMFGuid guid = IID_AMFBuffer();
> @@ -740,7 +749,13 @@ int ff_amf_receive_packet(AVCodecContext *avctx,
> AVPacket *avpkt)
>                   data->pVtbl->Release(data);
>
>                   AMF_RETURN_IF_FALSE(ctx, ret >= 0, ret,
> "amf_copy_buffer() failed with error %d\n", ret);
> +            } else {
> +                av_log(avctx, AV_LOG_INFO, "QueryOutput returned with
> nothing (%d)\n", res_query);
> +                // For testing, shouldn't hit this unless machine is
> otherwise very loaded.
> +                av_assert0(!block_and_wait);
>               }
> +
> +            block_and_wait = 0;
>           }
>           res_resubmit = AMF_OK;
>           if (ctx->delayed_surface != NULL) { // try to resubmit frame
> @@ -769,8 +784,9 @@ int ff_amf_receive_packet(AVCodecContext *avctx,
> AVPacket *avpkt)
>
>           if (query_output_data_flag == 0) {
>               if (res_resubmit == AMF_INPUT_FULL || ctx->delayed_drain ||
> (ctx->eof && res_query != AMF_EOF) || (ctx->hwsurfaces_in_queue >=
> ctx->hwsurfaces_in_queue_max)) {
> +                av_log(avctx, AV_LOG_INFO, "Need to wait for output\n");
>                   block_and_wait = 1;
> -                av_usleep(1000);
> +                //av_usleep(1000);
>               }
>           }
>       } while (block_and_wait);
> diff --git a/libavcodec/amfenc.h b/libavcodec/amfenc.h
> index 2dbd378ef8..64c77115b6 100644
> --- a/libavcodec/amfenc.h
> +++ b/libavcodec/amfenc.h
> @@ -72,6 +72,7 @@ typedef struct AmfContext {
>       int                 delayed_drain;
>       AMFSurface         *delayed_surface;
>       AVFrame            *delayed_frame;
> +    int64_t             output_query_timeout;
>
>       // shift dts back by max_b_frames in timing
>       AVFifo             *timestamp_list;
> --
> 2.39.2

Dynamic switching between non-blocking & blocking approaches isn’t
supported in AMF at this time.

We could ask the AMF team to implement this feature, but that may take
some time.

I would suggest using av_usleep(500) until this feature is implemented.

> What cases are you experimenting with?

This issue is very easy to reproduce when:

1) low resolution transcoding

2) hardware accelerated decoding

The command line sample:  ffmpeg -hwaccel d3d11va -hwaccel_output_format
d3d11 -i input_480x360_h264.mp4 -c:v hevc_amf  output_480x360_hevc.mp4

* Re: [FFmpeg-devel] [PATCH] amfenc: Use a blocking call instead of sleeping and polling
  2023-10-19 16:13         ` Evgeny Pavlov
@ 2023-10-22 14:30           ` Mark Thompson
  0 siblings, 0 replies; 14+ messages in thread
From: Mark Thompson @ 2023-10-22 14:30 UTC (permalink / raw)
  To: ffmpeg-devel

On 19/10/2023 17:13, Evgeny Pavlov wrote:
> On Wed, Oct 18, 2023 at 10:36 PM Mark Thompson <sw@jkqxz.net> wrote:
> 
>> ---
>> On 17/10/2023 18:11, Evgeny Pavlov wrote:
>>> The reason for using av_usleep() here is that AMF API doesn’t provide an
>>> API for explicit wait. There are two modes to get output from encoder:
>>>
>>> 1. Polling with some sleep to avoid CPU thrashing – currently used in
>> FFmpeg
>>>
>>> 2. Set timeout parameter on AMF encoder and QueryOutput call will block
>>> till output is available or the timeout happens.
>>>
>>> #2 is the preferable way but it is designed more to be used with a
>> separate
>>> polling thread. With a single-thread approach in FFmpeg, the use of
>> timeout
>>> can block input submission making things slower.  This is even more
>>> pronounced when B-frames are enabled and several inputs are needed to
>> produce
>>> the first output.
>>
>> This approach seems like it should work here?  Run non-blocking until the
>> queue is full, then switch to blocking when you need to wait for some
>> output.
>>
>> I tried the patch enclosing (H.264 only, different proprties needed for
>> other codecs), but it doesn't seem to work - the test assert always hits
>> immediately and timing shows that QueryOutput didn't block even though the
>> timeout should be set?  I'm probably doing something incorrect, maybe you
>> would know how to fix it.
>>
>>> The condition of this sleep is in special events (primarily when amf
>> input
>>> queue is full), not the core loop part. During the experiments the cpu
>>> increasing is about 2-4% or so, not a burst.
>>
>> What cases are you experimenting with?
>>
>> The most problematic case I can think of is multiple encodes running
>> simultaneously sharing the same instance so that each one has to wait for
>> others to complete and therefore all queues fill up.
>>
>> The busy wait will end up being the only place where it can block (since
>> everything else runs asynchronously), so you will peg one CPU at close to
>> 100% per encode running.
>>
>> Thanks,
>>
>> - Mark
>>
>>    libavcodec/amfenc.c | 22 +++++++++++++++++++---
>>    libavcodec/amfenc.h |  1 +
>>    2 files changed, 20 insertions(+), 3 deletions(-)
>>
>> ...
> 
> Dynamic switching between non-blocking & blocking approaches isn’t
> supported in AMF at this time.
> 
> We might request to implement this feature for AMF team, but it might took
> some time to implement this.

That is unfortunate, but it sounds like something like this is required.

> I would suggest using av_usleep(500) until this feature is implemented.
> 
>> What cases are you experimenting with?
> 
> This issue is very easy to reproduce when:
> 
> 1) low resolution transcoding
> 
> 2) hardware accelerated decoding
> 
> The command line sample:  ffmpeg -hwaccel d3d11va -hwaccel_output_format
> d3d11 -i input_480x360_h264.mp4 -c:v hevc_amf  output_480x360_hevc.mp4

To clarify, I meant: what cases are you experimenting with to verify that this doesn't cause problems elsewhere?

I agree (and can reproduce) that the specific case with one low-resolution stream slightly improves throughput at the cost of increased CPU use.

 >> The most problematic case I can think of is multiple encodes running
 >> simultaneously sharing the same instance so that each one has to wait for
 >> others to complete and therefore all queues fill up.
 >>
 >> The busy wait will end up being the only place where it can block (since
 >> everything else runs asynchronously), so you will peg one CPU at close to
 >> 100% per encode running.

I tried this case with two 4K streams and indeed it is a huge regression.  CPU use goes from 1-2% of one core for both streams to spinning on two cores, around a 100x increase.

Total throughput also decreased by about 10% in my testing, though since I'm running on a low-power device that might be an artefact of the CPU spinning wasting so much power that other clocks are reduced.

(My test was two instances of

$ ./ffmpeg_g.exe -extra_hw_frames 100 -hwaccel d3d11va -hwaccel_output_format d3d11 -i input-4k.mp4 -an -vf loop=loop=20:size=100:start=0 -c:v h264_amf -f null -

running simultaneously, looking at the steady state in the loop after the first hundred frames from the decoder are complete.)

Please consider this patch rejected in its current form.  IMO this is a hole in the AMF API and it needs to be improved to be able to wait for operations to complete rather than polling in the user code.
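
To make the difference between the two waiting strategies concrete, here is a minimal sketch using a mock encoder with a virtual clock. Everything in it (MockEncoder, mock_query, wait_by_polling, wait_by_blocking) is hypothetical and is not part of the AMF API; it only models the control flow under discussion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an encoder; this is not the AMF API. */
typedef struct MockEncoder {
    int64_t now_us;       /* virtual clock, in microseconds */
    int64_t ready_at_us;  /* virtual time at which the next output is ready */
} MockEncoder;

/* Non-blocking query: is output available right now? */
static int mock_query(const MockEncoder *enc)
{
    return enc->now_us >= enc->ready_at_us;
}

/* Sleep-and-poll: returns the number of queries issued until output arrives.
 * sleep_us must be positive; a zero sleep would be a pure spin, which this
 * virtual-clock model cannot represent. */
static int wait_by_polling(MockEncoder *enc, int64_t sleep_us)
{
    int queries = 0;

    assert(sleep_us > 0);
    for (;;) {
        queries++;
        if (mock_query(enc))
            return queries;
        enc->now_us += sleep_us;  /* models av_usleep(sleep_us) */
    }
}

/* Blocking query with a timeout: a single call that returns 1 as soon as
 * output is ready, or 0 once the timeout expires. */
static int wait_by_blocking(MockEncoder *enc, int64_t timeout_us)
{
    int64_t deadline = enc->now_us + timeout_us;

    if (enc->ready_at_us <= deadline) {
        if (enc->now_us < enc->ready_at_us)
            enc->now_us = enc->ready_at_us;
        return 1;
    }
    enc->now_us = deadline;
    return 0;
}
```

With output ready at 4200 us and a 1000 us sleep, the polling loop issues six queries and returns at 5000 us, while the blocking call returns at 4200 us after a single call. Shrinking the sleep reduces the overshoot but multiplies the number of wakeups, which is the tradeoff described above.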

Thanks,

- Mark

* [FFmpeg-devel] [PATCH v2] avcodec/amfenc: increase precision of Sleep() on Windows
  2023-10-16  9:13 [FFmpeg-devel] [PATCH] avcodec/amfenc: Fix for windows imprecise sleep Evgeny Pavlov
  2023-10-16 21:24 ` Mark Thompson
@ 2023-11-13 14:37 ` Evgeny Pavlov
  2023-11-20 16:01   ` Evgeny Pavlov
  2023-11-27 13:42   ` Mark Thompson
  1 sibling, 2 replies; 14+ messages in thread
From: Evgeny Pavlov @ 2023-11-13 14:37 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Evgeny Pavlov

This commit increases the precision of the Sleep() function on Windows.
This shortens the actual sleep time on Windows and improves AMF encoding
performance on low resolution input videos.

Fix for issue #10622

v2: use timeBeginPeriod/timeEndPeriod for increasing precision of Sleep()

Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com>
---
 libavcodec/amfenc.c | 31 +++++++++++++++++++++++++++++++
 libavcodec/amfenc.h |  3 +++
 2 files changed, 34 insertions(+)

diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
index 061859f85c..55e24856e8 100644
--- a/libavcodec/amfenc.c
+++ b/libavcodec/amfenc.c
@@ -42,7 +42,12 @@
 #endif
 
 #ifdef _WIN32
+#include <timeapi.h>
 #include "compat/w32dlfcn.h"
+
+typedef MMRESULT (*timeapi_fun)(UINT uPeriod);
+#define WINMM_DLL "winmm.dll"
+
 #else
 #include <dlfcn.h>
 #endif
@@ -113,6 +118,9 @@ static int amf_load_library(AVCodecContext *avctx)
     AMFInit_Fn         init_fun;
     AMFQueryVersion_Fn version_fun;
     AMF_RESULT         res;
+#ifdef _WIN32
+    timeapi_fun time_begin_fun;
+#endif
 
     ctx->delayed_frame = av_frame_alloc();
     if (!ctx->delayed_frame) {
@@ -145,6 +153,16 @@ static int amf_load_library(AVCodecContext *avctx)
     AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN, "GetTrace() failed with error %d\n", res);
     res = ctx->factory->pVtbl->GetDebug(ctx->factory, &ctx->debug);
     AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN, "GetDebug() failed with error %d\n", res);
+
+#ifdef _WIN32
+    // Increase precision of Sleep() function on Windows platform
+    ctx->winmm_lib = dlopen(WINMM_DLL, RTLD_NOW | RTLD_LOCAL);
+    AMF_RETURN_IF_FALSE(ctx, ctx->winmm_lib != NULL, 0, "DLL %s failed to open\n", WINMM_DLL);
+    time_begin_fun = (timeapi_fun)dlsym(ctx->winmm_lib, "timeBeginPeriod");
+    AMF_RETURN_IF_FALSE(ctx, time_begin_fun != NULL, 0, "DLL %s failed to find function %s\n", WINMM_DLL, "timeBeginPeriod");
+    time_begin_fun(1);
+#endif //_WIN32
+
     return 0;
 }
 
@@ -375,6 +393,9 @@ static int amf_init_encoder(AVCodecContext *avctx)
 int av_cold ff_amf_encode_close(AVCodecContext *avctx)
 {
     AmfContext *ctx = avctx->priv_data;
+#ifdef _WIN32
+    timeapi_fun time_end_fun;
+#endif //_WIN32
 
     if (ctx->delayed_surface) {
         ctx->delayed_surface->pVtbl->Release(ctx->delayed_surface);
@@ -410,6 +431,16 @@ int av_cold ff_amf_encode_close(AVCodecContext *avctx)
     av_frame_free(&ctx->delayed_frame);
     av_fifo_freep2(&ctx->timestamp_list);
 
+#ifdef _WIN32
+    if (ctx->winmm_lib) {
+        time_end_fun = (timeapi_fun)dlsym(ctx->winmm_lib, "timeEndPeriod");
+        AMF_RETURN_IF_FALSE(ctx, time_end_fun != NULL, 0, "DLL %s failed to find function %s\n", WINMM_DLL, "timeEndPeriod");
+        time_end_fun(1);
+        dlclose(ctx->winmm_lib);
+        ctx->winmm_lib = NULL;
+    }
+#endif //_WIN32
+
     return 0;
 }
 
diff --git a/libavcodec/amfenc.h b/libavcodec/amfenc.h
index 2dbd378ef8..35bcf1dfe3 100644
--- a/libavcodec/amfenc.h
+++ b/libavcodec/amfenc.h
@@ -50,6 +50,9 @@ typedef struct AmfContext {
     AVClass            *avclass;
     // access to AMF runtime
     amf_handle          library; ///< handle to DLL library
+#ifdef _WIN32
+    amf_handle          winmm_lib; ///< handle to winmm DLL library
+#endif //_WIN32
     AMFFactory         *factory; ///< pointer to AMF factory
     AMFDebug           *debug;   ///< pointer to AMF debug interface
     AMFTrace           *trace;   ///< pointer to AMF trace interface
-- 
2.42.0


* Re: [FFmpeg-devel] [PATCH v2] avcodec/amfenc: increase precision of Sleep() on Windows
  2023-11-13 14:37 ` [FFmpeg-devel] [PATCH v2] avcodec/amfenc: increase precision of Sleep() on Windows Evgeny Pavlov
@ 2023-11-20 16:01   ` Evgeny Pavlov
  2023-11-27 13:42   ` Mark Thompson
  1 sibling, 0 replies; 14+ messages in thread
From: Evgeny Pavlov @ 2023-11-20 16:01 UTC (permalink / raw)
  To: ffmpeg-devel

On Mon, Nov 13, 2023 at 3:41 PM Evgeny Pavlov <lucenticus@gmail.com> wrote:

> This commit increase precision of Sleep() function on Windows.
> This fix reduces the sleep time on Windows to improve AMF encoding
> performance on low resolution input videos.
>
> Fix for issue #10622
>
> v2: use timeBeginPeriod/timeEndPeriod for increasing precision of Sleep()
>
> Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com>
> ---
>  libavcodec/amfenc.c | 31 +++++++++++++++++++++++++++++++
>  libavcodec/amfenc.h |  3 +++
>  2 files changed, 34 insertions(+)
>
> diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
> index 061859f85c..55e24856e8 100644
> --- a/libavcodec/amfenc.c
> +++ b/libavcodec/amfenc.c
> @@ -42,7 +42,12 @@
>  #endif
>
>  #ifdef _WIN32
> +#include <timeapi.h>
>  #include "compat/w32dlfcn.h"
> +
> +typedef MMRESULT (*timeapi_fun)(UINT uPeriod);
> +#define WINMM_DLL "winmm.dll"
> +
>  #else
>  #include <dlfcn.h>
>  #endif
> @@ -113,6 +118,9 @@ static int amf_load_library(AVCodecContext *avctx)
>      AMFInit_Fn         init_fun;
>      AMFQueryVersion_Fn version_fun;
>      AMF_RESULT         res;
> +#ifdef _WIN32
> +    timeapi_fun time_begin_fun;
> +#endif
>
>      ctx->delayed_frame = av_frame_alloc();
>      if (!ctx->delayed_frame) {
> @@ -145,6 +153,16 @@ static int amf_load_library(AVCodecContext *avctx)
>      AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN, "GetTrace()
> failed with error %d\n", res);
>      res = ctx->factory->pVtbl->GetDebug(ctx->factory, &ctx->debug);
>      AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN, "GetDebug()
> failed with error %d\n", res);
> +
> +#ifdef _WIN32
> +    // Increase precision of Sleep() function on Windows platform
> +    ctx->winmm_lib = dlopen(WINMM_DLL, RTLD_NOW | RTLD_LOCAL);
> +    AMF_RETURN_IF_FALSE(ctx, ctx->winmm_lib != NULL, 0, "DLL %s failed to
> open\n", WINMM_DLL);
> +    time_begin_fun = (timeapi_fun)dlsym(ctx->winmm_lib,
> "timeBeginPeriod");
> +    AMF_RETURN_IF_FALSE(ctx, time_begin_fun != NULL, 0, "DLL %s failed to
> find function %s\n", WINMM_DLL, "timeBeginPeriod");
> +    time_begin_fun(1);
> +#endif //_WIN32
> +
>      return 0;
>  }
>
> @@ -375,6 +393,9 @@ static int amf_init_encoder(AVCodecContext *avctx)
>  int av_cold ff_amf_encode_close(AVCodecContext *avctx)
>  {
>      AmfContext *ctx = avctx->priv_data;
> +#ifdef _WIN32
> +    timeapi_fun time_end_fun;
> +#endif //_WIN32
>
>      if (ctx->delayed_surface) {
>          ctx->delayed_surface->pVtbl->Release(ctx->delayed_surface);
> @@ -410,6 +431,16 @@ int av_cold ff_amf_encode_close(AVCodecContext *avctx)
>      av_frame_free(&ctx->delayed_frame);
>      av_fifo_freep2(&ctx->timestamp_list);
>
> +#ifdef _WIN32
> +    if (ctx->winmm_lib) {
> +        time_end_fun = (timeapi_fun)dlsym(ctx->winmm_lib,
> "timeEndPeriod");
> +        AMF_RETURN_IF_FALSE(ctx, time_end_fun != NULL, 0, "DLL %s failed
> to find function %s\n", WINMM_DLL, "timeEndPeriod");
> +        time_end_fun(1);
> +        dlclose(ctx->winmm_lib);
> +        ctx->winmm_lib = NULL;
> +    }
> +#endif //_WIN32
> +
>      return 0;
>  }
>
> diff --git a/libavcodec/amfenc.h b/libavcodec/amfenc.h
> index 2dbd378ef8..35bcf1dfe3 100644
> --- a/libavcodec/amfenc.h
> +++ b/libavcodec/amfenc.h
> @@ -50,6 +50,9 @@ typedef struct AmfContext {
>      AVClass            *avclass;
>      // access to AMF runtime
>      amf_handle          library; ///< handle to DLL library
> +#ifdef _WIN32
> +    amf_handle          winmm_lib; ///< handle to winmm DLL library
> +#endif //_WIN32
>      AMFFactory         *factory; ///< pointer to AMF factory
>      AMFDebug           *debug;   ///< pointer to AMF debug interface
>      AMFTrace           *trace;   ///< pointer to AMF trace interface
> --
> 2.42.0
>
>
Please take a look at this patch. It improves AMF encoding performance on
low-resolution video on Windows by making Sleep() more precise.
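
As a rough illustration of why timer granularity matters for a 1 ms sleep, here is a minimal worst-case model in which a sleep request is rounded up to a whole number of timer ticks. The helper effective_sleep_us is hypothetical, and the 15625 us default tick used in the note below is a commonly cited value that is assumed here, not a figure measured in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case model (an assumption, not measured behaviour): a sleep request
 * is rounded up to a whole number of timer ticks. */
static int64_t effective_sleep_us(int64_t requested_us, int64_t tick_us)
{
    assert(requested_us >= 0 && tick_us > 0);
    if (requested_us == 0)
        return 0;  /* a zero request only yields the time slice */
    return (requested_us + tick_us - 1) / tick_us * tick_us;
}
```

Under this model a 1000 us request lasts 15625 us with a 15625 us tick, and 1000 us once the tick is lowered to 1000 us, which is what timeBeginPeriod(1) asks for.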

* Re: [FFmpeg-devel] [PATCH v2] avcodec/amfenc: increase precision of Sleep() on Windows
  2023-11-13 14:37 ` [FFmpeg-devel] [PATCH v2] avcodec/amfenc: increase precision of Sleep() on Windows Evgeny Pavlov
  2023-11-20 16:01   ` Evgeny Pavlov
@ 2023-11-27 13:42   ` Mark Thompson
  2023-11-27 14:04     ` Henrik Gramner via ffmpeg-devel
  1 sibling, 1 reply; 14+ messages in thread
From: Mark Thompson @ 2023-11-27 13:42 UTC (permalink / raw)
  To: ffmpeg-devel

On 13/11/2023 14:37, Evgeny Pavlov wrote:
> This commit increase precision of Sleep() function on Windows.
> This fix reduces the sleep time on Windows to improve AMF encoding
> performance on low resolution input videos.
> 
> Fix for issue #10622
> 
> v2: use timeBeginPeriod/timeEndPeriod for increasing precision of Sleep()
> 
> Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com>
> ---
>   libavcodec/amfenc.c | 31 +++++++++++++++++++++++++++++++
>   libavcodec/amfenc.h |  3 +++
>   2 files changed, 34 insertions(+)
> 
> diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
> index 061859f85c..55e24856e8 100644
> --- a/libavcodec/amfenc.c
> +++ b/libavcodec/amfenc.c
> @@ -42,7 +42,12 @@
>   #endif
>   
>   #ifdef _WIN32
> +#include <timeapi.h>
>   #include "compat/w32dlfcn.h"
> +
> +typedef MMRESULT (*timeapi_fun)(UINT uPeriod);
> +#define WINMM_DLL "winmm.dll"
> +
>   #else
>   #include <dlfcn.h>
>   #endif
> @@ -113,6 +118,9 @@ static int amf_load_library(AVCodecContext *avctx)
>       AMFInit_Fn         init_fun;
>       AMFQueryVersion_Fn version_fun;
>       AMF_RESULT         res;
> +#ifdef _WIN32
> +    timeapi_fun time_begin_fun;
> +#endif
>   
>       ctx->delayed_frame = av_frame_alloc();
>       if (!ctx->delayed_frame) {
> @@ -145,6 +153,16 @@ static int amf_load_library(AVCodecContext *avctx)
>       AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN, "GetTrace() failed with error %d\n", res);
>       res = ctx->factory->pVtbl->GetDebug(ctx->factory, &ctx->debug);
>       AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN, "GetDebug() failed with error %d\n", res);
> +
> +#ifdef _WIN32
> +    // Increase precision of Sleep() function on Windows platform
> +    ctx->winmm_lib = dlopen(WINMM_DLL, RTLD_NOW | RTLD_LOCAL);
> +    AMF_RETURN_IF_FALSE(ctx, ctx->winmm_lib != NULL, 0, "DLL %s failed to open\n", WINMM_DLL);
> +    time_begin_fun = (timeapi_fun)dlsym(ctx->winmm_lib, "timeBeginPeriod");
> +    AMF_RETURN_IF_FALSE(ctx, time_begin_fun != NULL, 0, "DLL %s failed to find function %s\n", WINMM_DLL, "timeBeginPeriod");
> +    time_begin_fun(1);
> +#endif //_WIN32
> +
>       return 0;
>   }
>   
> @@ -375,6 +393,9 @@ static int amf_init_encoder(AVCodecContext *avctx)
>   int av_cold ff_amf_encode_close(AVCodecContext *avctx)
>   {
>       AmfContext *ctx = avctx->priv_data;
> +#ifdef _WIN32
> +    timeapi_fun time_end_fun;
> +#endif //_WIN32
>   
>       if (ctx->delayed_surface) {
>           ctx->delayed_surface->pVtbl->Release(ctx->delayed_surface);
> @@ -410,6 +431,16 @@ int av_cold ff_amf_encode_close(AVCodecContext *avctx)
>       av_frame_free(&ctx->delayed_frame);
>       av_fifo_freep2(&ctx->timestamp_list);
>   
> +#ifdef _WIN32
> +    if (ctx->winmm_lib) {
> +        time_end_fun = (timeapi_fun)dlsym(ctx->winmm_lib, "timeEndPeriod");
> +        AMF_RETURN_IF_FALSE(ctx, time_end_fun != NULL, 0, "DLL %s failed to find function %s\n", WINMM_DLL, "timeEndPeriod");
> +        time_end_fun(1);
> +        dlclose(ctx->winmm_lib);
> +        ctx->winmm_lib = NULL;
> +    }
> +#endif //_WIN32
> +
>       return 0;
>   }
>   
> diff --git a/libavcodec/amfenc.h b/libavcodec/amfenc.h
> index 2dbd378ef8..35bcf1dfe3 100644
> --- a/libavcodec/amfenc.h
> +++ b/libavcodec/amfenc.h
> @@ -50,6 +50,9 @@ typedef struct AmfContext {
>       AVClass            *avclass;
>       // access to AMF runtime
>       amf_handle          library; ///< handle to DLL library
> +#ifdef _WIN32
> +    amf_handle          winmm_lib; ///< handle to winmm DLL library
> +#endif //_WIN32
>       AMFFactory         *factory; ///< pointer to AMF factory
>       AMFDebug           *debug;   ///< pointer to AMF debug interface
>       AMFTrace           *trace;   ///< pointer to AMF trace interface

Is it reasonable to set this global state from a library without the parent program knowing?  We'd really prefer not to affect the global state unexpectedly.

It's also unclear to me what the effect of this tradeoff on power is, given that the whole reason why this happens is that Windows is trying to keep the CPU asleep for as long as possible to save power.

Thanks,

- Mark

* Re: [FFmpeg-devel] [PATCH v2] avcodec/amfenc: increase precision of Sleep() on Windows
  2023-11-27 13:42   ` Mark Thompson
@ 2023-11-27 14:04     ` Henrik Gramner via ffmpeg-devel
  2024-02-19 15:26       ` Evgeny Pavlov
  0 siblings, 1 reply; 14+ messages in thread
From: Henrik Gramner via ffmpeg-devel @ 2023-11-27 14:04 UTC (permalink / raw)
  To: FFmpeg development discussions and patches; +Cc: Henrik Gramner

On Mon, Nov 27, 2023 at 2:42 PM Mark Thompson <sw@jkqxz.net> wrote:
> Is it reasonable to set this global state from a library without the parent program knowing?  We'd really prefer not to affect the global state unexpectedly.

CreateWaitableTimerExW() with the
CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag might be an alternative?
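
A minimal sketch of what such a sleep could look like, assuming a Windows version that supports the flag (Windows 10, version 1803 or later). The helper name hires_sleep_us is hypothetical, and the non-Windows branch is only a stand-in so that the sketch builds elsewhere. Whether this is actually more precise than Sleep() would need to be measured.

```c
#ifndef _WIN32
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() */
#endif
#include <assert.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#ifndef CREATE_WAITABLE_TIMER_HIGH_RESOLUTION
#define CREATE_WAITABLE_TIMER_HIGH_RESOLUTION 0x00000002
#endif

/* Sleep for roughly `us` microseconds; returns 0 on success, -1 on failure. */
static int hires_sleep_us(int64_t us)
{
    HANDLE timer;
    LARGE_INTEGER due;
    int ret = -1;

    assert(us >= 0);
    timer = CreateWaitableTimerExW(NULL, NULL,
                                   CREATE_WAITABLE_TIMER_HIGH_RESOLUTION,
                                   TIMER_ALL_ACCESS);
    if (!timer)
        return -1;  /* e.g. the flag is not supported on this system */

    due.QuadPart = -(us * 10);  /* negative means relative, in 100 ns units */
    if (SetWaitableTimer(timer, &due, 0, NULL, NULL, FALSE) &&
        WaitForSingleObject(timer, INFINITE) == WAIT_OBJECT_0)
        ret = 0;

    CloseHandle(timer);
    return ret;
}
#else
#include <time.h>

/* Non-Windows stand-in with the same signature. */
static int hires_sleep_us(int64_t us)
{
    struct timespec req;

    assert(us >= 0);
    req.tv_sec  = us / 1000000;
    req.tv_nsec = (us % 1000000) * 1000;
    return nanosleep(&req, NULL) == 0 ? 0 : -1;
}
#endif
```

Unlike timeBeginPeriod(), the timer object is private to the caller, so nothing outside the waiting thread is reconfigured. A real implementation would create the timer once and reuse it rather than creating one per sleep.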

* Re: [FFmpeg-devel] [PATCH v2] avcodec/amfenc: increase precision of Sleep() on Windows
  2023-11-27 14:04     ` Henrik Gramner via ffmpeg-devel
@ 2024-02-19 15:26       ` Evgeny Pavlov
  0 siblings, 0 replies; 14+ messages in thread
From: Evgeny Pavlov @ 2024-02-19 15:26 UTC (permalink / raw)
  To: FFmpeg development discussions and patches; +Cc: Henrik Gramner

On Mon, Nov 27, 2023 at 3:05 PM Henrik Gramner via ffmpeg-devel
<ffmpeg-devel@ffmpeg.org> wrote:
>
> On Mon, Nov 27, 2023 at 2:42 PM Mark Thompson <sw@jkqxz.net> wrote:
> > Is it reasonable to set this global state from a library without the parent program knowing?  We'd really prefer not to affect the global state unexpectedly.
>
> CreateWaitableTimerExW() with the
> CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag might be an alternative?
>

We evaluated CreateWaitableTimerExW with the
CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag. In our evaluation it has the
same precision as the Sleep() function.

On recent Windows versions, changing the timer resolution affects only
the calling process and not other processes, so it has no system-wide
effect. The documentation for timeBeginPeriod says:
https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod

"Prior to Windows 10, version 2004, this function affects a global
Windows setting. For all processes Windows uses the lowest value (that
is, highest resolution) requested by any process. Starting with
Windows 10, version 2004, this function no longer affects global timer
resolution. For processes which call this function, Windows uses the
lowest value (that is, highest resolution) requested by any process.
For processes which have not called this function, Windows does not
guarantee a higher resolution than the default system resolution."
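
A minimal sketch of how the effect can be checked on a given machine: time a single Sleep(1) with and without a timeBeginPeriod(1)/timeEndPeriod(1) pair around it. The helper measure_sleep1_ms is hypothetical, the Windows branch needs to be linked against winmm, and the non-Windows branch is only a stand-in so that the sketch builds elsewhere.

```c
#ifndef _WIN32
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() and clock_gettime() */
#endif
#include <assert.h>

#ifdef _WIN32
#include <windows.h>
#include <timeapi.h>  /* timeBeginPeriod/timeEndPeriod; link with -lwinmm */

/* Time one Sleep(1) in milliseconds, optionally requesting 1 ms resolution. */
static double measure_sleep1_ms(int raise_resolution)
{
    LARGE_INTEGER freq, t0, t1;
    int raised = raise_resolution && timeBeginPeriod(1) == TIMERR_NOERROR;

    QueryPerformanceFrequency(&freq);
    assert(freq.QuadPart > 0);
    QueryPerformanceCounter(&t0);
    Sleep(1);
    QueryPerformanceCounter(&t1);

    if (raised)
        timeEndPeriod(1);  /* every successful begin needs a matching end */
    return (double)(t1.QuadPart - t0.QuadPart) * 1000.0 / (double)freq.QuadPart;
}
#else
#include <time.h>

/* Non-Windows stand-in: the flag is ignored and nanosleep() is timed. */
static double measure_sleep1_ms(int raise_resolution)
{
    struct timespec req = { 0, 1000000 }, t0, t1;

    (void)raise_resolution;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    nanosleep(&req, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) * 1000.0 +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}
#endif
```

Comparing measure_sleep1_ms(0) against measure_sleep1_ms(1) over many iterations gives a direct view of how much the requested resolution changes the real sleep length on that system.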

The following measurements show the performance improvement with this
patch.

1. Performance tests show that the higher-precision sleep improves
throughput, especially for low-resolution sequences, where the gain is
about 20%.

Frames per second (FPS) encoded by the hardware encoder (Navi 31
RX7900XT):

Source Type: H.264, Output Type: H.264

No. | Resolution | Frames | FPS before | FPS after | Difference | Improvement
----|------------|--------|------------|-----------|------------|------------
1   | 480x360    | 8290   | 2030       | 2365      | 335        | 16.5%
2   | 720x576    | 8290   | 1440       | 1790      | 350        | 24.3%
3   | 1280x720   | 8290   | 1120       | 1190      | 70         | 6.3%
4   | 1920x1080  | 8290   | 692        | 714       | 22         | 3.2%
5   | 3840x2160  | 8290   | 200        | 200       | 0          | 0.0%

The sample ffmpeg command line:
$ ffmpeg.exe -y -hwaccel d3d11va -hwaccel_output_format d3d11 -i
input.mp4 -c:v h264_amf out.mp4
where input.mp4 is replaced by the H.264 input bitstream of the
corresponding resolution.
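
For reference, the improvement percentages in the FPS table are the relative FPS gain over the unpatched build. A small sketch of that arithmetic (the helper name improvement_percent is hypothetical):

```c
#include <assert.h>

/* Relative gain of fps_after over fps_before, in percent. */
static double improvement_percent(double fps_before, double fps_after)
{
    assert(fps_before > 0.0);
    return (fps_after - fps_before) / fps_before * 100.0;
}
```

For the 480x360 row, improvement_percent(2030, 2365) is 335 / 2030 * 100, or about 16.5.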

2. Power tests show that the increase in power consumption stays within
acceptable limits.

The power test measures how much CPU power consumption rises when this
patch raises the timer resolution. We tested an AMD product called
Phoenix, which we refer to as an APU. It combines a general-purpose AMD
CPU and a 3D integrated graphics processing unit (IGPU) on a single die.
Only the APU has a DAP connector to the board's power rails.

The power test data is shown below:

                       | 480x360 | 720x576 | 1280x720 | 1920x1080 | 3840x2160 | Average
-----------------------|---------|---------|----------|-----------|-----------|--------
CPU power change       | 1.93%   | 2.43%   | -1.69%   | 3.49%     | 2.92%     | 1.82%
APU total power change | 0.86%   | 1.34%   | -0.62%   | 1.54%     | -0.58%    | 0.51%

With the patch applied and the high-precision timer in use, average CPU
power consumption increases by 1.82% and total APU power by 0.51%. The
increase in power is not significant.
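
The averages quoted here are the plain arithmetic mean of the five per-resolution values in the power table. A small sketch of that check (the helper name mean_of is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n values. */
static double mean_of(const double *values, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(values != NULL && n > 0);
    for (i = 0; i < n; i++)
        sum += values[i];
    return sum / (double)n;
}
```

Feeding it the CPU row (1.93, 2.43, -1.69, 3.49, 2.92) gives about 1.82, and the APU row (0.86, 1.34, -0.62, 1.54, -0.58) gives about 0.51.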

Thread overview: 14+ messages
2023-10-16  9:13 [FFmpeg-devel] [PATCH] avcodec/amfenc: Fix for windows imprecise sleep Evgeny Pavlov
2023-10-16 21:24 ` Mark Thompson
2023-10-17  1:25   ` Zhao Zhili
2023-10-17 17:11     ` Evgeny Pavlov
2023-10-17 19:45       ` Kacper Michajlow
2023-10-18 10:32         ` Evgeny Pavlov
2023-10-18 20:36       ` [FFmpeg-devel] [PATCH] amfenc: Use a blocking call instead of sleeping and polling Mark Thompson
2023-10-19 16:13         ` Evgeny Pavlov
2023-10-22 14:30           ` Mark Thompson
2023-11-13 14:37 ` [FFmpeg-devel] [PATCH v2] avcodec/amfenc: increase precision of Sleep() on Windows Evgeny Pavlov
2023-11-20 16:01   ` Evgeny Pavlov
2023-11-27 13:42   ` Mark Thompson
2023-11-27 14:04     ` Henrik Gramner via ffmpeg-devel
2024-02-19 15:26       ` Evgeny Pavlov
