Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
 help / color / mirror / Atom feed
* [FFmpeg-devel] [PATCH] aarch64: Use cntvct_el0 as timer register on Android
@ 2024-06-07  9:12 Martin Storsjö
  2024-06-07  9:27 ` Rémi Denis-Courmont
  2024-06-13 12:54 ` Martin Storsjö
  0 siblings, 2 replies; 6+ messages in thread
From: Martin Storsjö @ 2024-06-07  9:12 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Zhao Zhili

The default timer register pmccntr_el0 usually requires enabling
access with e.g. a kernel module.
---
cntvct_el0 has significantly better resolution than
av_gettime_relative (while the unscaled nanosecond output of
clock_gettime is much higher resolution).

In one tested case, the cntvct_el0 timer has a frequency of 25 MHz
(readable via the register cntfrq_el0).
---
 libavutil/aarch64/timer.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/libavutil/aarch64/timer.h b/libavutil/aarch64/timer.h
index fadc9568f8..966f17081a 100644
--- a/libavutil/aarch64/timer.h
+++ b/libavutil/aarch64/timer.h
@@ -33,7 +33,16 @@ static inline uint64_t read_time(void)
     uint64_t cycle_counter;
     __asm__ volatile(
         "isb                   \t\n"
+#if defined(__ANDROID__)
+        // cntvct_el0 has lower resolution than pmccntr_el0, but is usually
+        // accessible from user space by default.
+        "mrs %0, cntvct_el0        "
+#else
+        // pmccntr_el0 has higher resolution, but is usually not accessible
+        // from user space by default (but access can be enabled with a custom
+        // kernel module).
         "mrs %0, pmccntr_el0       "
+#endif
         : "=r"(cycle_counter) :: "memory" );
 
     return cycle_counter;
-- 
2.39.3 (Apple Git-146)

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [FFmpeg-devel] [PATCH] aarch64: Use cntvct_el0 as timer register on Android
  2024-06-07  9:12 [FFmpeg-devel] [PATCH] aarch64: Use cntvct_el0 as timer register on Android Martin Storsjö
@ 2024-06-07  9:27 ` Rémi Denis-Courmont
  2024-06-07 10:00   ` Martin Storsjö
  2024-06-13 12:54 ` Martin Storsjö
  1 sibling, 1 reply; 6+ messages in thread
From: Rémi Denis-Courmont @ 2024-06-07  9:27 UTC (permalink / raw)
  To: FFmpeg development discussions and patches



Le 7 juin 2024 12:12:45 GMT+03:00, "Martin Storsjö" <martin@martin.st> a écrit :
>The default timer register pmccntr_el0 usually requires enabling
>access with e.g. a kernel module.
>---
>cntvct_el0 has significantly better resolution than
>av_gettime_relative (while the unscaled nanosecond output of
>clock_gettime is much higher resolution).
>
>In one tested case, the cntvct_el0 timer has a frequency of 25 MHz
>(readable via the register cntfrq_el0).
>---
> libavutil/aarch64/timer.h | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
>diff --git a/libavutil/aarch64/timer.h b/libavutil/aarch64/timer.h
>index fadc9568f8..966f17081a 100644
>--- a/libavutil/aarch64/timer.h
>+++ b/libavutil/aarch64/timer.h
>@@ -33,7 +33,16 @@ static inline uint64_t read_time(void)
>     uint64_t cycle_counter;
>     __asm__ volatile(
>         "isb                   \t\n"
>+#if defined(__ANDROID__)
>+        // cntvct_el0 has lower resolution than pmccntr_el0, but is usually
>+        // accessible from user space by default.
>+        "mrs %0, cntvct_el0        "
>+#else
>+        // pmccntr_el0 has higher resolution, but is usually not accessible
>+        // from user space by default (but access can be enabled with a custom
>+        // kernel module).
>         "mrs %0, pmccntr_el0       "
>+#endif

It feels a little newbie-hostile choice to pick something that's broken by default but only on non-Android Linux. Is there at least a runtime warning?

Also technically this function is not specific to checkasm and is supposed to return time, so CNTVCT_EL0 actually seems more semantically correct.

IMO we should have checkasm read both cycle and time counters always and print both values but that's obviously beyond the scope here.

>         : "=r"(cycle_counter) :: "memory" );
> 
>     return cycle_counter;
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [FFmpeg-devel] [PATCH] aarch64: Use cntvct_el0 as timer register on Android
  2024-06-07  9:27 ` Rémi Denis-Courmont
@ 2024-06-07 10:00   ` Martin Storsjö
  0 siblings, 0 replies; 6+ messages in thread
From: Martin Storsjö @ 2024-06-07 10:00 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

On Fri, 7 Jun 2024, Rémi Denis-Courmont wrote:

> Le 7 juin 2024 12:12:45 GMT+03:00, "Martin Storsjö" <martin@martin.st> a écrit :
>> The default timer register pmccntr_el0 usually requires enabling
>> access with e.g. a kernel module.
>> ---
>> cntvct_el0 has significantly better resolution than
>> av_gettime_relative (while the unscaled nanosecond output of
>> clock_gettime is much higher resolution).
>>
>> In one tested case, the cntvct_el0 timer has a frequency of 25 MHz
>> (readable via the register cntfrq_el0).
>> ---
>> libavutil/aarch64/timer.h | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/libavutil/aarch64/timer.h b/libavutil/aarch64/timer.h
>> index fadc9568f8..966f17081a 100644
>> --- a/libavutil/aarch64/timer.h
>> +++ b/libavutil/aarch64/timer.h
>> @@ -33,7 +33,16 @@ static inline uint64_t read_time(void)
>>     uint64_t cycle_counter;
>>     __asm__ volatile(
>>         "isb                   \t\n"
>> +#if defined(__ANDROID__)
>> +        // cntvct_el0 has lower resolution than pmccntr_el0, but is usually
>> +        // accessible from user space by default.
>> +        "mrs %0, cntvct_el0        "
>> +#else
>> +        // pmccntr_el0 has higher resolution, but is usually not accessible
>> +        // from user space by default (but access can be enabled with a custom
>> +        // kernel module).
>>         "mrs %0, pmccntr_el0       "
>> +#endif
>
> It feels a little newbie-hostile choice to pick something that's broken 
> by default but only on non-Android Linux.

Actually, on aarch64 Windows (both native Windows, and Linux binaries in 
WSL), pmccntr_el0 is accessible in user mode (and this is a documented 
feature, not an accident). Hence the "usually" in the comments.

But yes, the default is a bit hostile in that sense, as it's usually not 
usable as such on Linux.

This patch is at least a small baby step in a different direction; more 
configurability probably would be good, but I didn't want to take on 
fixing that here. But for those actually working on assembly, pmccntr_el0 
still is the best and usually worth the hassle - so we want to get rid of 
it entirely.

> Is there at least a runtime warning?

Since ac40c3bb07781e72f3eb1e30ea450019cc1f6302 we do have a runtime test 
in checkasm, if --bench is passed, giving the user a somewhat generic 
message, rather than crashing. (For other code using START/STOP_TIMER, 
there's no runtime check/warning though.)

// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [FFmpeg-devel] [PATCH] aarch64: Use cntvct_el0 as timer register on Android
  2024-06-07  9:12 [FFmpeg-devel] [PATCH] aarch64: Use cntvct_el0 as timer register on Android Martin Storsjö
  2024-06-07  9:27 ` Rémi Denis-Courmont
@ 2024-06-13 12:54 ` Martin Storsjö
  2024-06-14  9:11   ` Zhao Zhili
  1 sibling, 1 reply; 6+ messages in thread
From: Martin Storsjö @ 2024-06-13 12:54 UTC (permalink / raw)
  To: ffmpeg-devel; +Cc: Zhao Zhili

On Fri, 7 Jun 2024, Martin Storsjö wrote:

> The default timer register pmccntr_el0 usually requires enabling
> access with e.g. a kernel module.
> ---
> cntvct_el0 has significantly better resolution than
> av_gettime_relative (while the unscaled nanosecond output of
> clock_gettime is much higher resolution).
>
> In one tested case, the cntvct_el0 timer has a frequency of 25 MHz
> (readable via the register cntfrq_el0).
> ---
> libavutil/aarch64/timer.h | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/libavutil/aarch64/timer.h b/libavutil/aarch64/timer.h
> index fadc9568f8..966f17081a 100644
> --- a/libavutil/aarch64/timer.h
> +++ b/libavutil/aarch64/timer.h
> @@ -33,7 +33,16 @@ static inline uint64_t read_time(void)
>     uint64_t cycle_counter;
>     __asm__ volatile(
>         "isb                   \t\n"
> +#if defined(__ANDROID__)
> +        // cntvct_el0 has lower resolution than pmccntr_el0, but is usually
> +        // accessible from user space by default.
> +        "mrs %0, cntvct_el0        "
> +#else
> +        // pmccntr_el0 has higher resolution, but is usually not accessible
> +        // from user space by default (but access can be enabled with a custom
> +        // kernel module).
>         "mrs %0, pmccntr_el0       "
> +#endif
>         : "=r"(cycle_counter) :: "memory" );

Zhao, does this implementation seem useful to you? Does it give you better 
(more accurate, less noisy?) benchmarking numbers on Android, than the 
fallback based on clock_gettime?

// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [FFmpeg-devel] [PATCH] aarch64: Use cntvct_el0 as timer register on Android
  2024-06-13 12:54 ` Martin Storsjö
@ 2024-06-14  9:11   ` Zhao Zhili
  2024-06-14 10:58     ` Martin Storsjö
  0 siblings, 1 reply; 6+ messages in thread
From: Zhao Zhili @ 2024-06-14  9:11 UTC (permalink / raw)
  To: FFmpeg development discussions and patches



> On Jun 13, 2024, at 20:54, Martin Storsjö <martin@martin.st> wrote:
> 
> On Fri, 7 Jun 2024, Martin Storsjö wrote:
> 
>> The default timer register pmccntr_el0 usually requires enabling
>> access with e.g. a kernel module.
>> ---
>> cntvct_el0 has significantly better resolution than
>> av_gettime_relative (while the unscaled nanosecond output of
>> clock_gettime is much higher resolution).
>> 
>> In one tested case, the cntvct_el0 timer has a frequency of 25 MHz
>> (readable via the register cntfrq_el0).
>> ---
>> libavutil/aarch64/timer.h | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>> 
>> diff --git a/libavutil/aarch64/timer.h b/libavutil/aarch64/timer.h
>> index fadc9568f8..966f17081a 100644
>> --- a/libavutil/aarch64/timer.h
>> +++ b/libavutil/aarch64/timer.h
>> @@ -33,7 +33,16 @@ static inline uint64_t read_time(void)
>>    uint64_t cycle_counter;
>>    __asm__ volatile(
>>        "isb                   \t\n"
>> +#if defined(__ANDROID__)
>> +        // cntvct_el0 has lower resolution than pmccntr_el0, but is usually
>> +        // accessible from user space by default.
>> +        "mrs %0, cntvct_el0        "
>> +#else
>> +        // pmccntr_el0 has higher resolution, but is usually not accessible
>> +        // from user space by default (but access can be enabled with a custom
>> +        // kernel module).
>>        "mrs %0, pmccntr_el0       "
>> +#endif
>>        : "=r"(cycle_counter) :: "memory" );
> 
> Zhao, does this implementation seem useful to you? Does it give you better (more accurate, less noisy?) benchmarking numbers on Android, than the fallback based on clock_gettime?

Hi Martin, this works on Android and macOS both, so maybe you can enable it for macOS too.

I have compared the result of this implementation and mach_absolute_time, this looks like
the implementation has smaller variable Deviation than mach_absolute_time. I guess the result
is the same when compared to clock_gettime.

We have linux perf on Android, and kperf on macOS. Linux perf has the benefit to reduce interference
from other processes on statistical results, if I understand correctly. I’m not sure about the benefit of
macOS kperf.l

> 
> // Martin
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org <mailto:ffmpeg-devel@ffmpeg.org>
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org <mailto:ffmpeg-devel-request@ffmpeg.org> with subject "unsubscribe".

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [FFmpeg-devel] [PATCH] aarch64: Use cntvct_el0 as timer register on Android
  2024-06-14  9:11   ` Zhao Zhili
@ 2024-06-14 10:58     ` Martin Storsjö
  0 siblings, 0 replies; 6+ messages in thread
From: Martin Storsjö @ 2024-06-14 10:58 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

On Fri, 14 Jun 2024, Zhao Zhili wrote:

>
>
>> On Jun 13, 2024, at 20:54, Martin Storsjö <martin@martin.st> wrote:
>> 
>> On Fri, 7 Jun 2024, Martin Storsjö wrote:
>> 
>>> The default timer register pmccntr_el0 usually requires enabling
>>> access with e.g. a kernel module.
>>> ---
>>> cntvct_el0 has significantly better resolution than
>>> av_gettime_relative (while the unscaled nanosecond output of
>>> clock_gettime is much higher resolution).
>>> 
>>> In one tested case, the cntvct_el0 timer has a frequency of 25 MHz
>>> (readable via the register cntfrq_el0).
>>> ---
>>> libavutil/aarch64/timer.h | 9 +++++++++
>>> 1 file changed, 9 insertions(+)
>>> 
>>> diff --git a/libavutil/aarch64/timer.h b/libavutil/aarch64/timer.h
>>> index fadc9568f8..966f17081a 100644
>>> --- a/libavutil/aarch64/timer.h
>>> +++ b/libavutil/aarch64/timer.h
>>> @@ -33,7 +33,16 @@ static inline uint64_t read_time(void)
>>>    uint64_t cycle_counter;
>>>    __asm__ volatile(
>>>        "isb                   \t\n"
>>> +#if defined(__ANDROID__)
>>> +        // cntvct_el0 has lower resolution than pmccntr_el0, but is usually
>>> +        // accessible from user space by default.
>>> +        "mrs %0, cntvct_el0        "
>>> +#else
>>> +        // pmccntr_el0 has higher resolution, but is usually not accessible
>>> +        // from user space by default (but access can be enabled with a custom
>>> +        // kernel module).
>>>        "mrs %0, pmccntr_el0       "
>>> +#endif
>>>        : "=r"(cycle_counter) :: "memory" );
>> 
>> Zhao, does this implementation seem useful to you? Does it give you better (more accurate, less noisy?) benchmarking numbers on Android, than the fallback based on clock_gettime?
>
> Hi Martin, this works on Android and macOS both, so maybe you can enable it for macOS too.
>
> I have compared the result of this implementation and 
> mach_absolute_time, this looks like the implementation has smaller 
> variable Deviation than mach_absolute_time. I guess the result is the 
> same when compared to clock_gettime.

Right, it does seem to use the same scale as mach_absolute_time - but it 
probably has less overhead when we can fetch it by just reading a 
register, instead of calling out to a system function.

So then I guess I could extend this patch to enable it for 
defined(__APPLE__) too.

> We have linux perf on Android, and kperf on macOS. Linux perf has the 
> benefit to reduce interference from other processes on statistical 
> results, if I understand correctly.

Yes, possibly, but on the other hand, it also has a bit more noise and 
overhead over just using pmccntr_el0; if e.g. tuning and comparing small 
differences in functions, pmccntr_el0 usually gives the best result.

But anyway, as those are configurable, users building with linux perf will 
get that, and users disabling it will get the more accurate register 
instead.

> I’m not sure about the benefit of macOS kperf.

macOS kperf gives the best and most accurate numbers you can get, on that 
HW, but unfortunately, it's undocumented and unofficial (and requires 
running with sudo). It does give numbers comparable to linux perf, I 
think, i.e. proper clock cycle level numbers.

// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-06-14 10:58 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-07  9:12 [FFmpeg-devel] [PATCH] aarch64: Use cntvct_el0 as timer register on Android Martin Storsjö
2024-06-07  9:27 ` Rémi Denis-Courmont
2024-06-07 10:00   ` Martin Storsjö
2024-06-13 12:54 ` Martin Storsjö
2024-06-14  9:11   ` Zhao Zhili
2024-06-14 10:58     ` Martin Storsjö

Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
		ffmpegdev@gitmailbox.com
	public-inbox-index ffmpegdev

Example config snippet for mirrors.


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git