Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
 help / color / mirror / Atom feed
* [FFmpeg-devel] [PATCH] tx_float_neon: Do not access outside stack.
@ 2022-10-09 13:14 Reimar Döffinger
  2022-10-09 14:11 ` Rémi Denis-Courmont
  0 siblings, 1 reply; 4+ messages in thread
From: Reimar Döffinger @ 2022-10-09 13:14 UTC (permalink / raw)
  To: ffmpeg-devel

Use load/store instructions that modify sp to save
registers to stack, like it is done for all other
functions.
At least valgrind complains about the current code.
---
 libavutil/aarch64/tx_float_neon.S | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/libavutil/aarch64/tx_float_neon.S b/libavutil/aarch64/tx_float_neon.S
index 4126c3b812..4be93cc963 100644
--- a/libavutil/aarch64/tx_float_neon.S
+++ b/libavutil/aarch64/tx_float_neon.S
@@ -866,10 +866,10 @@ FFT16_FN ns_float, 1
 
 .macro FFT32_FN name, no_perm
 function ff_tx_fft32_\name\()_neon, export=1
-        stp             d8,  d9,  [sp, #-16]
-        stp             d10, d11, [sp, #-32]
-        stp             d12, d13, [sp, #-48]
-        stp             d14, d15, [sp, #-64]
+        stp             d8,  d9,  [sp, #-16]!
+        stp             d10, d11, [sp, #-16]!
+        stp             d12, d13, [sp, #-16]!
+        stp             d14, d15, [sp, #-16]!
 
         LOAD_SUBADD
         SETUP_SR_RECOMB 32, x7, x8, x9
@@ -911,10 +911,10 @@ function ff_tx_fft32_\name\()_neon, export=1
         zip2            v31.2d, v11.2d, v15.2d
         st1             { v28.4s, v29.4s, v30.4s, v31.4s }, [x1]
 
-        ldp             d14, d15, [sp, #-64]
-        ldp             d12, d13, [sp, #-48]
-        ldp             d10, d11, [sp, #-32]
-        ldp             d8,  d9,  [sp, #-16]
+        ldp             d14, d15, [sp], #16
+        ldp             d12, d13, [sp], #16
+        ldp             d10, d11, [sp], #16
+        ldp             d8,  d9,  [sp], #16
 
         ret
 endfunc
-- 
2.37.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [FFmpeg-devel] [PATCH] tx_float_neon: Do not access outside stack.
  2022-10-09 13:14 [FFmpeg-devel] [PATCH] tx_float_neon: Do not access outside stack Reimar Döffinger
@ 2022-10-09 14:11 ` Rémi Denis-Courmont
  2022-10-09 16:36   ` Reimar Döffinger
  0 siblings, 1 reply; 4+ messages in thread
From: Rémi Denis-Courmont @ 2022-10-09 14:11 UTC (permalink / raw)
  To: ffmpeg-devel

Le sunnuntaina 9. lokakuuta 2022, 16.14.47 EEST Reimar Döffinger a écrit :
> Use load/store instructions that modify sp to save
> registers to stack, like it is done for all other
> functions.
> At least valgrind complains about the current code.
> ---
>  libavutil/aarch64/tx_float_neon.S | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/libavutil/aarch64/tx_float_neon.S
> b/libavutil/aarch64/tx_float_neon.S index 4126c3b812..4be93cc963 100644
> --- a/libavutil/aarch64/tx_float_neon.S
> +++ b/libavutil/aarch64/tx_float_neon.S
> @@ -866,10 +866,10 @@ FFT16_FN ns_float, 1
> 
>  .macro FFT32_FN name, no_perm
>  function ff_tx_fft32_\name\()_neon, export=1
> -        stp             d8,  d9,  [sp, #-16]
> -        stp             d10, d11, [sp, #-32]
> -        stp             d12, d13, [sp, #-48]
> -        stp             d14, d15, [sp, #-64]
> +        stp             d8,  d9,  [sp, #-16]!
> +        stp             d10, d11, [sp, #-16]!
> +        stp             d12, d13, [sp, #-16]!
> +        stp             d14, d15, [sp, #-16]!

While this fixes the ABI violation, it introduces multiple data dependencies on 
stack pointer due to write-back.

The idiomatic way to do this is to allocate the entire needed stack space in 
the first store / last load, and use positive offsets elsewhence.

> 
>          LOAD_SUBADD
>          SETUP_SR_RECOMB 32, x7, x8, x9
> @@ -911,10 +911,10 @@ function ff_tx_fft32_\name\()_neon, export=1
>          zip2            v31.2d, v11.2d, v15.2d
>          st1             { v28.4s, v29.4s, v30.4s, v31.4s }, [x1]
> 
> -        ldp             d14, d15, [sp, #-64]
> -        ldp             d12, d13, [sp, #-48]
> -        ldp             d10, d11, [sp, #-32]
> -        ldp             d8,  d9,  [sp, #-16]
> +        ldp             d14, d15, [sp], #16
> +        ldp             d12, d13, [sp], #16
> +        ldp             d10, d11, [sp], #16
> +        ldp             d8,  d9,  [sp], #16
> 
>          ret
>  endfunc


-- 
Rémi Denis-Courmont
http://www.remlab.net/



_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [FFmpeg-devel] [PATCH] tx_float_neon: Do not access outside stack.
  2022-10-09 14:11 ` Rémi Denis-Courmont
@ 2022-10-09 16:36   ` Reimar Döffinger
  2022-10-09 16:54     ` Rémi Denis-Courmont
  0 siblings, 1 reply; 4+ messages in thread
From: Reimar Döffinger @ 2022-10-09 16:36 UTC (permalink / raw)
  To: FFmpeg development discussions and patches



> On 9 Oct 2022, at 16:11, Rémi Denis-Courmont <remi@remlab.net> wrote:
> 
> Le sunnuntaina 9. lokakuuta 2022, 16.14.47 EEST Reimar Döffinger a écrit :
>> Use load/store instructions that modify sp to save
>> registers to stack, like it is done for all other
>> functions.
>> At least valgrind complains about the current code.
>> ---
>> libavutil/aarch64/tx_float_neon.S | 16 ++++++++--------
>> 1 file changed, 8 insertions(+), 8 deletions(-)
>> 
>> diff --git a/libavutil/aarch64/tx_float_neon.S
>> b/libavutil/aarch64/tx_float_neon.S index 4126c3b812..4be93cc963 100644
>> --- a/libavutil/aarch64/tx_float_neon.S
>> +++ b/libavutil/aarch64/tx_float_neon.S
>> @@ -866,10 +866,10 @@ FFT16_FN ns_float, 1
>> 
>> .macro FFT32_FN name, no_perm
>> function ff_tx_fft32_\name\()_neon, export=1
>> -        stp             d8,  d9,  [sp, #-16]
>> -        stp             d10, d11, [sp, #-32]
>> -        stp             d12, d13, [sp, #-48]
>> -        stp             d14, d15, [sp, #-64]
>> +        stp             d8,  d9,  [sp, #-16]!
>> +        stp             d10, d11, [sp, #-16]!
>> +        stp             d12, d13, [sp, #-16]!
>> +        stp             d14, d15, [sp, #-16]!
> 
> While this fixes the ABI violation, it introduces multiple data dependencies on 
> stack pointer due to write-back.

That is true in principle, this is not done consistently at all.

> The idiomatic way to do this is to allocate the entire needed stack space in 
> the first store / last load, and use positive offsets elsewhence.

Are you sure this is really relevant at all, considering it's so rarely done in the code?
I'm fine looking into doing that consistently if that's what it takes, but I don't think it's good to make this the only function in the file (if not the whole lib) that does it this way.
If we want the idiomatic way to be used, we should
update all the code so people get the right idea.
(I'll note that some places might be tricky, there seems to be code that updates sp back and forth in a loop which looks like it might be for no reason, or there is a weird reason - check SR_TRANSFORM_DEF in the same file).

Just to clarify the details, then end result should look like this?
-        stp             d8,  d9,  [sp, #-16]
-        stp             d10, d11, [sp, #-32]
-        stp             d12, d13, [sp, #-48]
-        stp             d14, d15, [sp, #-64]
+        stp             d14, d15, [sp, #-16*4]!
+        stp             d8,  d9,  [sp, #16*3]
+        stp             d10, d11, [sp, #16*2]
+        stp             d12, d13, [sp, #16]

Or something slightly different?

Best regards,
Reimar
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [FFmpeg-devel] [PATCH] tx_float_neon: Do not access outside stack.
  2022-10-09 16:36   ` Reimar Döffinger
@ 2022-10-09 16:54     ` Rémi Denis-Courmont
  0 siblings, 0 replies; 4+ messages in thread
From: Rémi Denis-Courmont @ 2022-10-09 16:54 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

Le sunnuntaina 9. lokakuuta 2022, 19.36.24 EEST Reimar Döffinger a écrit :
> > While this fixes the ABI violation, it introduces multiple data
> > dependencies on stack pointer due to write-back.
> 
> That is true in principle, this is not done consistently at all.

I have not checked the FFmpeg code base, but this *is* done consistently by 
compilers and in a number of other assembler-heavy project, notably the Linux 
kernel (see especially arch/arm64/crypto/*.S).

> > The idiomatic way to do this is to allocate the entire needed stack space
> > in the first store / last load, and use positive offsets elsewhence.
> 
> Are you sure this is really relevant at all, considering it's so rarely done
> in the code?

I don't know what you base that statement on. I have rarely seen it not done 
that way. And even then, it mostly occurs inside loops, where it cannot fully 
be avoided and where there will be a couple of other instructions that don't 
suffer the data dependency.

> Just to clarify the details, then end result should look like this?
> -        stp             d8,  d9,  [sp, #-16]
> -        stp             d10, d11, [sp, #-32]
> -        stp             d12, d13, [sp, #-48]
> -        stp             d14, d15, [sp, #-64]
> +        stp             d14, d15, [sp, #-16*4]!
> +        stp             d8,  d9,  [sp, #16*3]
> +        stp             d10, d11, [sp, #16*2]
> +        stp             d12, d13, [sp, #16]

Yes.

-- 
レミ・デニ-クールモン
http://www.remlab.net/



_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-10-09 16:54 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-09 13:14 [FFmpeg-devel] [PATCH] tx_float_neon: Do not access outside stack Reimar Döffinger
2022-10-09 14:11 ` Rémi Denis-Courmont
2022-10-09 16:36   ` Reimar Döffinger
2022-10-09 16:54     ` Rémi Denis-Courmont

Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
		ffmpegdev@gitmailbox.com
	public-inbox-index ffmpegdev

Example config snippet for mirrors.


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git