* [FFmpeg-devel] [PATCH] avcodec/aarch64: Access externs via GOT with PIC
@ 2022-07-10 16:08 Triang3l
2022-07-10 22:17 ` Martin Storsjö
0 siblings, 1 reply; 6+ messages in thread
From: Triang3l @ 2022-07-10 16:08 UTC (permalink / raw)
To: ffmpeg-devel
Android, starting from version 6 (API level 23, from the year 2015),
requires all shared libraries to be position-independent, and refuses to
load shared libraries (which are the most common native binary type on
Android as in Android applications, native code is placed in shared
libraries accessed via the Java Native Interface) containing dynamic
relocations.
To support PIC, all AArch64 assembly code in FFmpeg uses the `movrel`
macro to obtain addresses of labels, such as lookup table constants, in
a way that with CONFIG_PIC being 1, PC-relative addresses of labels are
computed via the `adrp` instruction. This approach, however, is suitable
only for labels defined in the same object file. For `adrp` to work
directly between object files, the linker has to perform a text
relocation. And in my scenario (libavcodec being a static library linked
to a shared library, though I'm not sure if this is relevant), this
resulted in the following LLVM linker errors, making my application not
buildable for Android:
"relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used against symbol
ff_cos_32; recompile with -fPIC"
"can't create dynamic relocation R_AARCH64_ADD_ABS_LO12_NC against
symbol: ff_cos_32 in readonly segment; recompile object files with -fPIC
or pass '-Wl,-z,notext' to allow text relocations in the output"
This commit brings the solution that is already implemented in FFmpeg on
AArch32 to AArch64 - a separate macro, `movrelx`, which emits
instructions for computing the address of an external label through the
Global Object Table (GOT).
The same targets as `movrel` is implemented for are covered by this
commit. For Apple targets, the instruction sequence is the one that is
generated for referencing an extern variable in C code by Clang for the
`aarch64-apple-darwin` target triple. For other targets (Linux), this is
the sequence emitted by Clang for the `aarch64-unknown-linux-gnu`
target, by GCC, and specified in the "Assembly expressions" section of
the Arm Compiler Reference Guide. The hardware-assisted AddressSanitizer
has no effect on the sequence - Clang generates the same `:got:` and
`:got_lo12:` addressing regardless of whether `-fsanitize=hwaddress` is
used. Windows has no concept of PIC, and Windows builds should be done
with CONFIG_PIC being 0, so it doesn't need to be handled.
The literal offset, however, is always applied using a separate `add` or
`sub` instruction as the actual address is loaded indirectly for an
extern object that is the whole lookup table itself. The only place
where the offset is currently used with `movrelx` is VP9, with the
offset being 256 or 512 bytes there. Unfortunately, that offset can't be
moved to the positive immediate offset encoded in load/store
instructions there without major restructuring, as the actual memory
accesses are performed in a function that is common to different offset
values, with the offset being pre-applied to one of its arguments
instead. Without PIC though, `movrelx` is implemented exactly the same
as `movrel`, with the offset applied directly to the `ldr` literal, so
the non-PIC path is unaffected by this change.
Testing was performed on my local build setup for Android based on
ndk-build. Two things were tested:
- Regression testing was done using the `libavcodec/tests/fft.c` test.
`libavcodec` was built as a static library, and the test was built as a
native executable (which, unlike shared libraries, isn't required to be
position-independent). Both the executable without the changes and the
executable with the new code were launched on a physical AArch64 device
using Termux. As the length of the instruction sequences for `movrel`
and `movrelx` without the offset is the same, comparing the two binaries
in a diff tool has shown the expected 13 differences in the code - 12 in
`fftN_neon` for different transform sizes, and 1 in `ff_fft_calc_neon`.
The results for the FFT test were the same for both executables with
different transform size values.
- To check sufficiency and suitability for fixing the original issue,
the `fft.c` test was converted into a shared library (with the `main`
function renamed), and a proxy executable performing `dlopen` of the
library and invoking the main test function from it via `dlsym`. Termux
is built with `targetSdkVersion` 28, so the `dlopen` rule of Android API
levels 23 and above disallowing dynamic relocations should apply to it.
The testing device is running Android 11 (API level 30). The test was
executed successfully, meaning that no relocations incompatible with PIC
are required by libavcodec anymore.
Signed-off-by: Triang3l <triang3l@yandex.ru>
---
libavcodec/aarch64/fft_neon.S | 4 ++--
libavcodec/aarch64/sbrdsp_neon.S | 2 +-
libavcodec/aarch64/vp9mc_16bpp_neon.S | 4 ++--
libavcodec/aarch64/vp9mc_neon.S | 4 ++--
libavutil/aarch64/asm.S | 19 +++++++++++++++++++
5 files changed, 26 insertions(+), 7 deletions(-)
diff --git a/libavcodec/aarch64/fft_neon.S b/libavcodec/aarch64/fft_neon.S
index 9ff3f9c526..d52f14626b 100644
--- a/libavcodec/aarch64/fft_neon.S
+++ b/libavcodec/aarch64/fft_neon.S
@@ -353,7 +353,7 @@ function fft\n\()_neon, align=6
sub x0, x28, #\n4*2*8
ldp x28, x30, [sp], #16
AARCH64_VALIDATE_LINK_REGISTER
- movrel x4, X(ff_cos_\n)
+ movrelx x4, X(ff_cos_\n)
mov x2, #\n4>>1
b fft_pass_neon
endfunc
@@ -384,7 +384,7 @@ function ff_fft_calc_neon, export=1
movrel x12, pmmp
ldr x3, [x3, x2, lsl #3]
movrel x13, mppm
- movrel x14, X(ff_cos_16)
+ movrelx x14, X(ff_cos_16)
ld1 {v31.16b}, [x11]
mov x0, x1
ld1 {v29.4s}, [x12] // pmmp
diff --git a/libavcodec/aarch64/sbrdsp_neon.S
b/libavcodec/aarch64/sbrdsp_neon.S
index d23717e760..bbbc3811bd 100644
--- a/libavcodec/aarch64/sbrdsp_neon.S
+++ b/libavcodec/aarch64/sbrdsp_neon.S
@@ -273,7 +273,7 @@ endfunc
.macro apply_noise_common
sxtw x3, w3
sxtw x5, w5
- movrel x7, X(ff_sbr_noise_table)
+ movrelx x7, X(ff_sbr_noise_table)
add x3, x3, #1
1: and x3, x3, #0x1ff
add x8, x7, x3, lsl #3
diff --git a/libavcodec/aarch64/vp9mc_16bpp_neon.S
b/libavcodec/aarch64/vp9mc_16bpp_neon.S
index 53b372c262..ed472eb144 100644
--- a/libavcodec/aarch64/vp9mc_16bpp_neon.S
+++ b/libavcodec/aarch64/vp9mc_16bpp_neon.S
@@ -287,7 +287,7 @@ do_8tap_h_size 16
.macro do_8tap_h_func type, filter, offset, size, bpp
function ff_vp9_\type\()_\filter\()\size\()_h_\bpp\()_neon, export=1
mvni v31.8h, #((0xff << (\bpp - 8)) & 0xff), lsl #8
- movrel x6, X(ff_vp9_subpel_filters), 256*\offset
+ movrelx x6, X(ff_vp9_subpel_filters), 256*\offset
cmp w5, #8
add x9, x6, w5, uxtw #4
mov x5, #2*\size
@@ -574,7 +574,7 @@ do_8tap_4v avg
function ff_vp9_\type\()_\filter\()\size\()_v_\bpp\()_neon, export=1
uxtw x4, w4
mvni v1.8h, #((0xff << (\bpp - 8)) & 0xff), lsl #8
- movrel x5, X(ff_vp9_subpel_filters), 256*\offset
+ movrelx x5, X(ff_vp9_subpel_filters), 256*\offset
add x6, x5, w6, uxtw #4
mov x5, #\size
.if \size >= 8
diff --git a/libavcodec/aarch64/vp9mc_neon.S
b/libavcodec/aarch64/vp9mc_neon.S
index abf2bae9db..48460ae485 100644
--- a/libavcodec/aarch64/vp9mc_neon.S
+++ b/libavcodec/aarch64/vp9mc_neon.S
@@ -353,7 +353,7 @@ do_8tap_h_size 16
.macro do_8tap_h_func type, filter, offset, size
function ff_vp9_\type\()_\filter\()\size\()_h_neon, export=1
- movrel x6, X(ff_vp9_subpel_filters), 256*\offset
+ movrelx x6, X(ff_vp9_subpel_filters), 256*\offset
cmp w5, #8
add x9, x6, w5, uxtw #4
mov x5, #\size
@@ -627,7 +627,7 @@ do_8tap_4v avg, 4, 3
.macro do_8tap_v_func type, filter, offset, size
function ff_vp9_\type\()_\filter\()\size\()_v_neon, export=1
uxtw x4, w4
- movrel x5, X(ff_vp9_subpel_filters), 256*\offset
+ movrelx x5, X(ff_vp9_subpel_filters), 256*\offset
cmp w6, #8
add x6, x5, w6, uxtw #4
mov x5, #\size
diff --git a/libavutil/aarch64/asm.S b/libavutil/aarch64/asm.S
index a7782415d7..a8207bf38a 100644
--- a/libavutil/aarch64/asm.S
+++ b/libavutil/aarch64/asm.S
@@ -229,6 +229,25 @@ ELF .size \name, . - \name
#endif
.endm
+.macro movrelx rd, val, offset=0
+#if CONFIG_PIC
+#if defined(__APPLE__)
+ adrp \rd, \val@GOTPAGE
+ ldr \rd, [\rd, \val@GOTPAGEOFF]
+#else
+ adrp \rd, :got:\val
+ ldr \rd, [\rd, :got_lo12:\val]
+#endif
+ .if \offset > 0
+ add \rd, \rd, \offset
+ .elseif \offset < 0
+ sub \rd, \rd, -(\offset)
+ .endif
+#else
+ ldr \rd, =\val+\offset
+#endif
+.endm
+
#define GLUE(a, b) a ## b
#define JOIN(a, b) GLUE(a, b)
#define X(s) JOIN(EXTERN_ASM, s)
--
2.17.0.windows.1
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [FFmpeg-devel] [PATCH] avcodec/aarch64: Access externs via GOT with PIC
2022-07-10 16:08 [FFmpeg-devel] [PATCH] avcodec/aarch64: Access externs via GOT with PIC Triang3l
@ 2022-07-10 22:17 ` Martin Storsjö
2022-07-10 22:28 ` Martin Storsjö
0 siblings, 1 reply; 6+ messages in thread
From: Martin Storsjö @ 2022-07-10 22:17 UTC (permalink / raw)
To: FFmpeg development discussions and patches
Hi,
Thanks for your patch! While the patch probably is worthwhile to pursue,
ffmpeg does work on Android 6 and newer (with the aarch64 assembly), so
there's some gaps/misses in the reasoning in the patch description.
On Sun, 10 Jul 2022, Triang3l wrote:
> To support PIC, all AArch64 assembly code in FFmpeg uses the `movrel`
> macro to obtain addresses of labels, such as lookup table constants, in
> a way that with CONFIG_PIC being 1, PC-relative addresses of labels are
> computed via the `adrp` instruction.
> This approach, however, is suitable only for labels defined in the same
> object file. For `adrp` to work directly between object files, the
> linker has to perform a text relocation.
No, it doesn't. It's fully possible to build such libraries without text
relocations currently.
My memory of the situation was that we're linking our shared libraries
with -Wl,-Bsymbolic, which means that those symbol references are bound at
link time, so the offset from adrp to the target symbol is fixed at link
time, and no text relocation is needed.
However I did try to link such shared libraries, removing the
-Wl,-Bsymbolic argument, and it still succeeds. So I'm a bit unsure what
really makes it work for me when it isn't working for you.
With Android NDK r24, I configured a build with "-target-os=android
--arch=aarch64 --cc=aarch64-linux-android32-clang --enable-cross-compile
--enable-shared", and successfully build that. The built libavcodec.so.59
does not have text relocations. Can you reproduce and confirm this bit?
> And in my scenario (libavcodec being a static library linked
> to a shared library, though I'm not sure if this is relevant),
I think this might be quite relevant. Does adding -Wl,-Bsymbolic to the
linker invocation when linking in the static libavcodec into your shared
library fix the issue?
(I'm not necessarily arguing that we shouldn't fix this issue - but I want
to know exactly what differs in your setup and what detail exactly makes
it work in our current default builds which is missing in yours.)
> Windows has no concept of PIC, and Windows builds should be done
> with CONFIG_PIC being 0, so it doesn't need to be handled.
While Windows doesn't really have proper PIC, common builds of ffmpeg on
Windows on aarch64 do end up with CONFIG_PIC set to 1, so please do handle
that in the movrelx macro too, expanding to the same as the current movrel
macro for those cases.
// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [FFmpeg-devel] [PATCH] avcodec/aarch64: Access externs via GOT with PIC
2022-07-10 22:17 ` Martin Storsjö
@ 2022-07-10 22:28 ` Martin Storsjö
2022-07-11 0:25 ` Triang3l
0 siblings, 1 reply; 6+ messages in thread
From: Martin Storsjö @ 2022-07-10 22:28 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Mon, 11 Jul 2022, Martin Storsjö wrote:
> Hi,
>
> Thanks for your patch! While the patch probably is worthwhile to pursue,
> ffmpeg does work on Android 6 and newer (with the aarch64 assembly), so
> there's some gaps/misses in the reasoning in the patch description.
>
> On Sun, 10 Jul 2022, Triang3l wrote:
>
>> To support PIC, all AArch64 assembly code in FFmpeg uses the `movrel` macro
>> to obtain addresses of labels, such as lookup table constants, in a way
>> that with CONFIG_PIC being 1, PC-relative addresses of labels are computed
>> via the `adrp` instruction.
>
>> This approach, however, is suitable only for labels defined in the same
>> object file. For `adrp` to work directly between object files, the linker
>> has to perform a text relocation.
>
> No, it doesn't. It's fully possible to build such libraries without text
> relocations currently.
>
> My memory of the situation was that we're linking our shared libraries with
> -Wl,-Bsymbolic, which means that those symbol references are bound at link
> time, so the offset from adrp to the target symbol is fixed at link time, and
> no text relocation is needed.
>
> However I did try to link such shared libraries, removing the -Wl,-Bsymbolic
> argument, and it still succeeds. So I'm a bit unsure what really makes it
> work for me when it isn't working for you.
Andreas Rheinhardt kindly reminded me that when linking libavcodec.so, we
add -Wl,--version-script,libavcodec/libavcodec.ver, which makes e.g.
ff_cos_32 a non-exported symbol, which means that it can't be interposed,
and thus the relocation can be fixed at link time.
If I remove that argument, I can reproduce the linker errors, and adding
-Wl,-Bsymbolic fixes it.
I wonder if we could mark these symbols as ELF hidden, so that they
wouldn't need to be interposable at all, even when you link the static
library into a separate shared library?
// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [FFmpeg-devel] [PATCH] avcodec/aarch64: Access externs via GOT with PIC
2022-07-10 22:28 ` Martin Storsjö
@ 2022-07-11 0:25 ` Triang3l
2022-07-11 9:00 ` Martin Storsjö
0 siblings, 1 reply; 6+ messages in thread
From: Triang3l @ 2022-07-11 0:25 UTC (permalink / raw)
To: FFmpeg development discussions and patches
Hi Martin, thanks for a quick and detailed review!Oops, Thunderbird (or something else in the chain) has added an extraneous space in the beginning of every line, though the +- deltas seem to be unaffected. I can try sending the patch some other way if needed, like as a binary attachment, or via git send-email, but possibly the "^ " (without quotes) per-line regular expression should be fine for cleaning it up as well. This is my first patch submission to this repository, and overall to a mailing list, so I'm just starting learning the best practices… hi everyone!!!Yes, the original issue was likely a consequence of us using a completely different build process than FFmpeg is designed to be built with. We have our own Premake 5 scripts for FFmpeg components, that are used for generation of Android ndk-build files via my Premake generator, Triang3l/premake-androidndk on GitHub. So yes, we don't have any link options imported from a file.I agree, it was very suprising to see that dynamic relocations were done between object files within a single shared library at all. I suspected ASLR at first, but I don't know if it can randomize the locations of individual object files this way, that's probably not the case though. I'll see what can be done to reproduce the effect of libavcodec.v — since my shared library is the final application logic, not a library that's actually shareable, none of the FFmpeg objects need to be exported from it at all. However, static libraries barely have anything to configure overall as far as I know, so disabling exports specifically for FFmpeg may be complicated — but thankfully, we can (and even should, to reduce the file size) use -fvisibility=hidden globally in our application, if that helps fix the issue. -Wl,-Bsymbolic should be fine too, and possibly even less intrusive.If using __attribute__((visibility("hidden"))) for the lookup tables prevents dynamic relocations from being inserted, and if that doesn't break common usages of libavcodec, yes, it should be a way better solution than introducing an additional memory load at runtime. I'll check if that works for us tomorrow or in a couple of days.If we're able to avoid using the global object table this way though, maybe it could be desirable to also get rid of `movrelx` in the AArch32 code as well?By the way, I'm also super confused by how the offset is applied in the local `movrel` currently, it looks very inconsistent. The `adrp` and `add` combination, as I understand its operation, should work for any 32-bit literal value, not specifically for addresses of known objects — `adrp` accepting the upper 20 bits as a literal and adding them to the PC, and then `add` just adding the lower 12 bits, the offset within the page, also taken as a literal.While `movrelx` most likely requires the offset to be applied explicitly using 0…4095 `add` or `sub` afterwards because it's first loaded from a lookup table of addresses of, if I understand correctly, actual objects in the code (I wouldn't expect it to be filled with object+8, object+16, object+24 pointers in case of a large object for loading time reasons, though I haven't researched this topic in detail), if everything `movrel` does is adding the PC to the input literal… do we even need to emit `sub` for negative offsets in it?The current Linux implementation of `movrel` just adds the offset directly to the label using + regardless of whether it's positive or negative, and there already are instances of negative offsets in the code (`subpel_filters` minus 16 in vp8dsp_neon.S). The AArch32 version of this code also doesn't use an offset parameter, rather, passing `subpel_filters-16` directly as the label argument.However, the Apple AArch64 implementation does have two paths for applying the offset — directly to the literal if it's positive, but via a `sub` instruction for a negative one. This is also true for the Windows implementation — whose existence overall is questionable, as Windows DLLs use a different relocation method, and PIC doesn't apply to them at all if I understand correctly; and also the `movrel` implementation for Windows with a non-negative offset is identical to the Linux version, minus the hardware-assisted ASan path on Linux.I'll likely play with the assembler later to see how negative offsets are interpreted, but still, is there a reason to emit the subtraction instruction that you can remember, or would it be safe to possibly even remove the offset argument completely?For `movrelx`, however, if it's needed at all, the offset argument is desirable as without PIC, it will be applied to the label literal for free.Thank you!— Tri
On Mon, 11 Jul 2022, Martin Storsjö wrote:> Hi,>> Thanks for your patch! While the patch probably is worthwhile to pursue, > ffmpeg does work on Android 6 and newer (with the aarch64 assembly), so > there's some gaps/misses in the reasoning in the patch description.>> On Sun, 10 Jul 2022, Triang3l wrote:>>> To support PIC, all AArch64 assembly code in FFmpeg uses the `movrel` macro >> to obtain addresses of labels, such as lookup table constants, in a way >> that with CONFIG_PIC being 1, PC-relative addresses of labels are computed >> via the `adrp` instruction.>>> This approach, however, is suitable only for labels defined in the same >> object file. For `adrp` to work directly between object files, the linker >> has to perform a text relocation.>> No, it doesn't. It's fully possible to build such libraries without text > relocations currently.>> My memory of the situation was that we're linking our shared libraries with > -Wl,-Bsymbolic, which means that those symbol references are bound at link > time, so the offset from adrp to the target symbol is fixed at link time, and > no text relocation is needed.>> However I did try to link such shared libraries, removing the -Wl,-Bsymbolic > argument, and it still succeeds. So I'm a bit unsure what really makes it > work for me when it isn't working for you.Andreas Rheinhardt kindly reminded me that when linking libavcodec.so, we add -Wl,--version-script,libavcodec/libavcodec.ver, which makes e.g. ff_cos_32 a non-exported symbol, which means that it can't be interposed, and thus the relocation can be fixed at link time.If I remove that argument, I can reproduce the linker errors, and adding -Wl,-Bsymbolic fixes it.I wonder if we could mark these symbols as ELF hidden, so that they wouldn't need to be interposable at all, even when you link the static library into a separate shared library?// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [FFmpeg-devel] [PATCH] avcodec/aarch64: Access externs via GOT with PIC
2022-07-11 0:25 ` Triang3l
@ 2022-07-11 9:00 ` Martin Storsjö
0 siblings, 0 replies; 6+ messages in thread
From: Martin Storsjö @ 2022-07-11 9:00 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Mon, 11 Jul 2022, Triang3l wrote:
> However, static libraries barely have anything to configure overall as
> far as I know, so disabling exports specifically for FFmpeg may be
> complicated — but thankfully, we can (and even should, to reduce the
> file size) use -fvisibility=hidden globally in our application, if that
> helps fix the issue.
Yes, we could consider if we should build the libraries with
-fvisibility=hidden (maybe as an option), but that's not always
necessarily the best option. In particular we would want to set the
default visibility for e.g. the public API symbols in that case. (Trying
to export such symbols via the version script doesn't help, when they're
explicitly set as hidden originally.)
Note that building your own application code with this option doesn't help
much here; it's the libavcodec object files that would need to be build
that way.
> -Wl,-Bsymbolic should be fine too, and possibly even less
> intrusive.
Yes, that's quite non-intrusive, and should be easy to add as a linker
option in your case.
> If using __attribute__((visibility("hidden"))) for the lookup
> tables prevents dynamic relocations from being inserted, and if that
> doesn't break common usages of libavcodec, yes, it should be a way
> better solution than introducing an additional memory load at runtime.
I did a quick test with that and it seems like it works - I'll post a
patch for that shortly.
> If we're able to avoid using the global object table this way though,
> maybe it could be desirable to also get rid of `movrelx` in the AArch32
> code as well?
I wouldn't start touching that - if I remember correctly, movrelx is
needed there for a bunch of other reasons - due to different relocation
types and addressing modes there.
> By the way, I'm also super confused by how the offset is applied in
> the local `movrel` currently, it looks very inconsistent. The `adrp` and
> `add` combination, as I understand its operation, should work for any
> 32-bit literal value, not specifically for addresses of known objects —
> `adrp` accepting the upper 20 bits as a literal and adding them to the
> PC, and then `add` just adding the lower 12 bits, the offset within the
> page, also taken as a literal.
Trust me, it specifically needs to be like this for a reason.
> if everything `movrel` does is adding the PC to the input literal… do we
> even need to emit `sub` for negative offsets in it?
When the final binary is linked and run, then yes, all the adrp+add pair
does is add a literal to PC.
But before that, when an object file is assembled, the instruction opcodes
can't be finalized with the actual literal value, as the distance from the
adrp/add pair to the targeted symbol only is known at link time.
Therefore, the object file stores relocations that say "fix up this adrp
instruction with the actual offset to 'symbol X + 42 bytes'". For ELF
object files, the object file format and relocations allow a negative
offset, but for MachO and COFF, it doesn't (or it might be limited
accidentally by the tools). In either case; on MachO and COFF we can't
practically express a relocation against "symbol minus some bytes" - so we
produce an extra 'sub' instruction in those cases.
> This is also true for the Windows implementation — whose existence
> overall is questionable, as Windows DLLs use a different relocation
> method, and PIC doesn't apply to them at all if I understand correctly;
While Windows code doesn't do proper strict PIC like on ELF, CONFIG_PIC
does end up set in those configurations (like I already mentioned in the
previous mail), and referencing symbols with adrp+add is generally
preferrable over the non-PIC codepath of "ldr rX, =\val+\offset".
The latter will always store an absolute address in the constant island
produced by the ldr pseudo instruction, and storing an absolute address
emits a so called "base relocation" into the linked PE-COFF DLL. When a
DLL is loaded at a non-default address, the loader will need to fix those
up - essentially the same as text relocations on ELF. When using adrp+add
on PE-COFF, no such base relocations are needed.
So while PE-COFF doesn't have true strict PIC, in practice you need very
few base relocations on AArch64 - but if we'd skip the adrp+add
instructions and use the non-PIC codepath of ldr as you suggest, we'd have
much more base relocations.
> is there a reason to emit the subtraction instruction that you can
> remember,
Yes, there is a reason.
> or would it be safe to possibly even remove the offset argument
> completely?
No, it's not safe to remove that, it's clearly there for a reason.
// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [FFmpeg-devel] [PATCH] avcodec/aarch64: Access externs via GOT with PIC
@ 2022-07-11 0:31 Triang3l
0 siblings, 0 replies; 6+ messages in thread
From: Triang3l @ 2022-07-11 0:31 UTC (permalink / raw)
To: FFmpeg development discussions and patches
oh nooooo… I have no idea what's happening, all the newlines in the previous email were destroyed, sorry ugggh… let's try without disabling rich text in the Samsung client this time :/Hi Martin, thanks for a quick and detailed review!Oops, Thunderbird (or something else in the chain) has added an extraneous space in the beginning of every line, though the +- deltas seem to be unaffected. I can try sending the patch some other way if needed, like as a binary attachment, or via git send-email, but possibly the "^ " (without quotes) per-line regular expression should be fine for cleaning it up as well. This is my first patch submission to this repository, and overall to a mailing list, so I'm just starting learning the best practices… hi everyone!!!Yes, the original issue was likely a consequence of us using a completely different build process than FFmpeg is designed to be built with. We have our own Premake 5 scripts for FFmpeg components, that are used for generation of Android ndk-build files via my Premake generator, Triang3l/premake-androidndk on GitHub. So yes, we don't have any link options imported from a file.I agree, it was very suprising to see that dynamic relocations were done between object files within a single shared library at all. I suspected ASLR at first, but I don't know if it can randomize the locations of individual object files this way, that's probably not the case though. I'll see what can be done to reproduce the effect of libavcodec.v — since my shared library is the final application logic, not a library that's actually shareable, none of the FFmpeg objects need to be exported from it at all. However, static libraries barely have anything to configure overall as far as I know, so disabling exports specifically for FFmpeg may be complicated — but thankfully, we can (and even should, to reduce the file size) use -fvisibility=hidden globally in our application, if that helps fix the issue. -Wl,-Bsymbolic should be fine too, and possibly even less intrusive.If using __attribute__((visibility("hidden"))) for the lookup tables prevents dynamic relocations from being inserted, and if that doesn't break common usages of libavcodec, yes, it should be a way better solution than introducing an additional memory load at runtime. I'll check if that works for us tomorrow or in a couple of days.If we're able to avoid using the global object table this way though, maybe it could be desirable to also get rid of `movrelx` in the AArch32 code as well?By the way, I'm also super confused by how the offset is applied in the local `movrel` currently, it looks very inconsistent. The `adrp` and `add` combination, as I understand its operation, should work for any 32-bit literal value, not specifically for addresses of known objects — `adrp` accepting the upper 20 bits as a literal and adding them to the PC, and then `add` just adding the lower 12 bits, the offset within the page, also taken as a literal.While `movrelx` most likely requires the offset to be applied explicitly using 0…4095 `add` or `sub` afterwards because it's first loaded from a lookup table of addresses of, if I understand correctly, actual objects in the code (I wouldn't expect it to be filled with object+8, object+16, object+24 pointers in case of a large object for loading time reasons, though I haven't researched this topic in detail), if everything `movrel` does is adding the PC to the input literal… do we even need to emit `sub` for negative offsets in it?The current Linux implementation of `movrel` just adds the offset directly to the label using + regardless of whether it's positive or negative, and there already are instances of negative offsets in the code (`subpel_filters` minus 16 in vp8dsp_neon.S). The AArch32 version of this code also doesn't use an offset parameter, rather, passing `subpel_filters-16` directly as the label argument.However, the Apple AArch64 implementation does have two paths for applying the offset — directly to the literal if it's positive, but via a `sub` instruction for a negative one. This is also true for the Windows implementation — whose existence overall is questionable, as Windows DLLs use a different relocation method, and PIC doesn't apply to them at all if I understand correctly; and also the `movrel` implementation for Windows with a non-negative offset is identical to the Linux version, minus the hardware-assisted ASan path on Linux.I'll likely play with the assembler later to see how negative offsets are interpreted, but still, is there a reason to emit the subtraction instruction that you can remember, or would it be safe to possibly even remove the offset argument completely?For `movrelx`, however, if it's needed at all, the offset argument is desirable as without PIC, it will be applied to the label literal for free.Thank you!— TriOn Mon, 11 Jul 2022, Martin Storsjö wrote:> Hi,>> Thanks for your patch! While the patch probably is worthwhile to pursue, > ffmpeg does work on Android 6 and newer (with the aarch64 assembly), so > there's some gaps/misses in the reasoning in the patch description.>> On Sun, 10 Jul 2022, Triang3l wrote:>>> To support PIC, all AArch64 assembly code in FFmpeg uses the `movrel` macro >> to obtain addresses of labels, such as lookup table constants, in a way >> that with CONFIG_PIC being 1, PC-relative addresses of labels are computed >> via the `adrp` instruction.>>> This approach, however, is suitable only for labels defined in the same >> object file. For `adrp` to work directly between object files, the linker >> has to perform a text relocation.>> No, it doesn't. It's fully possible to build such libraries without text > relocations currently.>> My memory of the situation was that we're linking our shared libraries with > -Wl,-Bsymbolic, which means that those symbol references are bound at link > time, so the offset from adrp to the target symbol is fixed at link time, and > no text relocation is needed.>> However I did try to link such shared libraries, removing the -Wl,-Bsymbolic > argument, and it still succeeds. So I'm a bit unsure what really makes it > work for me when it isn't working for you.Andreas Rheinhardt kindly reminded me that when linking libavcodec.so, we add -Wl,--version-script,libavcodec/libavcodec.ver, which makes e.g. ff_cos_32 a non-exported symbol, which means that it can't be interposed, and thus the relocation can be fixed at link time.If I remove that argument, I can reproduce the linker errors, and adding -Wl,-Bsymbolic fixes it.I wonder if we could mark these symbols as ELF hidden, so that they wouldn't need to be interposable at all, even when you link the static library into a separate shared library?// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2022-07-11 9:00 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-10 16:08 [FFmpeg-devel] [PATCH] avcodec/aarch64: Access externs via GOT with PIC Triang3l
2022-07-10 22:17 ` Martin Storsjö
2022-07-10 22:28 ` Martin Storsjö
2022-07-11 0:25 ` Triang3l
2022-07-11 9:00 ` Martin Storsjö
2022-07-11 0:31 Triang3l
Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
This inbox may be cloned and mirrored by anyone:
git clone --mirror https://master.gitmailbox.com/ffmpegdev/0 ffmpegdev/git/0.git
# If you have public-inbox 1.1+ installed, you may
# initialize and index your mirror using the following commands:
public-inbox-init -V2 ffmpegdev ffmpegdev/ https://master.gitmailbox.com/ffmpegdev \
ffmpegdev@gitmailbox.com
public-inbox-index ffmpegdev
Example config snippet for mirrors.
AGPL code for this site: git clone https://public-inbox.org/public-inbox.git