Re: [FFmpeg-devel] [PATCH] avcodec/aarch64: Access externs via GOT with PIC

From: "Martin Storsjö" <martin@martin.st>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH] avcodec/aarch64: Access externs via GOT with PIC
Date: Mon, 11 Jul 2022 12:00:22 +0300 (EEST)
Message-ID: <a85b9faa-90dc-d784-6cf1-5b76f5128083@martin.st> (raw)
In-Reply-To: <20220711032533.PXhaNTgC@sas8-c6148047b62a.qloud-c.yandex.net>

On Mon, 11 Jul 2022, Triang3l wrote:

> However, static libraries barely have anything to configure overall as 
> far as I know, so disabling exports specifically for FFmpeg may be 
> complicated — but thankfully, we can (and even should, to reduce the 
> file size) use -fvisibility=hidden globally in our application, if that 
> helps fix the issue.

Yes, we could consider if we should build the libraries with 
-fvisibility=hidden (maybe as an option), but that's not always 
necessarily the best option. In particular we would want to set the 
default visibility for e.g. the public API symbols in that case. (Trying 
to export such symbols via the version script doesn't help, when they're 
explicitly set as hidden originally.)

Note that building your own application code with this option doesn't help 
much here; it's the libavcodec object files that would need to be build 
that way.

> -Wl,-Bsymbolic should be fine too, and possibly even less 
> intrusive.

Yes, that's quite non-intrusive, and should be easy to add as a linker 
option in your case.

> If using __attribute__((visibility("hidden"))) for the lookup 
> tables prevents dynamic relocations from being inserted, and if that 
> doesn't break common usages of libavcodec, yes, it should be a way 
> better solution than introducing an additional memory load at runtime.

I did a quick test with that and it seems like it works - I'll post a 
patch for that shortly.

> If we're able to avoid using the global object table this way though, 
> maybe it could be desirable to also get rid of `movrelx` in the AArch32 
> code as well?

I wouldn't start touching that - if I remember correctly, movrelx is 
needed there for a bunch of other reasons - due to different relocation 
types and addressing modes there.

> By the way, I'm also super confused by how the offset is applied in 
> the local `movrel` currently, it looks very inconsistent. The `adrp` and 
> `add` combination, as I understand its operation, should work for any 
> 32-bit literal value, not specifically for addresses of known objects — 
> `adrp` accepting the upper 20 bits as a literal and adding them to the 
> PC, and then `add` just adding the lower 12 bits, the offset within the 
> page, also taken as a literal.

Trust me, it specifically needs to be like this for a reason.

> if everything `movrel` does is adding the PC to the input literal… do we 
> even need to emit `sub` for negative offsets in it?

When the final binary is linked and run, then yes, all the adrp+add pair 
does is add a literal to PC.

But before that, when an object file is assembled, the instruction opcodes 
can't be finalized with the actual literal value, as the distance from the 
adrp/add pair to the targeted symbol only is known at link time.

Therefore, the object file stores relocations that say "fix up this adrp 
instruction with the actual offset to 'symbol X + 42 bytes'". For ELF 
object files, the object file format and relocations allow a negative 
offset, but for MachO and COFF, it doesn't (or it might be limited 
accidentally by the tools). In either case; on MachO and COFF we can't 
practically express a relocation against "symbol minus some bytes" - so we 
produce an extra 'sub' instruction in those cases.

> This is also true for the Windows implementation — whose existence 
> overall is questionable, as Windows DLLs use a different relocation 
> method, and PIC doesn't apply to them at all if I understand correctly;

While Windows code doesn't do proper strict PIC like on ELF, CONFIG_PIC 
does end up set in those configurations (like I already mentioned in the 
previous mail), and referencing symbols with adrp+add is generally 
preferrable over the non-PIC codepath of "ldr rX, =\val+\offset".

The latter will always store an absolute address in the constant island 
produced by the ldr pseudo instruction, and storing an absolute address 
emits a so called "base relocation" into the linked PE-COFF DLL. When a 
DLL is loaded at a non-default address, the loader will need to fix those 
up - essentially the same as text relocations on ELF. When using adrp+add 
on PE-COFF, no such base relocations are needed.

So while PE-COFF doesn't have true strict PIC, in practice you need very 
few base relocations on AArch64 - but if we'd skip the adrp+add 
instructions and use the non-PIC codepath of ldr as you suggest, we'd have 
much more base relocations.

> is there a reason to emit the subtraction instruction that you can 
> remember,

Yes, there is a reason.

> or would it be safe to possibly even remove the offset argument 
> completely?

No, it's not safe to remove that, it's clearly there for a reason.

// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".