Re: [FFmpeg-devel] [PATCH v4] Mark C globals with small code model

From: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH v4] Mark C globals with small code model
Date: Wed, 12 Mar 2025 00:45:09 +0100
Message-ID: <GV1P250MB07370CAFC2098EA75B92DFCF8FD12@GV1P250MB0737.EURP250.PROD.OUTLOOK.COM> (raw)
In-Reply-To: <20250311191737.2393125-1-prka@google.com>

Pranav Kant via ffmpeg-devel:
> By default, all globals in C/C++ compiled by clang are allocated
> in non-large data sections. See [1] for background on code models.
> For PIC (Position independent code), this is fine as long as binary is
> small but as binary size increases, users maybe want to use medium/large
> code models (-mcmodel=medium) which moves data in to large sections.
> As data in these large sections cannot be accessed using PIC code
> anymore (as it may be too far away), compiler ends up using a different
> instruction sequence when building C/C++ code -- using GOT to access
> these globals (which can be relaxed by linker at link time if binary
> ends up being smaller). However, assembly files continue to access these
> globals defined in C/C++ files using older (and invalid instruction
> sequence). So, we mark all such globals with an attribute that forces
> them to be allocated in small sections allowing them to validly be
> accessed from the assembly code.
> 
> This patch should not have any affect on builds that use small code
> model, which is the default mode.
> 
> [1] https://eli.thegreenplace.net/2012/01/03/understanding-the-x64-code-models
> 
> Signed-off-by: Pranav Kant <prka@google.com>
> ---
>  libavcodec/ac3dsp.c             |  2 ++
>  libavcodec/cabac.c              |  2 ++
>  libavcodec/x86/constants.c      |  8 ++++++++
>  libavutil/attributes.h          |  6 ++++++
>  libavutil/attributes_internal.h | 16 ++++++++++++++++
>  5 files changed, 34 insertions(+)
> 
> diff --git a/libavcodec/ac3dsp.c b/libavcodec/ac3dsp.c
> index 730fa70fff..d16b6c24c3 100644
> --- a/libavcodec/ac3dsp.c
> +++ b/libavcodec/ac3dsp.c
> @@ -25,6 +25,7 @@
>  
>  #include "config.h"
>  #include "libavutil/attributes.h"
> +#include "libavutil/attributes_internal.h"
>  #include "libavutil/common.h"
>  #include "libavutil/intmath.h"
>  #include "libavutil/mem_internal.h"
> @@ -104,6 +105,7 @@ static void ac3_update_bap_counts_c(uint16_t mant_cnt[16], uint8_t *bap,
>          mant_cnt[bap[len]]++;
>  }
>  
> +attribute_mcmodel_small
>  DECLARE_ALIGNED(16, const uint16_t, ff_ac3_bap_bits)[16] = {

Shouldn't stuff like this be applied to the declaration so that C code
can also take advantage of the knowledge that this object will be placed
in the small code section?

>      0,  0,  0,  3,  0,  4,  5,  6,  7,  8,  9, 10, 11, 12, 14, 16
>  };
> diff --git a/libavcodec/cabac.c b/libavcodec/cabac.c
> index 7d41cd2ae6..b8c6db29a2 100644
> --- a/libavcodec/cabac.c
> +++ b/libavcodec/cabac.c
> @@ -24,11 +24,13 @@
>   * Context Adaptive Binary Arithmetic Coder.
>   */
>  
> +#include "libavutil/attributes_internal.h"
>  #include "libavutil/error.h"
>  #include "libavutil/mem_internal.h"
>  
>  #include "cabac.h"
>  
> +attribute_mcmodel_small
>  DECLARE_ASM_ALIGNED(1, const uint8_t, ff_h264_cabac_tables)[512 + 4*2*64 + 4*64 + 63] = {

Your commit message ("However, assembly files continue to access")
speaks only of assembly files, i.e. external asm. Yet this here is only
used by inline ASM. Looking through the code the reason for this is that
I thought that specifying the memory model is only necessary for stuff
used by external asm, yet ff_h264_cabac_tables does not seem to be used
by external ASM at all, only inline ASM. If I see this correctly, the
reason for this is that LOCAL_MANGLE (and therefore MANGLE) uses rip
addressing on x64 when configure sets the RIP define. But this means
that the set of files needing attribute_mcmodel_small is a superset of
the files currently using DECLARE_ASM_ALIGNED. This means that one would
only need two macros for the variables accessed by ASM: One for only
external ASM and one for inline ASM (and potentially external ASM)
instead of adding attribute_mcmodel_small at various places in the codebase.

>      9,8,7,7,6,6,6,6,5,5,5,5,5,5,5,5,
>      4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
> diff --git a/libavcodec/x86/constants.c b/libavcodec/x86/constants.c
> index bc7f2b17b8..347b7dd1d3 100644
> --- a/libavcodec/x86/constants.c
> +++ b/libavcodec/x86/constants.c
> @@ -18,17 +18,21 @@
>   * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>   */
>  
> +#include "libavutil/attributes_internal.h"
>  #include "libavutil/mem_internal.h"
>  #include "libavutil/x86/asm.h" // for xmm_reg
>  #include "constants.h"
>  
> +attribute_mcmodel_small
>  DECLARE_ALIGNED(32, const ymm_reg,  ff_pw_1)    = {
0x0001000100010001ULL, 0x0001000100010001ULL,>
                           0x0001000100010001ULL, 0x0001000100010001ULL };
>  DECLARE_ALIGNED(32, const ymm_reg,  ff_pw_2)    = { 0x0002000200020002ULL, 0x0002000200020002ULL,
>                                                      0x0002000200020002ULL, 0x0002000200020002ULL };
>  DECLARE_ASM_ALIGNED(16, const xmm_reg,  ff_pw_3)    = { 0x0003000300030003ULL, 0x0003000300030003ULL };
> +attribute_mcmodel_small
>  DECLARE_ASM_ALIGNED(32, const ymm_reg,  ff_pw_4)    = { 0x0004000400040004ULL, 0x0004000400040004ULL,
>                                                      0x0004000400040004ULL, 0x0004000400040004ULL };
> +attribute_mcmodel_small
>  DECLARE_ASM_ALIGNED(16, const xmm_reg,  ff_pw_5)    = { 0x0005000500050005ULL, 0x0005000500050005ULL };
>  DECLARE_ALIGNED(16, const xmm_reg,  ff_pw_8)    = { 0x0008000800080008ULL, 0x0008000800080008ULL };
>  DECLARE_ASM_ALIGNED(16, const xmm_reg,  ff_pw_9)    = { 0x0009000900090009ULL, 0x0009000900090009ULL };
> @@ -49,6 +53,7 @@ DECLARE_ALIGNED(32, const ymm_reg,  ff_pw_256)  = { 0x0100010001000100ULL, 0x010
>  DECLARE_ALIGNED(32, const ymm_reg,  ff_pw_512)  = { 0x0200020002000200ULL, 0x0200020002000200ULL,
>                                                      0x0200020002000200ULL, 0x0200020002000200ULL };
>  DECLARE_ALIGNED(16, const xmm_reg,  ff_pw_1019) = { 0x03FB03FB03FB03FBULL, 0x03FB03FB03FB03FBULL };
> +attribute_mcmodel_small
>  DECLARE_ALIGNED(32, const ymm_reg,  ff_pw_1023) = { 0x03ff03ff03ff03ffULL, 0x03ff03ff03ff03ffULL,
>                                                      0x03ff03ff03ff03ffULL, 0x03ff03ff03ff03ffULL};
>  DECLARE_ALIGNED(32, const ymm_reg,  ff_pw_1024) = { 0x0400040004000400ULL, 0x0400040004000400ULL,
> @@ -66,13 +71,16 @@ DECLARE_ALIGNED(32, const ymm_reg,  ff_pw_m1)   = { 0xFFFFFFFFFFFFFFFFULL, 0xFFF
>  
>  DECLARE_ALIGNED(32, const ymm_reg,  ff_pb_0)    = { 0x0000000000000000ULL, 0x0000000000000000ULL,
>                                                      0x0000000000000000ULL, 0x0000000000000000ULL };
> +attribute_mcmodel_small
>  DECLARE_ALIGNED(32, const ymm_reg,  ff_pb_1)    = { 0x0101010101010101ULL, 0x0101010101010101ULL,
>                                                      0x0101010101010101ULL, 0x0101010101010101ULL };
>  DECLARE_ALIGNED(32, const ymm_reg,  ff_pb_2)    = { 0x0202020202020202ULL, 0x0202020202020202ULL,
>                                                      0x0202020202020202ULL, 0x0202020202020202ULL };
> +attribute_mcmodel_small
>  DECLARE_ALIGNED(32, const ymm_reg,  ff_pb_3)    = { 0x0303030303030303ULL, 0x0303030303030303ULL,
>                                                      0x0303030303030303ULL, 0x0303030303030303ULL };
>  DECLARE_ALIGNED(32, const xmm_reg,  ff_pb_15)   = { 0x0F0F0F0F0F0F0F0FULL, 0x0F0F0F0F0F0F0F0FULL };
> +attribute_mcmodel_small
>  DECLARE_ALIGNED(32, const ymm_reg,  ff_pb_80)   = { 0x8080808080808080ULL, 0x8080808080808080ULL,
>                                                      0x8080808080808080ULL, 0x8080808080808080ULL };
>  DECLARE_ALIGNED(32, const ymm_reg,  ff_pb_FE)   = { 0xFEFEFEFEFEFEFEFEULL, 0xFEFEFEFEFEFEFEFEULL,

Why do you add this attribute to only eight of these constants? How did
you come up with this list? In addition to the above, git grep cextern
shows that there are several more variables than just these one (e.g.
ff_h263_loop_filter_strength, ff_sbr_noise_table).

> diff --git a/libavutil/attributes.h b/libavutil/attributes.h
> index 04c615c952..dfc35fa31e 100644
> --- a/libavutil/attributes.h
> +++ b/libavutil/attributes.h
> @@ -40,6 +40,12 @@
>  #    define AV_HAS_BUILTIN(x) 0
>  #endif
>  
> +#ifdef __has_attribute
> +#    define AV_HAS_ATTRIBUTE(x) __has_attribute(x)
> +#else
> +#    define AV_HAS_ATTRIBUTE(x) 0
> +#endif
> +
>  #ifndef av_always_inline
>  #if AV_GCC_VERSION_AT_LEAST(3,1)
>  #    define av_always_inline __attribute__((always_inline)) inline
> diff --git a/libavutil/attributes_internal.h b/libavutil/attributes_internal.h
> index bc85ce77ff..c557fa0af0 100644
> --- a/libavutil/attributes_internal.h
> +++ b/libavutil/attributes_internal.h
> @@ -19,6 +19,7 @@
>  #ifndef AVUTIL_ATTRIBUTES_INTERNAL_H
>  #define AVUTIL_ATTRIBUTES_INTERNAL_H
>  
> +#include "config.h"
>  #include "attributes.h"
>  
>  #if (AV_GCC_VERSION_AT_LEAST(4,0) || defined(__clang__)) && (defined(__ELF__) || defined(__MACH__))
> @@ -33,4 +34,19 @@
>  
>  #define EXTERN extern attribute_visibility_hidden
>  
> +/**
> + * Some globals defined in C files are used from hardcoded asm that assumes small
> + * code model (that is, accessing these globals without GOT). This is a problem
> + * when FFMpeg is built with medium code model (-mcmodel=medium) which allocates

s/FFMpeg/FFmpeg/

> + * all globals in a data section that's unreachable with PC relative instructions
> + * (small code model instruction sequence). We mark all such globals with this
> + * attribute_mcmodel_small to ensure assembly accessible globals continue to be
> + * allocated in sections reachable from PC relative instructions.
> + */
> +#if ARCH_X86_64 && defined(__ELF__) && AV_HAS_ATTRIBUTE(model)

You make this ARCH_X86_64 only, yet the concept of code models also
exists for other arches, e.g. AArch64. I presume that assembly for other
arches presumes that we are not using the large code model for accesses,
so that the same issue can happen with them. Then we have two options:

a) Continue to use the same macro you are adding here. This will mean
that stuff only used in assembly by different arches will nevertheless
be declared as using the small code model, potentially clogging the
small symbol address space unnecessarily.
b) Use one macro per arch.

I presume we want to go with a), because the objects we are talking
about are so small that they should be in the small data section anyway.

> +#    define attribute_mcmodel_small __attribute__((model("small")))
> +#else
> +#    define attribute_mcmodel_small
> +#endif
> +
>  #endif /* AVUTIL_ATTRIBUTES_INTERNAL_H */

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".