On Fri, Sep 16, 2022 at 04:55:39PM +0200, Andreas Rheinhardt wrote:
> Up until now, libswscale/output.c used a macro to write
> an output pixel which involved a call to av_pix_fmt_desc_get()
> to find out whether the input pixel format is BE or LE
> despite this being known at compile-time (there are templates
> per pixfmt). Even worse, these calls are made in a loop,
> so that e.g. there are eight calls to av_pix_fmt_desc_get()
> for every pixel processed in yuv2rgba64_X_c_template()
> for 64bit RGB formats.
>
> This commit modifies these macros to ensure that isBE()
> is evaluated at compile-time. This saved 41184B of .text
> for me (GCC 11.2, -O3). Of course, it also improved performance.
> E.g. ffmpeg_g -f lavfi -i testsrc2,format=yuva420p -pix_fmt rgba64le \
> -threads 1 -t 1:00 -f null - (which uses yuv2rgba64le_X_c,
> which is an invocation of yuv2rgba64_X_c_template() mentioned above),
> performance improved from 95589 to 41387 decicycles for one call
> to yuv2packedX; for the be variant the numbers went down from
> 76087 to 43024 decicycles.
>
> Signed-off-by: Andreas Rheinhardt
> ---
> libswscale/output.c | 100 +++++++++++++++++++++++++-------------------
> 1 file changed, 58 insertions(+), 42 deletions(-)

This looks a lot better than before,
thx

PS: I still think that broader support for compile-time evaluation of
"pure" functions would be useful. Ideally with minimal mess on the source
side and more on the build tool side.

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus
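
For readers of the archive, the idea in the quoted commit message can be sketched roughly as follows. This is not the code from the patch: every name below (demo_fmt, demo_is_be, demo_write16, demo_write_row_*) is hypothetical and only stands in for the pattern of an inlined template whose endianness argument is a literal in each per-format wrapper. Whether the branch is actually folded away depends on the compiler inlining the template, which is typical at -O2/-O3 but not guaranteed by plain "static inline".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pixel-format enum; not FFmpeg's real one. */
enum demo_fmt { DEMO_RGBA64LE, DEMO_RGBA64BE };

/* Runtime query, analogous in spirit to looking up a format descriptor. */
static int demo_is_be(enum demo_fmt fmt)
{
    return fmt == DEMO_RGBA64BE;
}

/* Store one 16-bit value with an explicit byte order. */
static inline void demo_write16(uint8_t *dst, uint16_t v, int is_be)
{
    if (is_be) {
        dst[0] = (uint8_t)(v >> 8);
        dst[1] = (uint8_t)(v & 0xff);
    } else {
        dst[0] = (uint8_t)(v & 0xff);
        dst[1] = (uint8_t)(v >> 8);
    }
}

/* Template: is_be is an ordinary parameter, but each wrapper below passes
 * a literal, so once this is inlined the compiler can resolve the branch
 * in demo_write16() at compile time. */
static inline void demo_write_row_template(uint8_t *dst, const uint16_t *src,
                                           int n, int is_be)
{
    assert(dst != NULL && src != NULL);
    for (int i = 0; i < n; i++)
        demo_write16(dst + 2 * i, src[i], is_be);
}

/* Per-format instantiations: endianness is a compile-time constant here. */
void demo_write_row_le(uint8_t *dst, const uint16_t *src, int n)
{
    demo_write_row_template(dst, src, n, 0);
}

void demo_write_row_be(uint8_t *dst, const uint16_t *src, int n)
{
    demo_write_row_template(dst, src, n, 1);
}

/* For contrast: the endianness is queried again for every value written,
 * which is the kind of per-pixel work the commit message describes. */
void demo_write_row_runtime(uint8_t *dst, const uint16_t *src, int n,
                            enum demo_fmt fmt)
{
    for (int i = 0; i < n; i++)
        demo_write16(dst + 2 * i, src[i], demo_is_be(fmt));
}
```

All three functions produce the same bytes for a given format; the difference is only where the endianness decision is made.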