From: Paul B Mahol
Date: Fri, 16 Sep 2022 10:39:45 +0200
To: FFmpeg development discussions and patches
Cc: Andreas Rheinhardt
Subject: Re: [FFmpeg-devel] [PATCH 3/3] swscale/output: Don't call av_pix_fmt_desc_get() in a loop

On 9/8/22, Andreas Rheinhardt wrote:
> Up until now, libswscale/output.c used a macro to write
> an output pixel which involved a call to av_pix_fmt_desc_get()
> to find out whether the input pixel format is BE or LE
> despite this being known at compile-time (there are templates
> per pixfmt). Even worse, these calls are made in a loop,
> so that e.g. there are eight calls to av_pix_fmt_desc_get()
> for every pixel processed in yuv2rgba64_X_c_template()
> for 64bit RGB formats.
>

LGTM for the whole set. Got a nice speed boost for non-SIMD-optimized
conversions in swscale.

> This commit modifies these macros to ensure that isBE()
> is evaluated at compile-time. This saved 41184B of .text
> for me (GCC 11.2, -O3).
>
> Signed-off-by: Andreas Rheinhardt
> ---
> This must be the lowest-hanging fruit in the whole codebase.
> Two other questions: Why do all these functions in swscale_internal.h
> take an enum AVPixelFormat instead of accepting an AVPixFmtDescriptor?
> And would making av_pix_fmt_desc_get() av_const be beneficial?
>
>  libswscale/output.c | 101 +++++++++++++++++++++++++-------------------
>  1 file changed, 58 insertions(+), 43 deletions(-)
>
> diff --git a/libswscale/output.c b/libswscale/output.c
> index 40a4476c6d..590334eb57 100644
> --- a/libswscale/output.c
> +++ b/libswscale/output.c
> @@ -919,7 +919,7 @@ YUV2PACKEDWRAPPER(yuv2, 422, uyvy422, AV_PIX_FMT_UYVY422)
>  #define R_B ((target == AV_PIX_FMT_RGB48LE || target == AV_PIX_FMT_RGB48BE || target == AV_PIX_FMT_RGBA64LE || target == AV_PIX_FMT_RGBA64BE) ? R : B)
>  #define B_R ((target == AV_PIX_FMT_RGB48LE || target == AV_PIX_FMT_RGB48BE || target == AV_PIX_FMT_RGBA64LE || target == AV_PIX_FMT_RGBA64BE) ? B : R)
>  #define output_pixel(pos, val) \
> -    if (isBE(target)) { \
> +    if (is_be) { \
>          AV_WB16(pos, val); \
>      } else { \
>          AV_WL16(pos, val); \
> @@ -931,7 +931,8 @@ yuv2ya16_X_c_template(SwsContext *c, const int16_t *lumFilter,
>                          const int16_t *chrFilter, const int32_t **unused_chrUSrc,
>                          const int32_t **unused_chrVSrc, int unused_chrFilterSize,
>                          const int32_t **alpSrc, uint16_t *dest, int dstW,
> -                        int y, enum AVPixelFormat target, int unused_hasAlpha, int unused_eightbytes)
> +                        int y, enum AVPixelFormat target,
> +                        int unused_hasAlpha, int unused_eightbytes, int is_be)
>  {
>      int hasAlpha = !!alpSrc;
>      int i;
> @@ -968,7 +969,8 @@ yuv2ya16_2_c_template(SwsContext *c, const int32_t *buf[2],
>                          const int32_t *unused_ubuf[2], const int32_t *unused_vbuf[2],
>                          const int32_t *abuf[2], uint16_t *dest, int dstW,
>                          int yalpha, int unused_uvalpha, int y,
> -                        enum AVPixelFormat target, int unused_hasAlpha, int unused_eightbytes)
> +                        enum AVPixelFormat target, int unused_hasAlpha,
> +                        int unused_eightbytes, int is_be)
>  {
>      int hasAlpha = abuf && abuf[0] && abuf[1];
>      const int32_t *buf0 = buf[0], *buf1 = buf[1],
> @@ -999,7 +1001,8 @@ static av_always_inline void
>  yuv2ya16_1_c_template(SwsContext *c, const int32_t *buf0,
>                          const int32_t *unused_ubuf[2], const int32_t *unused_vbuf[2],
>                          const int32_t *abuf0, uint16_t *dest, int dstW,
> -                        int unused_uvalpha, int y, enum AVPixelFormat target, int unused_hasAlpha, int unused_eightbytes)
> +                        int unused_uvalpha, int y, enum AVPixelFormat target,
> +                        int unused_hasAlpha, int unused_eightbytes, int is_be)
>  {
>      int hasAlpha = !!abuf0;
>      int i;
> @@ -1027,7 +1030,8 @@ yuv2rgba64_X_c_template(SwsContext *c, const int16_t *lumFilter,
>                         const int16_t *chrFilter, const int32_t **chrUSrc,
>                         const int32_t **chrVSrc, int chrFilterSize,
>                         const int32_t **alpSrc, uint16_t *dest, int dstW,
> -                       int y, enum AVPixelFormat target, int hasAlpha, int eightbytes)
> +                       int y, enum AVPixelFormat target, int hasAlpha, int eightbytes,
> +                       int is_be)
>  {
>      int i;
>      int A1 = 0xffff<<14, A2 = 0xffff<<14;
> @@ -1108,7 +1112,8 @@ yuv2rgba64_2_c_template(SwsContext *c, const int32_t *buf[2],
>                         const int32_t *ubuf[2], const int32_t *vbuf[2],
>                         const int32_t *abuf[2], uint16_t *dest, int dstW,
>                         int yalpha, int uvalpha, int y,
> -                       enum AVPixelFormat target, int hasAlpha, int eightbytes)
> +                       enum AVPixelFormat target, int hasAlpha, int eightbytes,
> +                       int is_be)
>  {
>      const int32_t *buf0 = buf[0], *buf1 = buf[1],
>                    *ubuf0 = ubuf[0], *ubuf1 = ubuf[1],
> @@ -1172,7 +1177,8 @@ static av_always_inline void
>  yuv2rgba64_1_c_template(SwsContext *c, const int32_t *buf0,
>                         const int32_t *ubuf[2], const int32_t *vbuf[2],
>                         const int32_t *abuf0, uint16_t *dest, int dstW,
> -                       int uvalpha, int y, enum AVPixelFormat target, int hasAlpha, int eightbytes)
> +                       int uvalpha, int y, enum AVPixelFormat target,
> +                       int hasAlpha, int eightbytes, int is_be)
>  {
>      const int32_t *ubuf0 = ubuf[0], *vbuf0 = vbuf[0];
>      int i;
> @@ -1277,7 +1283,8 @@ yuv2rgba64_full_X_c_template(SwsContext *c, const int16_t *lumFilter,
>                         const int16_t *chrFilter, const int32_t **chrUSrc,
>                         const int32_t **chrVSrc, int chrFilterSize,
>                         const int32_t **alpSrc, uint16_t *dest, int dstW,
> -                       int y, enum AVPixelFormat target, int hasAlpha, int eightbytes)
> +                       int y, enum AVPixelFormat target, int hasAlpha,
> +                       int eightbytes, int is_be)
>  {
>      int i;
>      int A = 0xffff<<14;
> @@ -1340,7 +1347,8 @@ yuv2rgba64_full_2_c_template(SwsContext *c, const int32_t *buf[2],
>                         const int32_t *ubuf[2], const int32_t *vbuf[2],
>                         const int32_t *abuf[2], uint16_t *dest, int dstW,
>                         int yalpha, int uvalpha, int y,
> -                       enum AVPixelFormat target, int hasAlpha, int eightbytes)
> +                       enum AVPixelFormat target, int hasAlpha, int eightbytes,
> +                       int is_be)
>  {
>      const int32_t *buf0 = buf[0], *buf1 = buf[1],
>                    *ubuf0 = ubuf[0], *ubuf1 = ubuf[1],
> @@ -1391,7 +1399,8 @@ static av_always_inline void
>  yuv2rgba64_full_1_c_template(SwsContext *c, const int32_t *buf0,
>                         const int32_t *ubuf[2], const int32_t *vbuf[2],
>                         const int32_t *abuf0, uint16_t *dest, int dstW,
> -                       int uvalpha, int y, enum AVPixelFormat target, int hasAlpha, int eightbytes)
> +                       int uvalpha, int y, enum AVPixelFormat target,
> +                       int hasAlpha, int eightbytes, int is_be)
>  {
>      const int32_t *ubuf0 = ubuf[0], *vbuf0 = vbuf[0];
>      int i;
> @@ -1468,7 +1477,11 @@ yuv2rgba64_full_1_c_template(SwsContext *c, const int32_t *buf0,
>  #undef r_b
>  #undef b_r
>
> -#define YUV2PACKED16WRAPPER(name, base, ext, fmt, hasAlpha, eightbytes) \
> +#define IS_BE_LE 0
> +#define IS_BE_BE 1
> +#define IS_BE(BE_LE) IS_BE_ ## BE_LE
> +
> +#define YUV2PACKED16WRAPPER_0(name, base, ext, fmt, is_be, hasAlpha, eightbytes) \
>  static void name ## ext ## _X_c(SwsContext *c, const int16_t *lumFilter, \
>                          const int16_t **_lumSrc, int lumFilterSize, \
>                          const int16_t *chrFilter, const int16_t **_chrUSrc, \
> @@ -1483,7 +1496,7 @@ static void name ## ext ## _X_c(SwsContext *c, const int16_t *lumFilter, \
>      uint16_t *dest = (uint16_t *) _dest; \
>      name ## base ## _X_c_template(c, lumFilter, lumSrc, lumFilterSize, \
>                            chrFilter, chrUSrc, chrVSrc, chrFilterSize, \
> -                          alpSrc, dest, dstW, y, fmt, hasAlpha, eightbytes); \
> +                          alpSrc, dest, dstW, y, fmt, hasAlpha, eightbytes, is_be); \
>  } \
>   \
>  static void name ## ext ## _2_c(SwsContext *c, const int16_t *_buf[2], \
> @@ -1497,7 +1510,7 @@ static void name ## ext ## _2_c(SwsContext *c, const int16_t *_buf[2], \
>                    **abuf = (const int32_t **) _abuf; \
>      uint16_t *dest = (uint16_t *) _dest; \
>      name ## base ## _2_c_template(c, buf, ubuf, vbuf, abuf, \
> -                          dest, dstW, yalpha, uvalpha, y, fmt, hasAlpha, eightbytes); \
> +                          dest, dstW, yalpha, uvalpha, y, fmt, hasAlpha, eightbytes, is_be); \
>  } \
>   \
>  static void name ## ext ## _1_c(SwsContext *c, const int16_t *_buf0, \
> @@ -1511,36 +1524,38 @@ static void name ## ext ## _1_c(SwsContext *c, const int16_t *_buf0, \
>                    *abuf0 = (const int32_t *) _abuf0; \
>      uint16_t *dest = (uint16_t *) _dest; \
>      name ## base ## _1_c_template(c, buf0, ubuf, vbuf, abuf0, dest, \
> -                                  dstW, uvalpha, y, fmt, hasAlpha, eightbytes); \
> -}
> -
> -YUV2PACKED16WRAPPER(yuv2, rgba64, rgb48be, AV_PIX_FMT_RGB48BE, 0, 0)
> -YUV2PACKED16WRAPPER(yuv2, rgba64, rgb48le, AV_PIX_FMT_RGB48LE, 0, 0)
> -YUV2PACKED16WRAPPER(yuv2, rgba64, bgr48be, AV_PIX_FMT_BGR48BE, 0, 0)
> -YUV2PACKED16WRAPPER(yuv2, rgba64, bgr48le, AV_PIX_FMT_BGR48LE, 0, 0)
> -YUV2PACKED16WRAPPER(yuv2, rgba64, rgba64be, AV_PIX_FMT_RGBA64BE, 1, 1)
> -YUV2PACKED16WRAPPER(yuv2, rgba64, rgba64le, AV_PIX_FMT_RGBA64LE, 1, 1)
> -YUV2PACKED16WRAPPER(yuv2, rgba64, rgbx64be, AV_PIX_FMT_RGBA64BE, 0, 1)
> -YUV2PACKED16WRAPPER(yuv2, rgba64, rgbx64le, AV_PIX_FMT_RGBA64LE, 0, 1)
> -YUV2PACKED16WRAPPER(yuv2, rgba64, bgra64be, AV_PIX_FMT_BGRA64BE, 1, 1)
> -YUV2PACKED16WRAPPER(yuv2, rgba64, bgra64le, AV_PIX_FMT_BGRA64LE, 1, 1)
> -YUV2PACKED16WRAPPER(yuv2, rgba64, bgrx64be, AV_PIX_FMT_BGRA64BE, 0, 1)
> -YUV2PACKED16WRAPPER(yuv2, rgba64, bgrx64le, AV_PIX_FMT_BGRA64LE, 0, 1)
> -YUV2PACKED16WRAPPER(yuv2, ya16, ya16be, AV_PIX_FMT_YA16BE, 1, 0)
> -YUV2PACKED16WRAPPER(yuv2, ya16, ya16le, AV_PIX_FMT_YA16LE, 1, 0)
> -
> -YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgb48be_full, AV_PIX_FMT_RGB48BE, 0, 0)
> -YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgb48le_full, AV_PIX_FMT_RGB48LE, 0, 0)
> -YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgr48be_full, AV_PIX_FMT_BGR48BE, 0, 0)
> -YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgr48le_full, AV_PIX_FMT_BGR48LE, 0, 0)
> -YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgba64be_full, AV_PIX_FMT_RGBA64BE, 1, 1)
> -YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgba64le_full, AV_PIX_FMT_RGBA64LE, 1, 1)
> -YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgbx64be_full, AV_PIX_FMT_RGBA64BE, 0, 1)
> -YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgbx64le_full, AV_PIX_FMT_RGBA64LE, 0, 1)
> -YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgra64be_full, AV_PIX_FMT_BGRA64BE, 1, 1)
> -YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgra64le_full, AV_PIX_FMT_BGRA64LE, 1, 1)
> -YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgrx64be_full, AV_PIX_FMT_BGRA64BE, 0, 1)
> -YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgrx64le_full, AV_PIX_FMT_BGRA64LE, 0, 1)
> +                                  dstW, uvalpha, y, fmt, hasAlpha, eightbytes, is_be); \
> +}
> +#define YUV2PACKED16WRAPPER(name, base, ext, fmt, endianness, hasAlpha, eightbytes) \
> +    YUV2PACKED16WRAPPER_0(name, base, ext, fmt ## endianness, IS_BE(endianness), hasAlpha, eightbytes)
> +
> +YUV2PACKED16WRAPPER(yuv2, rgba64, rgb48be, AV_PIX_FMT_RGB48, BE, 0, 0)
> +YUV2PACKED16WRAPPER(yuv2, rgba64, rgb48le, AV_PIX_FMT_RGB48, LE, 0, 0)
> +YUV2PACKED16WRAPPER(yuv2, rgba64, bgr48be, AV_PIX_FMT_BGR48, BE, 0, 0)
> +YUV2PACKED16WRAPPER(yuv2, rgba64, bgr48le, AV_PIX_FMT_BGR48, LE, 0, 0)
> +YUV2PACKED16WRAPPER(yuv2, rgba64, rgba64be, AV_PIX_FMT_RGBA64, BE, 1, 1)
> +YUV2PACKED16WRAPPER(yuv2, rgba64, rgba64le, AV_PIX_FMT_RGBA64, LE, 1, 1)
> +YUV2PACKED16WRAPPER(yuv2, rgba64, rgbx64be, AV_PIX_FMT_RGBA64, BE, 0, 1)
> +YUV2PACKED16WRAPPER(yuv2, rgba64, rgbx64le, AV_PIX_FMT_RGBA64, LE, 0, 1)
> +YUV2PACKED16WRAPPER(yuv2, rgba64, bgra64be, AV_PIX_FMT_BGRA64, BE, 1, 1)
> +YUV2PACKED16WRAPPER(yuv2, rgba64, bgra64le, AV_PIX_FMT_BGRA64, LE, 1, 1)
> +YUV2PACKED16WRAPPER(yuv2, rgba64, bgrx64be, AV_PIX_FMT_BGRA64, BE, 0, 1)
> +YUV2PACKED16WRAPPER(yuv2, rgba64, bgrx64le, AV_PIX_FMT_BGRA64, LE, 0, 1)
> +YUV2PACKED16WRAPPER(yuv2, ya16, ya16be, AV_PIX_FMT_YA16, BE, 1, 0)
> +YUV2PACKED16WRAPPER(yuv2, ya16, ya16le, AV_PIX_FMT_YA16, LE, 1, 0)
> +
> +YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgb48be_full, AV_PIX_FMT_RGB48, BE, 0, 0)
> +YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgb48le_full, AV_PIX_FMT_RGB48, LE, 0, 0)
> +YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgr48be_full, AV_PIX_FMT_BGR48, BE, 0, 0)
> +YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgr48le_full, AV_PIX_FMT_BGR48, LE, 0, 0)
> +YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgba64be_full, AV_PIX_FMT_RGBA64, BE, 1, 1)
> +YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgba64le_full, AV_PIX_FMT_RGBA64, LE, 1, 1)
> +YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgbx64be_full, AV_PIX_FMT_RGBA64, BE, 0, 1)
> +YUV2PACKED16WRAPPER(yuv2, rgba64_full, rgbx64le_full, AV_PIX_FMT_RGBA64, LE, 0, 1)
> +YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgra64be_full, AV_PIX_FMT_BGRA64, BE, 1, 1)
> +YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgra64le_full, AV_PIX_FMT_BGRA64, LE, 1, 1)
> +YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgrx64be_full, AV_PIX_FMT_BGRA64, BE, 0, 1)
> +YUV2PACKED16WRAPPER(yuv2, rgba64_full, bgrx64le_full, AV_PIX_FMT_BGRA64, LE, 0, 1)
>
>  /*
>   * Write out 2 RGB pixels in the target pixel format. This function takes a
> --
> 2.34.1
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
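For anyone skimming the diff, here is a minimal standalone sketch of the idea the patch relies on. It is not the FFmpeg code. The names write_be16(), write_le16(), store_row_template() and STORE_ROW_WRAPPER are hypothetical stand-ins invented for illustration (the first two play the role of AV_WB16()/AV_WL16()). Only the IS_BE token-pasting macros are taken from the patch. The sketch assumes an optimizing compiler that inlines the template into each wrapper. FFmpeg forces this with av_always_inline, whereas plain "static inline" here is only a hint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for FFmpeg's AV_WB16() / AV_WL16(). */
static inline void write_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static inline void write_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/* Same trick as in the patch: paste the LE/BE token onto a prefix so
 * that IS_BE(LE) expands to 0 and IS_BE(BE) expands to 1. */
#define IS_BE_LE 0
#define IS_BE_BE 1
#define IS_BE(BE_LE) IS_BE_ ## BE_LE

/* Template: every wrapper below passes a literal 0 or 1 for is_be, so
 * once this is inlined the compiler can drop the untaken branch
 * instead of testing the endianness for every sample. */
static inline void store_row_template(uint8_t *dst, const uint16_t *src,
                                      size_t n, int is_be)
{
    assert(dst && src);
    for (size_t i = 0; i < n; i++) {
        if (is_be)
            write_be16(dst + 2 * i, src[i]);
        else
            write_le16(dst + 2 * i, src[i]);
    }
}

/* Two-level macro, mirroring YUV2PACKED16WRAPPER / _0 in the patch:
 * the outer macro turns the endianness token into a constant, the
 * inner one emits the specialised function. */
#define STORE_ROW_WRAPPER_0(ext, is_be) \
static void store_row_ ## ext(uint8_t *dst, const uint16_t *src, size_t n) \
{ \
    store_row_template(dst, src, n, is_be); \
}
#define STORE_ROW_WRAPPER(ext, endianness) \
    STORE_ROW_WRAPPER_0(ext, IS_BE(endianness))

STORE_ROW_WRAPPER(gray16be, BE)
STORE_ROW_WRAPPER(gray16le, LE)
```

Calling store_row_gray16be() on the sample 0x1234 writes the bytes 0x12 0x34, and store_row_gray16le() writes 0x34 0x12. Neither wrapper needs a runtime lookup of the pixel format descriptor, which is the point of passing is_be down as a constant.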