From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20220714165612.GC2088045@pb2> <20220715145943.3474147-1-alankelly@google.com>
In-Reply-To: <20220715145943.3474147-1-alankelly@google.com>
From: Alan Kelly
Date: Fri, 15 Jul 2022 17:03:56 +0200
To: FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [PATCH v2 4/5] libswscale: Enable hscale_avx2 for all input sizes.
List-Id: FFmpeg development discussions and patches
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi Michael,

Thanks for looking at this. I fixed the test issue.

Alan

On Fri, Jul 15, 2022 at 4:59 PM Alan Kelly wrote:

> ff_shuffle_filter_coefficients shuffles the tail as required.
> ---
>  libswscale/utils.c        | 19 ++++++++++++++++---
>  libswscale/x86/swscale.c  |  6 ++----
>  tests/checkasm/sw_scale.c |  2 +-
>  3 files changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/libswscale/utils.c b/libswscale/utils.c
> index cb4f5b521c..544b7fee96 100644
> --- a/libswscale/utils.c
> +++ b/libswscale/utils.c
> @@ -266,8 +266,7 @@ int ff_shuffle_filter_coefficients(SwsContext *c, int *filterPos,
>  #if ARCH_X86_64
>      int i, j, k;
>      int cpu_flags = av_get_cpu_flags();
> -    // avx2 hscale filter processes 16 pixel blocks.
> -    if (!filter || dstW % 16 != 0)
> +    if (!filter)
>          return 0;
>      if (EXTERNAL_AVX2_FAST(cpu_flags) && !(cpu_flags & AV_CPU_FLAG_SLOW_GATHER)) {
>          if ((c->srcBpc == 8) && (c->dstBpc <= 14)) {
> @@ -279,9 +278,11 @@ int ff_shuffle_filter_coefficients(SwsContext *c, int *filterPos,
>              }
>              // Do not swap filterPos for pixels which won't be processed by
>              // the main loop.
> -            for (i = 0; i + 8 <= dstW; i += 8) {
> +            for (i = 0; i + 16 <= dstW; i += 16) {
>                  FFSWAP(int, filterPos[i + 2], filterPos[i + 4]);
>                  FFSWAP(int, filterPos[i + 3], filterPos[i + 5]);
> +                FFSWAP(int, filterPos[i + 10], filterPos[i + 12]);
> +                FFSWAP(int, filterPos[i + 11], filterPos[i + 13]);
>              }
>              if (filterSize > 4) {
>                  // 16 pixels are processed at a time.
> @@ -295,6 +296,18 @@ int ff_shuffle_filter_coefficients(SwsContext *c, int *filterPos,
>                          }
>                      }
>                  }
> +                // 4 pixels are processed at a time in the tail.
> +                for (; i < dstW; i += 4) {
> +                    // 4 filter coeffs are processed at a time.
> +                    int rem = dstW - i >= 4 ? 4 : dstW - i;
> +                    for (k = 0; k + 4 <= filterSize; k += 4) {
> +                        for (j = 0; j < rem; ++j) {
> +                            int from = (i + j) * filterSize + k;
> +                            int to = i * filterSize + j * 4 + k * 4;
> +                            memcpy(&filter[to], &filterCopy[from], 4 * sizeof(int16_t));
> +                        }
> +                    }
> +                }
>              }
>              av_free(filterCopy);
>          }
> diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
> index 628f12137c..f628c71bd4 100644
> --- a/libswscale/x86/swscale.c
> +++ b/libswscale/x86/swscale.c
> @@ -626,10 +626,8 @@ switch(c->dstBpc){ \
>
>      if (EXTERNAL_AVX2_FAST(cpu_flags) && !(cpu_flags & AV_CPU_FLAG_SLOW_GATHER)) {
>          if ((c->srcBpc == 8) && (c->dstBpc <= 14)) {
> -            if (c->chrDstW % 16 == 0)
> -                ASSIGN_AVX2_SCALE_FUNC(c->hcScale, c->hChrFilterSize);
> -            if (c->dstW % 16 == 0)
> -                ASSIGN_AVX2_SCALE_FUNC(c->hyScale, c->hLumFilterSize);
> +            ASSIGN_AVX2_SCALE_FUNC(c->hcScale, c->hChrFilterSize);
> +            ASSIGN_AVX2_SCALE_FUNC(c->hyScale, c->hLumFilterSize);
>          }
>      }
>
> diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
> index b643a47c30..798990a6cf 100644
> --- a/tests/checkasm/sw_scale.c
> +++ b/tests/checkasm/sw_scale.c
> @@ -223,7 +223,7 @@ static void check_hscale(void)
>                  ff_sws_init_scale(ctx);
>                  memcpy(filterAvx2, filter, sizeof(uint16_t) * (SRC_PIXELS * MAX_FILTER_WIDTH + MAX_FILTER_WIDTH));
>                  if ((cpu_flags & AV_CPU_FLAG_AVX2) && !(cpu_flags & AV_CPU_FLAG_SLOW_GATHER))
> -                    ff_shuffle_filter_coefficients(ctx, filterPosAvx, width, filterAvx2, SRC_PIXELS);
> +                    ff_shuffle_filter_coefficients(ctx, filterPosAvx, width, filterAvx2, ctx->dstW);
>
>                  if (check_func(ctx->hcScale, "hscale_%d_to_%d__fs_%d_dstW_%d", ctx->srcBpc, ctx->dstBpc + 1, width, ctx->dstW)) {
>                      memset(dst0, 0, SRC_PIXELS * sizeof(dst0[0]));
> --
> 2.37.0.170.g444d1eabd0-goog
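
For readers who want to see what the tail loop in the quoted patch does to the coefficient layout, here is a minimal standalone sketch in C. It copies only the index arithmetic from the "4 pixels are processed at a time in the tail" hunk; the filterPos swaps and the 16-pixel main loop are left out. The names shuffle_tail_sketch and check_tail_layout, the start parameter, and the sizes used in the self-check are assumptions made for illustration and are not FFmpeg API. The sketch also assumes the filter buffer has room for dstW rounded up to a multiple of 4 pixels, since a partial last block still uses a 4-pixel stride.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of FFmpeg: applies only the tail
 * reordering from the quoted hunk, starting at pixel `start` (assumed to
 * be a multiple of 4). `filter` must have room for dstW rounded up to a
 * multiple of 4 pixels, each with filterSize coefficients. */
static void shuffle_tail_sketch(int16_t *filter, const int16_t *filterCopy,
                                int filterSize, int dstW, int start)
{
    int i, j, k;
    for (i = start; i < dstW; i += 4) {
        /* Number of real pixels in this block of (up to) 4. */
        int rem = dstW - i >= 4 ? 4 : dstW - i;
        /* 4 filter coefficients are moved at a time. */
        for (k = 0; k + 4 <= filterSize; k += 4) {
            for (j = 0; j < rem; ++j) {
                int from = (i + j) * filterSize + k;
                int to   = i * filterSize + j * 4 + k * 4;
                memcpy(&filter[to], &filterCopy[from], 4 * sizeof(int16_t));
            }
        }
    }
}

/* Self-check with assumed sizes: 6 pixels, 8 coefficients per pixel,
 * buffers padded to 8 pixels. Returns the number of coefficients that
 * did not land where the index formula says they should. */
static int check_tail_layout(void)
{
    enum { FS = 8, DSTW = 6, PADDED = 8 };
    int16_t filter[PADDED * FS] = {0}, copy[PADDED * FS] = {0};
    int p, c, bad = 0;

    /* Tag each coefficient as pixel * 100 + coefficient index. */
    for (p = 0; p < DSTW; ++p)
        for (c = 0; c < FS; ++c)
            copy[p * FS + c] = (int16_t)(p * 100 + c);
    memcpy(filter, copy, sizeof(filter));

    shuffle_tail_sketch(filter, copy, FS, DSTW, 0);

    for (p = 0; p < DSTW; ++p) {
        int i = p & ~3, j = p & 3;      /* block start and pixel within block */
        for (c = 0; c < FS; ++c) {
            int k = c & ~3;             /* start of the 4-coefficient group */
            int to = i * FS + j * 4 + k * 4 + (c & 3);
            bad += filter[to] != copy[p * FS + c];
        }
    }
    return bad;
}
```

Within each block of 4 pixels, the result stores coefficients 0..3 of every pixel first, then coefficients 4..7 of every pixel, and so on, which is what the index expression `i * filterSize + j * 4 + k * 4` works out to.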