From: Alan Kelly
Date: Mon, 20 Dec 2021 16:01:02 +0100
To: FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Add AV_CPU_FLAG_SLOW_GATHER.
References: <20211220135627.615097-1-alankelly@google.com> <8424d6a1-df63-954e-6823-740bf1fcb891@gmail.com> <20211220144312.738559-1-alankelly@google.com>

On Mon, Dec 20, 2021 at 3:53 PM James Almer wrote:

> On 12/20/2021 11:47 AM, Lynne wrote:
> > 20 Dec 2021, 15:43 by alankelly-at-google.com@ffmpeg.org:
> >
> >> This flag is set on Haswell and earlier and on all AMD CPUs.
> >> ---
> >> Removes unnecessary indentation, clarifies comment and only sets flag
> >> on AMD CPUs with AVX2.
> >>
> >>  libavutil/cpu.h     |  1 +
> >>  libavutil/x86/cpu.c | 14 +++++++++++++-
> >>  2 files changed, 14 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/libavutil/cpu.h b/libavutil/cpu.h
> >> index ae443eccad..ce9bf14bf7 100644
> >> --- a/libavutil/cpu.h
> >> +++ b/libavutil/cpu.h
> >> @@ -54,6 +54,7 @@
> >>  #define AV_CPU_FLAG_BMI1         0x20000 ///< Bit Manipulation Instruction Set 1
> >>  #define AV_CPU_FLAG_BMI2         0x40000 ///< Bit Manipulation Instruction Set 2
> >>  #define AV_CPU_FLAG_AVX512      0x100000 ///< AVX-512 functions: requires OS support even if YMM/ZMM registers aren't used
> >> +#define AV_CPU_FLAG_SLOW_GATHER 0x2000000 ///< CPU has slow gathers.
> >>
> >>  #define AV_CPU_FLAG_ALTIVEC         0x0001 ///< standard
> >>  #define AV_CPU_FLAG_VSX             0x0002 ///< ISA 2.06
> >> diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c
> >> index bcd41a50a2..563984f234 100644
> >> --- a/libavutil/x86/cpu.c
> >> +++ b/libavutil/x86/cpu.c
> >> @@ -146,8 +146,16 @@ int ff_get_cpu_flags_x86(void)
> >>      if (max_std_level >= 7) {
> >>          cpuid(7, eax, ebx, ecx, edx);
> >>  #if HAVE_AVX2
> >> -        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020))
> >> +        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)) {
> >>              rval |= AV_CPU_FLAG_AVX2;
> >> +            cpuid(1, eax, ebx, ecx, std_caps);
> >> +            family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
> >> +            model  = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
> >> +            /* Haswell has slow gather */
> >> +            if(family == 6 && model < 70)
> >> +                rval |= AV_CPU_FLAG_SLOW_GATHER;
> >> +        }
> >> +
> >>  #if HAVE_AVX512 /* F, CD, BW, DQ, VL */
> >>          if ((xcr0_lo & 0xe0) == 0xe0) { /* OPMASK/ZMM state */
> >>              if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) == 0xd0030000)
> >> @@ -196,6 +204,10 @@ int ff_get_cpu_flags_x86(void)
> >>             used unless explicitly disabled by checking AV_CPU_FLAG_AVXSLOW. */
> >>          if ((family == 0x15 || family == 0x16) && (rval & AV_CPU_FLAG_AVX))
> >>              rval |= AV_CPU_FLAG_AVXSLOW;
> >> +
> >> +        /* AMD cpus have slow gather */
> >> +        if(rval & AV_CPU_FLAG_AVX2)
> >> +            rval |= AV_CPU_FLAG_SLOW_GATHER;
> >>      }
> >>
> >
> > No, I'd rather limit AMD CPUs to all currently released CPUs.
> > Future ones are getting AVX512, which did speed up gathers on
> > Intel CPUs, as the ISA extension extended gathers and added
> > scatters.
>
> I wouldn't hold my breath for that, but it's probably a good idea
> anyway. A check so it's flagged only on Excavator and Zen <= 3.
>
> >
> > Also your previous patch introduces ff_shuffle_filter_coefficients()
> > which is so bad it pretty much needs a complete rewrite.
> > You're also not detecting malloc errors or propagating them back.
>
> That's unrelated to this patch.

I have sent an updated patch that only sets the flag on AMD CPUs with
family <= 25 (0x19), so that future CPUs will have the AVX2 hscale code
enabled by default.

I may have time this week to look at ff_shuffle_filter_coefficients().
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".