Date: Mon, 20 Dec 2021 15:47:13 +0100 (CET)
From: Lynne
To: FFmpeg development discussions and patches
In-Reply-To: <20211220144312.738559-1-alankelly@google.com>
References: <20211220135627.615097-1-alankelly@google.com> <8424d6a1-df63-954e-6823-740bf1fcb891@gmail.com> <20211220144312.738559-1-alankelly@google.com>
Subject: Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Add AV_CPU_FLAG_SLOW_GATHER.

20 Dec 2021, 15:43 by alankelly-at-google.com@ffmpeg.org:

> This flag is set on Haswell and earlier and all AMD cpus.
> ---
>  Removes unnecessary indentation, clarifies comment and only sets flag on AMD
>  cpus with AVX2.
>  libavutil/cpu.h     |  1 +
>  libavutil/x86/cpu.c | 14 +++++++++++++-
>  2 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/libavutil/cpu.h b/libavutil/cpu.h
> index ae443eccad..ce9bf14bf7 100644
> --- a/libavutil/cpu.h
> +++ b/libavutil/cpu.h
> @@ -54,6 +54,7 @@
>  #define AV_CPU_FLAG_BMI1 0x20000 ///< Bit Manipulation Instruction Set 1
>  #define AV_CPU_FLAG_BMI2 0x40000 ///< Bit Manipulation Instruction Set 2
>  #define AV_CPU_FLAG_AVX512 0x100000 ///< AVX-512 functions: requires OS support even if YMM/ZMM registers aren't used
> +#define AV_CPU_FLAG_SLOW_GATHER 0x2000000 ///< CPU has slow gathers.
>
>  #define AV_CPU_FLAG_ALTIVEC 0x0001 ///< standard
>  #define AV_CPU_FLAG_VSX 0x0002 ///< ISA 2.06
> diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c
> index bcd41a50a2..563984f234 100644
> --- a/libavutil/x86/cpu.c
> +++ b/libavutil/x86/cpu.c
> @@ -146,8 +146,16 @@ int ff_get_cpu_flags_x86(void)
>      if (max_std_level >= 7) {
>          cpuid(7, eax, ebx, ecx, edx);
>  #if HAVE_AVX2
> -        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020))
> +        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)) {
>              rval |= AV_CPU_FLAG_AVX2;
> +            cpuid(1, eax, ebx, ecx, std_caps);
> +            family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
> +            model = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
> +            /* Haswell has slow gather */
> +            if(family == 6 && model < 70)
> +                rval |= AV_CPU_FLAG_SLOW_GATHER;
> +        }
> +
>  #if HAVE_AVX512 /* F, CD, BW, DQ, VL */
>          if ((xcr0_lo & 0xe0) == 0xe0) { /* OPMASK/ZMM state */
>              if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) == 0xd0030000)
> @@ -196,6 +204,10 @@ int ff_get_cpu_flags_x86(void)
>               used unless explicitly disabled by checking AV_CPU_FLAG_AVXSLOW. */
>              if ((family == 0x15 || family == 0x16) && (rval & AV_CPU_FLAG_AVX))
>                  rval |= AV_CPU_FLAG_AVXSLOW;
> +
> +            /* AMD cpus have slow gather */
> +            if(rval & AV_CPU_FLAG_AVX2)
> +                rval |= AV_CPU_FLAG_SLOW_GATHER;
>          }
>

No, I'd rather limit this to currently released AMD CPUs. Future ones are
getting AVX512, which did speed up gathers on Intel CPUs, as the ISA
extension extended gathers and added scatters.

Also, your previous patch introduces ff_shuffle_filter_coefficients(), which
is so bad it pretty much needs a complete rewrite. You're also not detecting
malloc errors or propagating them back.
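
For illustration only, here is a rough standalone sketch of one way the AMD
check could be narrowed along the lines suggested above. It is not the
patch author's code and not what FFmpeg does. The names sk_decode_family_model()
and sk_slow_gather() and the SK_FLAG_* macros are hypothetical, the AVX2 bit
value is illustrative, and "currently released AMD CPUs" is approximated by
the assumption "AVX2 present, AVX-512 absent". The family/model arithmetic is
copied from the quoted patch.

```c
#include <stdint.h>

/* Hypothetical flag bits for this sketch only. The AVX-512 and SLOW_GATHER
 * values mirror the quoted patch; the AVX2 value is illustrative. */
#define SK_FLAG_AVX2        0x8000
#define SK_FLAG_AVX512      0x100000
#define SK_FLAG_SLOW_GATHER 0x2000000

/* Decode family and model from CPUID leaf 1 EAX, using the same
 * arithmetic as the quoted patch. */
static void sk_decode_family_model(uint32_t eax, int *family, int *model)
{
    *family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
    *model  = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
}

/* Return flags with SK_FLAG_SLOW_GATHER added where the sketch's rules
 * consider gathers slow. Nothing is set unless AVX2 is present. */
static int sk_slow_gather(int is_amd, uint32_t leaf1_eax, int flags)
{
    int family, model;

    if (!(flags & SK_FLAG_AVX2))
        return flags;

    sk_decode_family_model(leaf1_eax, &family, &model);

    if (is_amd) {
        /* Assumption: AMD parts with AVX2 but without AVX-512 stand in
         * for "currently released" CPUs. */
        if (!(flags & SK_FLAG_AVX512))
            flags |= SK_FLAG_SLOW_GATHER;
    } else if (family == 6 && model < 70) {
        /* Same Intel condition as the quoted patch. */
        flags |= SK_FLAG_SLOW_GATHER;
    }
    return flags;
}
```

Keying on the absence of AVX-512 avoids maintaining a family list, but it is
only a proxy; an explicit family check would be the stricter reading of
"currently released".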