From: James Almer
Date: Mon, 20 Dec 2021 11:53:30 -0300
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Add AV_CPU_FLAG_SLOW_GATHER.
References: <20211220135627.615097-1-alankelly@google.com> <8424d6a1-df63-954e-6823-740bf1fcb891@gmail.com> <20211220144312.738559-1-alankelly@google.com>

On 12/20/2021 11:47 AM, Lynne wrote:
> 20 Dec 2021, 15:43 by alankelly-at-google.com@ffmpeg.org:
>
>> This flag is set on Haswell and earlier and all AMD cpus.
>> ---
>> Removes unnecessary indentation, clarifies comment and only sets flag on AMD
>> cpus with AVX2.
>>  libavutil/cpu.h     |  1 +
>>  libavutil/x86/cpu.c | 14 +++++++++++++-
>>  2 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/libavutil/cpu.h b/libavutil/cpu.h
>> index ae443eccad..ce9bf14bf7 100644
>> --- a/libavutil/cpu.h
>> +++ b/libavutil/cpu.h
>> @@ -54,6 +54,7 @@
>>  #define AV_CPU_FLAG_BMI1         0x20000 ///< Bit Manipulation Instruction Set 1
>>  #define AV_CPU_FLAG_BMI2         0x40000 ///< Bit Manipulation Instruction Set 2
>>  #define AV_CPU_FLAG_AVX512      0x100000 ///< AVX-512 functions: requires OS support even if YMM/ZMM registers aren't used
>> +#define AV_CPU_FLAG_SLOW_GATHER 0x2000000 ///< CPU has slow gathers.
>>
>>  #define AV_CPU_FLAG_ALTIVEC      0x0001 ///< standard
>>  #define AV_CPU_FLAG_VSX          0x0002 ///< ISA 2.06
>> diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c
>> index bcd41a50a2..563984f234 100644
>> --- a/libavutil/x86/cpu.c
>> +++ b/libavutil/x86/cpu.c
>> @@ -146,8 +146,16 @@ int ff_get_cpu_flags_x86(void)
>>      if (max_std_level >= 7) {
>>          cpuid(7, eax, ebx, ecx, edx);
>>  #if HAVE_AVX2
>> -        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020))
>> +        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)) {
>>              rval |= AV_CPU_FLAG_AVX2;
>> +            cpuid(1, eax, ebx, ecx, std_caps);
>> +            family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
>> +            model = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
>> +            /* Haswell has slow gather */
>> +            if(family == 6 && model < 70)
>> +                rval |= AV_CPU_FLAG_SLOW_GATHER;
>> +        }
>> +
>>  #if HAVE_AVX512 /* F, CD, BW, DQ, VL */
>>          if ((xcr0_lo & 0xe0) == 0xe0) { /* OPMASK/ZMM state */
>>              if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) == 0xd0030000)
>> @@ -196,6 +204,10 @@ int ff_get_cpu_flags_x86(void)
>>             used unless explicitly disabled by checking AV_CPU_FLAG_AVXSLOW. */
>>          if ((family == 0x15 || family == 0x16) && (rval & AV_CPU_FLAG_AVX))
>>              rval |= AV_CPU_FLAG_AVXSLOW;
>> +
>> +        /* AMD cpus have slow gather */
>> +        if(rval & AV_CPU_FLAG_AVX2)
>> +            rval |= AV_CPU_FLAG_SLOW_GATHER;
>>      }
>>
>
> No, I'd rather limit AMD CPUs to all currently released CPUs.
> Future ones are getting AVX512, which did speed up gathers on
> Intel CPUs, as the ISA extension extended gathers and added
> scatters.

I wouldn't hold my breath for that, but it's probably a good idea
anyway. Add a check so it's flagged only on Excavator and Zen <= 3.

>
> Also your previous patch introduces ff_shuffle_filter_coefficients()
> which is so bad it pretty much needs a complete rewrite.
> You're also not detecting malloc errors or propagating them back.

That's unrelated to this patch.

>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
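
For reference, the family/model decoding used in the quoted patch and the kind of AMD restriction suggested above can be sketched in isolation. This is only an illustration, not code from the patch: the helper names (cpuid_family, cpuid_model, intel_slow_gather, amd_slow_gather) are hypothetical, the Intel test simply reuses the "family == 6 && model < 70" condition from the patch, and the AMD upper bound of family 0x19 is an assumption based on AMD's public family numbering that should be verified. Later cores may share that family number, so a real check would likely need model ranges as well.

```c
#include <assert.h>
#include <stdint.h>

/* Display family from CPUID leaf 1 EAX: base family (bits 11:8) plus
 * extended family (bits 27:20), computed the same way as in the patch. */
static inline unsigned cpuid_family(uint32_t eax)
{
    return ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
}

/* Display model: base model (bits 7:4) plus extended model (bits 19:16)
 * moved into the high nibble, computed the same way as in the patch. */
static inline unsigned cpuid_model(uint32_t eax)
{
    return ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
}

/* Intel condition copied from the quoted patch. */
static inline int intel_slow_gather(uint32_t eax)
{
    return cpuid_family(eax) == 6 && cpuid_model(eax) < 70;
}

/* Hypothetical AMD restriction: flag only AVX2-capable parts up to
 * family 0x19. ASSUMPTION: 0x15 with AVX2 corresponds to Excavator and
 * 0x19 to Zen 3; the bound is illustrative and not taken from the patch. */
static inline int amd_slow_gather(uint32_t eax, int has_avx2)
{
    return has_avx2 && cpuid_family(eax) <= 0x19;
}

/* Small self-check on raw leaf 1 EAX values chosen for illustration. */
static inline void slow_gather_selfcheck(void)
{
    assert(cpuid_family(0x000306C3) == 6);    /* base family 6, no extension */
    assert(cpuid_model(0x000306C3) == 60);    /* 0x3C: ext. model 3, model 0xC */
    assert(intel_slow_gather(0x000306C3));
    assert(cpuid_family(0x00A20F10) == 0x19); /* 0xF + extended family 0xA */
    assert(amd_slow_gather(0x00A20F10, 1));
    assert(!amd_slow_gather(0x00A20F10, 0));  /* no AVX2, no flag */
}
```

Keeping the decoding in small helpers makes the thresholds easy to adjust once the exact family/model cut-offs are agreed on.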