From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 04 Aug 2025 10:19:00 -0700
From: Jacob Lifshay
To: FFmpeg development discussions and patches
In-Reply-To: <20250804135035.465073-1-alankelly@google.com>
References: <20250804135035.465073-1-alankelly@google.com>
Message-ID: <064B6262-D88B-4C8B-A9B2-26725F600064@gmail.com>
Subject: Re: [FFmpeg-devel] [PATCH] swscale: Break loop-carried dependency enabling parallel out of order execution of the gathers.

On August 4, 2025 6:49:20 AM PDT, Alan Kelly via ffmpeg-devel wrote:
> The gather is unmasked but the instruction does a merge into ymm4, which
> depends on the value of ymm4 from the previous loop iteration. The
> out-of-order scheduler does not know statically that the instruction is
> fully unmasked, preventing parallel out-of-order execution of the
> gathers.
> ---
>  libswscale/x86/scale_avx2.asm | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm
> index b4b852d60b..90ee8b0a0e 100644
> --- a/libswscale/x86/scale_avx2.asm
> +++ b/libswscale/x86/scale_avx2.asm
> @@ -68,8 +68,10 @@ cglobal hscale8to15_%1, 7, 9, 16, pos0, dst, w, srcmem, filter, fltpos, fltsize,
>  .innerloop:
>  %endif
>      vpcmpeqd m13, m13
> +    pxor m3, m3 ; break loop-carried dependency

This is in AVX2 code, so you should use vpxor: pxor only clears the lower 128 bits and leaves the upper 128 bits unmodified.

Also, on some older Intel CPUs it will cause a huge stall because it is not v-prefixed:
https://stackoverflow.com/questions/41303780/why-is-this-sse-code-6-times-slower-without-vzeroupper-on-skylake/41349852#41349852

Jacob
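
The upper-half point in the reply can be illustrated with a small toy model. This is only a sketch: it treats a 256-bit register as a Python integer and models the raw instruction encodings (legacy-SSE pxor versus VEX-encoded vpxor). The function names are made up for illustration, and it does not model stalls or anything an assembler macro layer might do to the mnemonic.

```python
# Toy model of one 256-bit YMM register as a Python int (no real hardware access).
MASK128 = (1 << 128) - 1
MASK256 = (1 << 256) - 1


def legacy_pxor(dst, src):
    """Legacy-SSE `pxor xmm, xmm`: XOR the low 128 bits, keep bits 255:128 of dst."""
    low = (dst ^ src) & MASK128
    upper = dst & ~MASK128 & MASK256
    return upper | low


def vex_vpxor_xmm(a, b):
    """VEX `vpxor xmm, xmm, xmm`: XOR the low 128 bits, zero bits 255:128."""
    return (a ^ b) & MASK128


def vex_vpxor_ymm(a, b):
    """VEX `vpxor ymm, ymm, ymm`: XOR all 256 bits."""
    return (a ^ b) & MASK256


# Pretend every bit of the register is stale data from the previous iteration.
stale = MASK256

after_pxor = legacy_pxor(stale, stale)
after_vpxor_xmm = vex_vpxor_xmm(stale, stale)
after_vpxor_ymm = vex_vpxor_ymm(stale, stale)

print("pxor  xmm: upper half cleared?", after_pxor >> 128 == 0)
print("vpxor xmm: upper half cleared?", after_vpxor_xmm >> 128 == 0)
print("vpxor ymm: upper half cleared?", after_vpxor_ymm >> 128 == 0)
```

In this model only the two VEX forms leave a register with no stale bits. The legacy form still carries the old upper half, which is the concern raised above.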