From: James Almer
Date: Mon, 14 Nov 2022 11:31:49 -0300
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH v7] libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI
Message-ID: <0325e6b2-782c-1891-277a-a40d88039f17@gmail.com>
References: <20221104082925.25598-1-bin.wang@intel.com> <345301fa-4a3d-f131-4974-53b65b7c2adb@gmail.com>

On 11/14/2022 10:54 AM, Wang, Bin wrote:
>
>
>> -----Original Message-----
>> From: ffmpeg-devel On Behalf Of James Almer
>> Sent: Monday, November 14, 2022 9:36 PM
>> To: ffmpeg-devel@ffmpeg.org
>> Subject: Re: [FFmpeg-devel] [PATCH v7] libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI
>>
>> On 11/14/2022 10:30 AM, Wang, Bin wrote:
>>>> By using xmm# you're not taking into account any x86inc SWAPing, so
>>>> this is using xmm0 and xmm1 where the single scalar float input
>>>> arguments reside (at least on unix64), instead of xm0 and xm1 (xmm16
>>>> and xmm17) where the broadcasted scalars were stored.
>>>> This, again, only worked by chance on unix64 because you're using
>>>> scalar fmadd, and shouldn't work at all on win64.
>>>>
>>>> Also, all these as is are being encoded as VEX, not EVEX, but it
>>>> should be fine leaving them untouched instead of using xm#, since
>>>> they will be shorter (five bytes instead of six for some) by using
>>>> the lower, non callee-saved regs.
>>>
>>> Thanks for the help. I'm not familiar with WIN64 asm. So what I need
>>> to do is change the WIN64 swap from:
>>> SWAP xmm0, xmm2
>>> SWAP xmm1, xmm3
>>> To:
>>> VBROADCASTSS m0, xmm2
>>> VBROADCASTSS m1, xmm3
>>>
>>> Is that correct?
>>
>> Yes, that will ultimately broadcast the two scalars in xmm2 and xmm3 to
>> zmm16 and zmm17.
>> After that what you need to do is either change the fmaddss instruction
>> to use xm0 and xm1 macros instead of xmm0 and xmm1 (so xmm16 and xmm17
>> with EVEX encoding is used), or much like the broadcast above use xmm2
>> and xmm3 explicitly on win64, so it remains VEX encoded.
>
> So, to fix the issue, does this 2 changes looks good for you?
> First change the WIN64 swap from:
> SWAP xmm0, xmm2
> SWAP xmm1, xmm3
> To:
> VBROADCASTSS m0, xmm2
> VBROADCASTSS m1, xmm3
>
> Second change the fmaddss from:
> fmaddss xmm4, xmm4, xmm0, xmm1
> To:
> fmaddss xmm4, xmm4, xm0, xm1

Yes.
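For readers following the register naming, here is a rough Python sketch of the reasoning above. It is a toy model, not x86inc or the patch itself. It assumes that with AVX-512 enabled the x86inc macros m0/xm0 and m1/xm1 resolve to registers 16 and 17 (as stated in the thread), and that the two scalar float arguments arrive in xmm0/xmm1 on unix64 and xmm2/xmm3 on win64. The names `xm`, `read_fma_inputs`, `scalar_a` and `scalar_b` are made up for illustration.

```python
# Toy model of the register naming issue discussed in this thread.
# Assumption: under AVX-512, x86inc maps macro register N to physical
# register (N + 16) % 32, so xm0 -> xmm16 and xm1 -> xmm17.

def xm(index):
    """Physical xmm register behind the x86inc macro xm<index> (assumed mapping)."""
    if not 0 <= index < 32:
        raise ValueError("register index out of range")
    return f"xmm{(index + 16) % 32}"

# Registers holding the two scalar float arguments on entry, per the thread.
FLOAT_ARG_REGS = {
    "unix64": ("xmm0", "xmm1"),
    "win64": ("xmm2", "xmm3"),
}

def read_fma_inputs(abi, fma_operands):
    """Return what the scalar fmadd would read from the given operands.

    None stands for a register that does not hold either scalar.
    """
    regs = {f"xmm{i}": None for i in range(32)}
    first_arg, second_arg = FLOAT_ARG_REGS[abi]
    regs[first_arg] = "scalar_a"   # hypothetical name for the first scalar
    regs[second_arg] = "scalar_b"  # hypothetical name for the second scalar
    # Models: VBROADCASTSS m0, <first_arg> / VBROADCASTSS m1, <second_arg>
    regs[xm(0)] = regs[first_arg]
    regs[xm(1)] = regs[second_arg]
    return tuple(regs[name] for name in fma_operands)

if __name__ == "__main__":
    for abi in ("unix64", "win64"):
        literal = read_fma_inputs(abi, ("xmm0", "xmm1"))
        macro = read_fma_inputs(abi, (xm(0), xm(1)))
        print(abi, "literal xmm0/xmm1:", literal, "| xm0/xm1:", macro)
```

In this model the literal xmm0/xmm1 operands only see the scalars on unix64, where the arguments happen to still be there, while the xm0/xm1 macros (or xmm2/xmm3 spelled out on win64) see them in both cases.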