Date: Mon, 14 Nov 2022 08:34:54 -0300
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH v7] libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI
References: <20221104082925.25598-1-bin.wang@intel.com>

On 11/14/2022 2:58 AM, Wang, Bin wrote:
> -----Original Message-----
> From: ffmpeg-devel On Behalf Of James Almer
> Sent: Monday, November 14, 2022 10:43 AM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v7] libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI
>
> On 11/4/2022 5:29 AM, bin.wang-at-intel.com@ffmpeg.org wrote:
>> +%macro FILTER_SOBEL 0
>> +%if UNIX64
>> +cglobal filter_sobel, 4, 15, 7, dst, width, matrix, ptr, c0, c1, c2, c3, c4, c5, c6, c7, c8, r, x
>> +%else
>> +cglobal filter_sobel, 4, 15, 7, dst, width, rdiv, bias, matrix, ptr, c0, c1, c2, c3, c4, c5, c6, c7, c8, r, x
>> +%endif
>> +%if WIN64
>> +    SWAP xmm0, xmm2
>> +    SWAP xmm1, xmm3
>> +    mov r2q, matrixmp
>> +    mov r3q, ptrmp
>> +    DEFINE_ARGS dst, width, matrix, ptr, c0, c1, c2, c3, c4, c5, c6, c7, c8, r, x
>> +%endif
>> +    movsxdifnidn widthq, widthd
>> +    VBROADCASTSS m0, xmm0
>> +    VBROADCASTSS m1, xmm1
>
> This and every other xmm# case should instead be xm#, to ensure the swapping is taken into account.
>
> Sorry, I can't get your point. Could you please help to explain why I have to use xm# to ensure the swapping operation (swap xmm# can't work in WIN64 asm)? And how do I do it?

SWAP only affects the x86inc-defined macros m#, xm#, ym#, and zm#, so those instructions above end up encoded as

    vbroadcastss zmm2, xmm0
    vbroadcastss zmm3, xmm1

on WIN64. In fact, now that I check it, they end up as

    vbroadcastss zmm18, xmm0
    vbroadcastss zmm19, xmm1

because x86inc is purposely using the higher 16 regs with these macros on all targets to avoid having to call vzeroupper at the end.

This works on unix64 by pure chance, because the floats were effectively in xmm0 and xmm1 and all calculations then happen on m#, xm# and ym#.

So you'll have to duplicate the VBROADCASTSS lines to broadcast xmm2 and xmm3 to m0 and m1 on WIN64 instead of using SWAP.
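
One possible reading of the suggestion above, as a minimal sketch rather than the final patch. It is a fragment, not a standalone file: it assumes it sits inside the patch's FILTER_SOBEL macro after the cglobal/DEFINE_ARGS setup, with the two SWAP lines removed, and that x86inc.asm and FFmpeg's x86util.asm (which provides VBROADCASTSS) are included as in the original patch. The mapping of rdiv to m0 and bias to m1 is taken from the quoted code.

```nasm
; Sketch only: depends on x86inc.asm / x86util.asm and on the surrounding
; cglobal declaration from the patch; it does not assemble on its own.
%if WIN64
    ; Win64 passes the 3rd and 4th arguments (rdiv, bias) in xmm2 and xmm3,
    ; so read them from the literal registers instead of using SWAP.
    VBROADCASTSS m0, xmm2
    VBROADCASTSS m1, xmm3
%else
    ; Unix64 passes the first two float arguments in xmm0 and xmm1.
    VBROADCASTSS m0, xmm0
    VBROADCASTSS m1, xmm1
%endif
```

The sources are written as literal xmm2/xmm3 rather than xm2/xm3 because, as noted above, the xm# names resolve to the upper 16 registers with these macros, which is not where the calling convention places the float arguments.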