From: "Wang, Bin"
To: FFmpeg development discussions and patches
Date: Mon, 14 Nov 2022 15:18:58 +0000
References: <20221104082925.25598-1-bin.wang@intel.com> <345301fa-4a3d-f131-4974-53b65b7c2adb@gmail.com> <0325e6b2-782c-1891-277a-a40d88039f17@gmail.com>
In-Reply-To: <0325e6b2-782c-1891-277a-a40d88039f17@gmail.com>
Subject: Re: [FFmpeg-devel] [PATCH v7] libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI

> >> On 11/14/2022 10:30 AM, Wang, Bin wrote:
> >>>> By using xmm# you're not taking into account any x86inc SWAPing, so
> >>>> this is using xmm0 and xmm1 where the single scalar float input
> >>>> arguments reside (at least on unix64), instead of xm0 and xm1
> >>>> (xmm16 and xmm17) where the broadcasted scalars were stored.
> >>>> This, again, only worked by chance on unix64 because you're using
> >>>> scalar fmadd, and shouldn't work at all on win64.
> >>>>
> >>>> Also, all these, as is, are being encoded as VEX, not EVEX, but it
> >>>> should be fine leaving them untouched instead of using xm#, since
> >>>> they will be shorter (five bytes instead of six for some) by using
> >>>> the lower, non-callee-saved regs.
> >>>
> >>> Thanks for the help. I'm not familiar with WIN64 asm. So what I need
> >>> to do is change the WIN64 swap from:
> >>> SWAP xmm0, xmm2
> >>> SWAP xmm1, xmm3
> >>> To:
> >>> VBROADCASTSS m0, xmm2
> >>> VBROADCASTSS m1, xmm3
> >>>
> >>> Is that correct?
> >>
> >> Yes, that will ultimately broadcast the two scalars in xmm2 and xmm3
> >> to zmm16 and zmm17.
> >> After that, what you need to do is either change the fmaddss
> >> instruction to use the xm0 and xm1 macros instead of xmm0 and xmm1
> >> (so xmm16 and xmm17 with EVEX encoding are used), or, much like the
> >> broadcast above, use xmm2 and xmm3 explicitly on win64 so it remains
> >> VEX encoded.
> >
> > So, to fix the issue, do these two changes look good to you?
> > First, change the WIN64 swap from:
> > SWAP xmm0, xmm2
> > SWAP xmm1, xmm3
> > To:
> > VBROADCASTSS m0, xmm2
> > VBROADCASTSS m1, xmm3
> >
> > Second, change the fmaddss from:
> > fmaddss xmm4, xmm4, xmm0, xmm1
> > To:
> > fmaddss xmm4, xmm4, xm0, xm1
>
> Yes.

Thanks for your help. I have submitted a new patch here:
https://patchwork.ffmpeg.org/project/ffmpeg/patch/20221114143551.9740-1-bin.wang@intel.com/

> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org
> with subject "unsubscribe".