From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by master.gitmailbox.com (Postfix) with ESMTP id 3AD0C44C72 for ; Mon, 14 Nov 2022 13:54:59 +0000 (UTC) Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 61EB768BDE7; Mon, 14 Nov 2022 15:54:56 +0200 (EET) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 8031468BDD7 for ; Mon, 14 Nov 2022 15:54:49 +0200 (EET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668434094; x=1699970094; h=from:to:subject:date:message-id:references:in-reply-to: content-transfer-encoding:mime-version; bh=t2mR9yMEuH3RYzjBPXzOONQ4l4Tw5IDflaRsVwC5ifU=; b=mkhuH2zx22jPKk/FZOaEDnxnKX7vomtiYSNr+9ANO3i8rJPE37ibPZiO pB71LVJcdkypY+MlbFdigvnjlfUHX7Et9P2Exd+TGR9AfMqC2Lv/UxI7f IHmm9fJF7r96oM4P+SoD6QdGoP+VnD5F9kIi4eosI/mmsuAzjvmFY4MF+ yJ/GVeS9KIWu/SKnv+kL7fooMs/Onz0cK4vMB8Xp8S/R4o1xXqdtelJJR JLQ0Met2CfwnBZpvgsmveGD7Bn+7b/at+r4JOFGK4QnPRTDKb+C5MCmzX WGwLsH7ZIAuLFLbi+Gqf6VfR6hPyt7dn3vVtqxG9KmATE+TnbzijKQ3Ud w==; X-IronPort-AV: E=McAfee;i="6500,9779,10531"; a="310679837" X-IronPort-AV: E=Sophos;i="5.96,161,1665471600"; d="scan'208";a="310679837" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Nov 2022 05:54:47 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10531"; a="640778951" X-IronPort-AV: E=Sophos;i="5.96,161,1665471600"; d="scan'208";a="640778951" Received: from fmsmsx602.amr.corp.intel.com ([10.18.126.82]) by fmsmga007.fm.intel.com with ESMTP; 14 Nov 2022 05:54:47 -0800 Received: from fmsmsx611.amr.corp.intel.com (10.18.126.91) by fmsmsx602.amr.corp.intel.com (10.18.126.82) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Mon, 14 Nov 2022 05:54:46 -0800 
Received: from fmsmsx602.amr.corp.intel.com (10.18.126.82) by fmsmsx611.amr.corp.intel.com (10.18.126.91) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Mon, 14 Nov 2022 05:54:46 -0800 Received: from FMSEDG603.ED.cps.intel.com (10.1.192.133) by fmsmsx602.amr.corp.intel.com (10.18.126.82) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31 via Frontend Transport; Mon, 14 Nov 2022 05:54:46 -0800 Received: from NAM12-BN8-obe.outbound.protection.outlook.com (104.47.55.176) by edgegateway.intel.com (192.55.55.68) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2375.31; Mon, 14 Nov 2022 05:54:46 -0800 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=VmKMA9WWnzO81KKO0Yn+YcpM4c96c+JIrQHZr6G1ef6ZpMB9q8gzWmAWkZit45wWpa1CXUv3BGRowN7yqm4Yr+/ipzL5u4sFtBJESW0OQejsN5fEEd62Ql7YpwSxeqna5YQo+U7gXyqgH5iogaYLV/ySOO/Fm+npW1fmE1em/xSDKkhN35+4jdy7QuMkZBi1kDCADkOd3q9RWXJtu9easJSiZu/X8n/b6LTFwCbybNAwS7j/Sc+W1Dpfj6hI1TdUIi6M4UDzuCDRsa+M/XtyVm/4hgmBWNWawoJXh+ItZ2HMHkyavQEGNF6dY6U4z/xXlmQuVr8RkLfEQkUlrEbekw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=PPu9/Vmds5elBM1T8w/I2BqwUS/CZy0nRpnZ4tZMLTo=; b=DOQUYvUxGBQbRas73b90/ux6nsAh12vF8t63UebqXgWz84s7FrPLLYCzTe4JOfcUCqSnuR0MKMZqcdkFpkoz/6gFb/kI9uLWOqi/bALJCjxPyTAD+rcNjMsP7ea6e56hAPYH00emcvwsFoIdddfnLnxTzAVfu0bOrvhC26ylGNkKhUCfzv+nW6iFKBZJMKJ8eQZxbRFR59FqTLC+bbH1vbbcG6QO0wJz/5kKHFUV2HIphHHvv3u6XsHpTF5VHLjKnn5emS7KWVyF0HMmFYT+DXMF+BZhiAbCDzNbRqtxO0mNKnOdSmsFx1JiunqLubKqvjK3nWt4rXN6cMGWHBtTlA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com; 
dkim=pass header.d=intel.com; arc=none Received: from BN6PR11MB1746.namprd11.prod.outlook.com (2603:10b6:404:fb::20) by SN7PR11MB6678.namprd11.prod.outlook.com (2603:10b6:806:26a::20) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5813.17; Mon, 14 Nov 2022 13:54:44 +0000 Received: from BN6PR11MB1746.namprd11.prod.outlook.com ([fe80::1565:8f45:a36f:8704]) by BN6PR11MB1746.namprd11.prod.outlook.com ([fe80::1565:8f45:a36f:8704%9]) with mapi id 15.20.5813.017; Mon, 14 Nov 2022 13:54:44 +0000 From: "Wang, Bin" To: FFmpeg development discussions and patches Thread-Topic: [FFmpeg-devel] [PATCH v7] libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI Thread-Index: AQHY8Cqnk334s11ZYkKsTF5Sp70+yK4+cBGAgAAG/hCAAAScgIAAAwmg Date: Mon, 14 Nov 2022 13:54:43 +0000 Message-ID: References: <20221104082925.25598-1-bin.wang@intel.com> <345301fa-4a3d-f131-4974-53b65b7c2adb@gmail.com> In-Reply-To: <345301fa-4a3d-f131-4974-53b65b7c2adb@gmail.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=intel.com; x-ms-publictraffictype: Email x-ms-traffictypediagnostic: BN6PR11MB1746:EE_|SN7PR11MB6678:EE_ x-ms-office365-filtering-correlation-id: 8aa7dbae-cb45-411c-9eeb-08dac647c8d2 x-ms-exchange-senderadcheck: 1 x-ms-exchange-antispam-relay: 0 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: 
XEEKSFEo5cCv6Rtck8/IETTcy3OjJ5H0sKAOPPv2aEaQXnA0k7atxS/ny5Qo+xmWrgHQT0ARBhcJnsv6bHZ0wqCHnAG70Um7Xp4lv6j5jVpjg0dOhi66/QJKcW1RLuv7JSi8ajfGORedWYJZiMIYA5p3c+Y489dS+i2isg5Y/Uvzo0P55QMNDd9JtJAvhI9AIuhGKY3yDhwe1RMn0OSAvBj9gzQPE9/gNpdiElgBID4X7BtJBpyAgsFqBMV3loA3VE8+L41zQkXzHWoP9RCntqMHZ2SbXCf4wYx9OXfhNJ9GtmuK37l4djt/LrNwfVyssTyUTAI0L9ZOagF0GLvsWegKoffC4iBtIXCS1TbZgV/EAsmVKlTSi/86pzM4oKpEuPcUjAxnUY2WHHlqOafN7Xyr0W5ehWfGd4CB2gCK1zS4fesPuc9nEHUL4vogOoS7kLJjIJGegh/DRPMkmmxSK5+5RRHheiWe4oM2YAOnEczXk5cnru4jrnixlTccunu9ZYZcHvpBkJ7wNmPo4oD/xsCE/a/oyLG7sCR7JB1/Uj5cA+NlwFLh3ChCV7xDVxRZIKp2toKKGFk/NYwvkGQd0UrZd4oQdfFhKoB/CK/QDVNCm7boSRVEB0uN48GaBm02aAETzIu61TrUkV61VJOgwlbFjbX4kz1nLUvghqeEHzevq9+crJhMSSFYzUxUpCnghvcy6LINkQS75jUWRkJapoSvL0xUEarDkglF0SwvcGKtTii9Sv5T8oY1EnGtS0MLyua2i+IdKfZegaaZDoOkViintCRnAE7ruWldZFB32X4= x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:BN6PR11MB1746.namprd11.prod.outlook.com; PTR:; CAT:NONE; SFS:(13230022)(39860400002)(396003)(366004)(376002)(136003)(346002)(451199015)(83380400001)(55016003)(7696005)(53546011)(6506007)(66899015)(9686003)(26005)(38100700002)(186003)(122000001)(2906002)(82960400001)(5660300002)(8936002)(71200400001)(478600001)(8676002)(966005)(52536014)(66556008)(76116006)(64756008)(66446008)(66946007)(66476007)(41300700001)(6916009)(316002)(86362001)(38070700005)(33656002); DIR:OUT; SFP:1102; x-ms-exchange-antispam-messagedata-chunkcount: 1 x-ms-exchange-antispam-messagedata-0: =?us-ascii?Q?a8f0+15MFvrrOTEXIQQ6PFrP3B9p/emd58xbcPrsANB5qs2VrGUHBjJwe57M?= =?us-ascii?Q?Q1BxrBNxovpMCjs7ArMbQzu84LmopkvJeRV4Z7IjSxdz/OyQL90jjw2qSTSU?= =?us-ascii?Q?gVKBsiFNqgXRJjzUw7MWr7O3DMU0QPoTAA1wrv/y3jyouL9Of7paOIjd13gR?= =?us-ascii?Q?aA9P6m49iHGlgnXHUdfu6wUrtED4AfMy87jEq0UR9dcc0fg1vvXZuGfXTdsi?= =?us-ascii?Q?WafPwY4N0jwZEzdrIN1Q9i/YXMch81gE40oob1LKYzCAEo855az4LR9Wt0qT?= =?us-ascii?Q?nHr04LXLe25/a+hrrho+RcU+KKGeiljUBmdzkkRDG0fqaxWJ1gzq19rZ09dN?= 
=?us-ascii?Q?nIhRv5XsAhb6bcWGJ2IGlhqVo/rpnZZ5St3XmYDoazl9HNDLdwsQY6v5wRja?= =?us-ascii?Q?YUZsJWInyBhDlZaT6E/JNOHrcZovfrh1JQadN22wIaMX0ZLBeX2vIfi2k+X0?= =?us-ascii?Q?3cuvFTKpeuu4XEwVD7ON0VYo5mzseK5QxvVF0ZuYHtRyKNuff5mqGJ69YBgv?= =?us-ascii?Q?r9+K3OxzVO521enx/bcIuIkwGkCAQ91igsticmDZq24fxZmb3UfKSw+Sdwqe?= =?us-ascii?Q?R74DVk8srBS+G8J/vWA2q3yFkdayG5/BqKCV03DBOHo4Q1/u8HQLSo+8fM16?= =?us-ascii?Q?MONLLQjdRFMbjrgYIyBKgkyslaR0suQZhtNJLHLtP1X3obTaZuOZ6A8NcXI1?= =?us-ascii?Q?PAVmMahPwj7SRiUjSmtLCNRJj7uQhDeiHAXHqkJPzKrkyOr+fXXGPLysOvE9?= =?us-ascii?Q?a2jbsOHZpnin9X981iGlLj1BFBgzquEuwxCl42Snpit1ZyEHLCsa9wJAh0BW?= =?us-ascii?Q?fcyjqGWGoYA5eV0d8sZhQGHN/Cm88DsWupt42aFeJCMOHgmF/3umpr172XHL?= =?us-ascii?Q?a3c84Mexhclz69+w6cGmTfSQanmoZIlPs+ZuzzWgHr7nbcmcX0fA4ZzV7loa?= =?us-ascii?Q?YnaotB72oFmPmgduHT8zn7uCMeNS5BRuCdO99SgKF8RWc1jZLr/d/gvKtwEe?= =?us-ascii?Q?2IVa3S2e+RQVn4WfnT3kHAsIxZTVb2OJ9tED2+ceXYkS1TbuKq3UvykROdOJ?= =?us-ascii?Q?ZVzB5OIxsRYp3vEaQCot527PFL6hLS75LycF/x1HS1ybRgT6wZAo4H/HgXBD?= =?us-ascii?Q?mbIdEqZHFuI8rr4oqu84fZ+wVbkS3W0OPtAHdQ0QuuKRC3CrnxgfCym33Rj0?= =?us-ascii?Q?OfBURQOgbEkIngqtTmxtRUNGwHhtGElKZPlSYhvrLL+EYPuKW1e6yUZy+KLs?= =?us-ascii?Q?vqIbm/ZVt0IFRILTizN3XuMr4NCZgrl2t/gM1mfnnGUlQrt60V6IQjpuhE9w?= =?us-ascii?Q?PNhVAWC5R48zc+GzFOIbt0syYnkLaEivksfkLxAVvmV6fr4W7RVArRdSLTsC?= =?us-ascii?Q?bQQM3qYIUIPHrzm/Rdua/htsEfD9VNv8FNRp3eI88nDAsggvBdyIyA2kFkIk?= =?us-ascii?Q?CnTj/5GvHKRerDyaNQ25lcvh4QqJEYW6Ow+UENtiaqJda1IEO6FJHruE7TPW?= =?us-ascii?Q?+eyIb1H6L4h56rHtXhLd+Y8thcWBvCQLpb99xLz6DS+HMwqq7v44Ak7y75rn?= =?us-ascii?Q?/gAxcvonMTkmHuwu0Qw=3D?= MIME-Version: 1.0 X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: BN6PR11MB1746.namprd11.prod.outlook.com X-MS-Exchange-CrossTenant-Network-Message-Id: 8aa7dbae-cb45-411c-9eeb-08dac647c8d2 X-MS-Exchange-CrossTenant-originalarrivaltime: 14 Nov 2022 13:54:44.0258 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 46c98d88-e344-4ed4-8496-4ed7712e255d 
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: JsKKdu54L3aSYxtkwUOrQI0R2qb+4bEQJujV1AhnZnaJXh0HRHqOx59hpo18EY3uisebr41OO+SQkbwmKus1sQ== X-MS-Exchange-Transport-CrossTenantHeadersStamped: SN7PR11MB6678 X-OriginatorOrg: intel.com Subject: Re: [FFmpeg-devel] [PATCH v7] libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI X-BeenThere: ffmpeg-devel@ffmpeg.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: FFmpeg development discussions and patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: FFmpeg development discussions and patches Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: ffmpeg-devel-bounces@ffmpeg.org Sender: "ffmpeg-devel" Archived-At: List-Archive: List-Post:

> -----Original Message-----
> From: ffmpeg-devel On Behalf Of James Almer
> Sent: Monday, November 14, 2022 9:36 PM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v7] libavfilter/x86/vf_convolution: add
> sobel filter optimization and unit test with intel AVX512 VNNI
>
> On 11/14/2022 10:30 AM, Wang, Bin wrote:
> >> By using xmm# you're not taking into account any x86inc SWAPing, so
> >> this is using xmm0 and xmm1 where the single scalar float input
> >> arguments reside (at least on unix64), instead of xm0 and xm1 (xmm16
> >> and xmm17) where the broadcasted scalars were stored.
> >> This, again, only worked by chance on unix64 because you're using
> >> scalar fmadd, and shouldn't work at all on win64.
> >>
> >> Also, all these as is are being encoded as VEX, not EVEX, but it
> >> should be fine leaving them untouched instead of using xm#, since
> >> they will be shorter (five bytes instead of six for some) by using the lower,
> >> non callee-saved regs.
> >
> > Thanks for the help. I'm not familiar with WIN64 asm. So what I need to do is
> > change the WIN64 swap from:
> >     SWAP xmm0, xmm2
> >     SWAP xmm1, xmm3
> > To:
> >     VBROADCASTSS m0, xmm2
> >     VBROADCASTSS m1, xmm3
> >
> > Is that correct?
>
> Yes, that will ultimately broadcast the two scalars in xmm2 and xmm3 to
> zmm16 and zmm17.
> After that what you need to do is either change the fmaddss instruction to use
> xm0 and xm1 macros instead of xmm0 and xmm1 (so xmm16 and xmm17 with
> EVEX encoding is used), or much like the broadcast above use xmm2 and xmm3
> explicitly on win64, so it remains VEX encoded.

So, to fix the issue, do these two changes look good to you?

First, change the WIN64 swap from:
    SWAP xmm0, xmm2
    SWAP xmm1, xmm3
To:
    VBROADCASTSS m0, xmm2
    VBROADCASTSS m1, xmm3

Second, change the fmaddss from:
    fmaddss xmm4, xmm4, xmm0, xmm1
To:
    fmaddss xmm4, xmm4, xm0, xm1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org
with subject "unsubscribe".
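
For reference, the register mismatch discussed in this thread can be modeled
with a small C sketch. It is not FFmpeg code: the enum, the function name
float_arg_xmm, and the assumed parameter layout (two integer-class arguments
followed by the two scalar floats) are hypothetical and only chosen to be
consistent with the xmm0/xmm1 versus xmm2/xmm3 registers mentioned above.

```c
#include <assert.h>

/* Hypothetical model, not FFmpeg code: which XMM register carries a
 * floating-point argument under the two x86-64 calling conventions. */
enum abi { ABI_UNIX64, ABI_WIN64 };

/*
 * position:    0-based index of the argument in the full parameter list.
 * float_index: 0-based index counting only the floating-point arguments.
 * Returns the XMM register number, or -1 if the argument is not passed
 * in an XMM register.
 */
static int float_arg_xmm(enum abi abi, int position, int float_index)
{
    assert(position >= 0 && float_index >= 0);

    if (abi == ABI_WIN64) {
        /* Win64: the first four arguments each own one positional slot,
         * so a float in slot N arrives in xmmN. */
        return position < 4 ? position : -1;
    }

    /* unix64 (System V): float arguments are numbered independently of
     * integer arguments and use xmm0..xmm7 in order. */
    return float_index < 8 ? float_index : -1;
}
```

Under the assumed layout, float_arg_xmm(ABI_UNIX64, 2, 0) and
float_arg_xmm(ABI_UNIX64, 3, 1) give 0 and 1, while the ABI_WIN64 calls give
2 and 3. That is why the win64 path has to read the scalars from xmm2 and
xmm3 before broadcasting them.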