From: "Wang, Bin"
To: FFmpeg development discussions and patches
Date: Mon, 14 Nov 2022 05:58:19 +0000
Subject: Re: [FFmpeg-devel] [PATCH v7] libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI
References: <20221104082925.25598-1-bin.wang@intel.com>

-----Original Message-----
From: ffmpeg-devel On Behalf Of James Almer
Sent: Monday, November 14, 2022 10:43 AM
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH v7] libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI

On 11/4/2022 5:29 AM, bin.wang-at-intel.com@ffmpeg.org wrote:
> +%macro FILTER_SOBEL 0
> +%if UNIX64
> +cglobal filter_sobel, 4, 15, 7, dst, width, matrix, ptr, c0, c1, c2, c3, c4, c5, c6, c7, c8, r, x
> +%else
> +cglobal filter_sobel, 4, 15, 7, dst, width, rdiv, bias, matrix, ptr, c0, c1, c2, c3, c4, c5, c6, c7, c8, r, x
> +%endif
> +%if WIN64
> +    SWAP xmm0, xmm2
> +    SWAP xmm1, xmm3
> +    mov r2q, matrixmp
> +    mov r3q, ptrmp
> +    DEFINE_ARGS dst, width, matrix, ptr, c0, c1, c2, c3, c4, c5, c6, c7, c8, r, x
> +%endif
> +    movsxdifnidn widthq, widthd
> +    VBROADCASTSS m0, xmm0
> +    VBROADCASTSS m1, xmm1
> +

This and every other xmm# case should instead be xm#, to ensure the swapping is taken into account.

Sorry, I don't quite follow. Could you explain why I have to use xm# for the swap to be taken into account? Does SWAP on xmm# not work in WIN64 asm? And how should I make the change?
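
For reference, one possible reading of the review comment is sketched below. It assumes the usual x86inc behaviour, in which SWAP only renames the m# aliases (and the xm#/ym#/zm# aliases derived from them), while a literal xmm# always names the physical register. On WIN64 rdiv and bias arrive in xmm2 and xmm3, so after the swap m0 refers to register 2, but a literal xmm0 operand would still read register 0. The fragment is the quoted prologue with only the two VBROADCASTSS source operands changed. It is not a standalone file and depends on x86inc.asm and x86util.asm as the patch does. Whether the SWAP lines themselves also need changing is not something the review comment states, so they are left as posted.

```nasm
; Sketch only: a fragment of the quoted FILTER_SOBEL prologue, assuming
; x86inc.asm / x86util.asm are included as in the original patch.
%if WIN64
    SWAP xmm0, xmm2            ; unchanged from the patch; intent: m0 <-> m2
    SWAP xmm1, xmm3            ; unchanged from the patch; intent: m1 <-> m3
    mov r2q, matrixmp
    mov r3q, ptrmp
    DEFINE_ARGS dst, width, matrix, ptr, c0, c1, c2, c3, c4, c5, c6, c7, c8, r, x
%endif
    movsxdifnidn widthq, widthd
    ; xm0/xm1 resolve through the current m0/m1 mapping, so they follow SWAP.
    ; A literal xmm0/xmm1 would bypass that mapping and always name the
    ; physical registers 0 and 1.
    VBROADCASTSS m0, xm0       ; rdiv: xmm0 on UNIX64, xmm2 on WIN64
    VBROADCASTSS m1, xm1       ; bias: xmm1 on UNIX64, xmm3 on WIN64
```

The same substitution would apply to any other literal xmm# operand in the macro that is meant to refer to a swapped register.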