From: "Wang, Bin"
To: FFmpeg development discussions and patches
Date: Mon, 15 Aug 2022 06:18:50 +0000
References: <20220812084451.14325-1-bin.wang@intel.com>
Subject: Re: [FFmpeg-devel] [PATCH] libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI
List-Id: FFmpeg development discussions and patches

-----Original Message-----
From: ffmpeg-devel On Behalf Of Andreas Rheinhardt
Sent: Friday, August 12, 2022 5:11 PM
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH] libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI

bin.wang-at-intel.com@ffmpeg.org:
> From: bwang30
>
> This commit enabled assembly code with intel AVX512 VNNI and added
> unit test for sobel filter
>
> sobel_c: 4537
> sobel_avx512icl 2470
>
> Signed-off-by: bwang30
> ---
>  libavfilter/convolution.h             |   2 +
>  libavfilter/vf_convolution.c          |   8 ++
>  libavfilter/x86/vf_convolution.asm    | 162 ++++++++++++++++++++++++++
>  libavfilter/x86/vf_convolution_init.c |  18 +++
>  tests/checkasm/Makefile               |   1 +
>  tests/checkasm/checkasm.c             |   3 +
>  tests/checkasm/checkasm.h             |   1 +
>  tests/checkasm/vf_convolution.c       | 116 ++++++++++++++++++
>  8 files changed, 311 insertions(+)
>  create mode 100644 tests/checkasm/vf_convolution.c
>
> diff --git a/libavfilter/convolution.h b/libavfilter/convolution.h
> index 88aabe9a20..143b0fb2d9 100644
> --- a/libavfilter/convolution.h
> +++ b/libavfilter/convolution.h
> @@ -61,4 +61,6 @@ typedef struct ConvolutionContext {
>  } ConvolutionContext;
>
>  void ff_convolution_init_x86(ConvolutionContext *s);
> +void ff_sobel_init_x86(ConvolutionContext *s);
> +int ff_filter_param_init(AVFilterContext *ctx);
>  #endif
> diff --git a/libavfilter/vf_convolution.c b/libavfilter/vf_convolution.c
> index 9a9c099e6d..98aa952258 100644
> --- a/libavfilter/vf_convolution.c
> +++ b/libavfilter/vf_convolution.c
> @@ -874,6 +874,9 @@ static int param_init(AVFilterContext *ctx)
>          if (s->depth > 8)
>              for (p = 0; p < s->nb_planes; p++)
>                  s->filter[p] = filter16_sobel;
> +#if CONFIG_CONVOLUTION_FILTER && ARCH_X86_64
> +        ff_sobel_init_x86(s);
> +#endif
>      } else if (!strcmp(ctx->filter->name, "kirsch")) {
>          if (s->depth > 8)
>              for (p = 0; p < s->nb_planes; p++)
> @@ -887,6 +890,11 @@ static int param_init(AVFilterContext *ctx)
>      return 0;
>  }
>
> +int ff_filter_param_init(AVFilterContext *ctx)
> +{
> +    return param_init(ctx);
> +}
> +
>  static int config_input(AVFilterLink *inlink)
>  {
>      AVFilterContext *ctx = inlink->dst;
> diff --git a/libavfilter/x86/vf_convolution.asm b/libavfilter/x86/vf_convolution.asm
> index 754d4d1064..59c807b218 100644
> --- a/libavfilter/x86/vf_convolution.asm
> +++ b/libavfilter/x86/vf_convolution.asm
> @@ -22,6 +22,10 @@
>
>  SECTION_RODATA
>  half: dd 0.5
> +data_p1: dd 1
> +data_n1: dd -1
> +data_p2: dd 2
> +data_n2: dd -2
>
>  SECTION .text
>
> @@ -154,3 +158,161 @@ cglobal filter_3x3, 4, 15, 7, dst, width, rdiv, bias, matrix, ptr, c0, c1, c2, c
>  INIT_XMM sse4
>  FILTER_3X3
>  %endif
> +
> +
> +%macro SOBEL_MUL_16 3
> +    movd xmm2, [%2]
> +    VPBROADCASTD m2, xmm2
> +    movdqu xmm3, [c%1q + xq]
> +    vpmovzxbd m3, xmm3
> +    vpdpbusd m%3, m3, m2
> +%endmacro
> +
> +%macro SOBEL_ADD_16 2
> +    movdqu xmm3, [c%1q + xq]
> +    vpmovzxbd m3, xmm3
> +    vpaddd m%2, m3
> +%endmacro
> +
> +
> +%macro SOBEL_MUL 2
> +    movzx ptrd, byte [c%1q + xq]
> +    imul ptrd, [%2]
> +    add rd, ptrd
> +%endmacro
> +
> +%macro SOBEL_ADD 1
> +    movzx ptrd, byte [c%1q + xq]
> +    add rd, ptrd
> +%endmacro
> +
> +; void filter_sobel_avx512(uint8_t *dst, int width,
> +;                          float scale, float delta, const int *const matrix,
> +;                          const uint8_t *c[], int peak, int radius,
> +;                          int dstride, int stride)
> +%macro FILTER_SOBEL 0
> +%if UNIX64
> +cglobal filter_sobel, 4, 15, 7, dst, width, matrix, ptr, c0, c1, c2, c3, c4, c5, c6, c7, c8, r, x
> +%else
> +cglobal filter_sobel, 4, 15, 7, dst, width, rdiv, bias, matrix, ptr, c0, c1, c2, c3, c4, c5, c6, c7, c8, r, x
> +%endif
> +%if WIN64
> +    SWAP xmm0, xmm2
> +    SWAP xmm1, xmm3
> +    mov r2q, matrixmp
> +    mov r3q, ptrmp
> +    DEFINE_ARGS dst, width, matrix, ptr, c0, c1, c2, c3, c4, c5, c6, c7, c8, r, x
> +%endif
> +    movsxdifnidn widthq, widthd
> +    VBROADCASTSS m0, xmm0
> +    VBROADCASTSS m1, xmm1
> +    pxor m6, m6
> +    mov c0q, [ptrq + 0*gprsize]
> +    mov c1q, [ptrq + 1*gprsize]
> +    mov c2q, [ptrq + 2*gprsize]
> +    mov c3q, [ptrq + 3*gprsize]
> +    mov c4q, [ptrq + 4*gprsize]
> +    mov c5q, [ptrq + 5*gprsize]
> +    mov c6q, [ptrq + 6*gprsize]
> +    mov c7q, [ptrq + 7*gprsize]
> +    mov c8q, [ptrq + 8*gprsize]
> +
> +    xor xq, xq
> +    cmp widthq, mmsize/4
> +    jl .loop2
> +
> +    mov rq, widthq
> +    and rq, mmsize/4-1
> +    sub widthq, rq
> +
> +.loop1:
> +    pxor m4, m4
> +    pxor m5, m5
> +
> +    ;Gx
> +    SOBEL_MUL_16 0, data_n1, 4
> +    SOBEL_MUL_16 1, data_n2, 4
> +    SOBEL_MUL_16 2, data_n1, 4
> +    SOBEL_ADD_16 6, 4
> +    SOBEL_MUL_16 7, data_p2, 4
> +    SOBEL_ADD_16 8, 4
> +
> +    cvtdq2ps m4, m4
> +    mulps m4, m4
> +
> +    ;Gy
> +    SOBEL_MUL_16 0, data_n1, 5
> +    SOBEL_ADD_16 2, 5
> +    SOBEL_MUL_16 3, data_n2, 5
> +    SOBEL_MUL_16 5, data_p2, 5
> +    SOBEL_MUL_16 6, data_n1, 5
> +    SOBEL_ADD_16 8, 5
> +
> +    cvtdq2ps m5, m5
> +    VFMADD231PS m4, m5, m5
> +
> +    sqrtps m4, m4
> +    mulps m4, m0 ; sum *= scale
> +    addps m4, m1 ; sum += delta
> +    cvttps2dq m4, m4
> +    vpmovusdb xmm4, m4
> +    movdqu [dstq + xq], xmm4
> +
> +    add xq, mmsize/4
> +    cmp xq, widthq
> +    jl .loop1
> +
> +    add widthq, rq
> +    cmp xq, widthq
> +    jge .end
> +
> +.loop2:
> +    xor rd, rd
> +    pxor m4, m4
> +
> +    ;Gx
> +    SOBEL_MUL 0, data_n1
> +    SOBEL_MUL 1, data_n2
> +    SOBEL_MUL 2, data_n1
> +    SOBEL_ADD 6
> +    SOBEL_MUL 7, data_p2
> +    SOBEL_ADD 8
> +
> +    cvtsi2ss xmm4, rd
> +    mulss xmm4, xmm4
> +
> +    xor rd, rd
> +    ;Gy
> +    SOBEL_MUL 0, data_n1
> +    SOBEL_ADD 2
> +    SOBEL_MUL 3, data_n2
> +    SOBEL_MUL 5, data_p2
> +    SOBEL_MUL 6, data_n1
> +    SOBEL_ADD 8
> +
> +    cvtsi2ss xmm5, rd
> +    mulss xmm5, xmm5 ; b1 * b1
> +    addss xmm4, xmm5
> +
> +    sqrtps xmm4, xmm4
> +    mulss xmm4, xmm0 ; sum *= rdiv
> +    addss xmm4, xmm1 ; sum += bias
> +    cvttps2dq xmm4, xmm4 ; trunc to integer
> +    packssdw xmm4, xmm4
> +    packuswb xmm4, xmm4
> +    movd rd, xmm4
> +    mov [dstq + xq], rb
> +
> +    add xq, 1
> +    cmp xq, widthq
> +    jl .loop2
> +.end:
> +    RET
> +%endmacro
> +
> +%if ARCH_X86_64
> +%if HAVE_AVX512ICL_EXTERNAL
> +INIT_ZMM avx512icl
> +FILTER_SOBEL
> +%endif
> +%endif
> diff --git a/libavfilter/x86/vf_convolution_init.c b/libavfilter/x86/vf_convolution_init.c
> index b78a47d02b..52a3d28991 100644
> --- a/libavfilter/x86/vf_convolution_init.c
> +++ b/libavfilter/x86/vf_convolution_init.c
> @@ -29,6 +29,11 @@ void ff_filter_3x3_sse4(uint8_t *dst, int width,
>                          const uint8_t *c[], int peak, int radius,
>                          int dstride, int stride, int size);
>
> +void ff_filter_sobel_avx512icl(uint8_t *dst, int width,
> +                        float scale, float delta, const int *const matrix,
> +                        const uint8_t *c[], int peak, int radius,
> +                        int dstride, int stride, int size);
> +
>  av_cold void ff_convolution_init_x86(ConvolutionContext *s)
>  {
>  #if ARCH_X86_64
> @@ -44,3 +49,16 @@ av_cold void ff_convolution_init_x86(ConvolutionContext *s)
>      }
>  #endif
>  }
> +
> +av_cold void ff_sobel_init_x86(ConvolutionContext *s)
> +{
> +#if ARCH_X86_64
> +    int cpu_flags = av_get_cpu_flags();
> +    for (int i = 0; i < s->nb_planes; i++) {
> +        if (s->depth == 8) {
> +            if (EXTERNAL_AVX512ICL(cpu_flags))
> +                s->filter[i] = ff_filter_sobel_avx512icl;
> +        }
> +    }
> +#endif
> +}
> diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
> index 1ac170491b..4e91547fde 100644
> --- a/tests/checkasm/Makefile
> +++ b/tests/checkasm/Makefile
> @@ -44,6 +44,7 @@ AVFILTEROBJS-$(CONFIG_GBLUR_FILTER) += vf_gblur.o
>  AVFILTEROBJS-$(CONFIG_HFLIP_FILTER) += vf_hflip.o
>  AVFILTEROBJS-$(CONFIG_THRESHOLD_FILTER) += vf_threshold.o
>  AVFILTEROBJS-$(CONFIG_NLMEANS_FILTER) += vf_nlmeans.o
> +AVFILTEROBJS-$(CONFIG_CONVOLUTION_FILTER) += vf_convolution.o
>
>  CHECKASMOBJS-$(CONFIG_AVFILTER) += $(AVFILTEROBJS-yes)
>
> diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
> index e56fd3850e..ae5e9d1143 100644
> --- a/tests/checkasm/checkasm.c
> +++ b/tests/checkasm/checkasm.c
> @@ -191,6 +191,9 @@ static const struct {
>      #if CONFIG_THRESHOLD_FILTER
>          { "vf_threshold", checkasm_check_vf_threshold },
>      #endif
> +    #if CONFIG_CONVOLUTION_FILTER
> +        { "vf_convolution", checkasm_check_vf_convolution },
> +    #endif
>  #endif
>  #if CONFIG_SWSCALE
>      { "sw_gbrp", checkasm_check_sw_gbrp },
> diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
> index d7645d3730..71c722cf77 100644
> --- a/tests/checkasm/checkasm.h
> +++ b/tests/checkasm/checkasm.h
> @@ -85,6 +85,7 @@ void checkasm_check_vf_eq(void);
>  void checkasm_check_vf_gblur(void);
>  void checkasm_check_vf_hflip(void);
>  void checkasm_check_vf_threshold(void);
> +void checkasm_check_vf_convolution(void);
>  void checkasm_check_vp8dsp(void);
>  void checkasm_check_vp9dsp(void);
>  void checkasm_check_videodsp(void);
> diff --git a/tests/checkasm/vf_convolution.c b/tests/checkasm/vf_convolution.c
> new file mode 100644
> index 0000000000..353b32af5b
> --- /dev/null
> +++ b/tests/checkasm/vf_convolution.c
> @@ -0,0 +1,116 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
> + */
> +
> +#include
> +#include "checkasm.h"
> +#include "libavfilter/avfilter.h"
> +#include "libavfilter/convolution.h"
> +#include "libavutil/intreadwrite.h"
> +#include "libavutil/mem_internal.h"
> +
> +#define WIDTH 512
> +#define HEIGHT 512
> +#define SRC_STRIDE 512
> +#define PIXELS (WIDTH * HEIGHT)
> +
> +#define randomize_buffers(buf, size) \
> +    do { \
> +        int j; \
> +        uint8_t *tmp_buf = (uint8_t *)buf;\
> +        for (j = 0; j< size; j++) \
> +            tmp_buf[j] = rnd() & 0xFF; \
> +    } while (0)
> +
> +static void check_sobel(const char * report_name)
> +{
> +    LOCAL_ALIGNED_32(uint8_t, src, [PIXELS]);
> +    LOCAL_ALIGNED_32(uint8_t, dst_ref, [PIXELS]);
> +    LOCAL_ALIGNED_32(uint8_t, dst_new, [PIXELS]);
> +    const int height = WIDTH;
> +    const int width = HEIGHT;
> +    const int stride = SRC_STRIDE;
> +    const int dstride = SRC_STRIDE;
> +    int mode = 0;
> +    const uint8_t *c[49];
> +    const int radius = 1;
> +    const int bpc = 1;
> +    const int step = mode == MATRIX_COLUMN ? 16 : 1;
> +    const int slice_start = 0;
> +    const int slice_end = height;
> +    int y;
> +    const int sizew = mode == MATRIX_COLUMN ? height : width;
> +
> +    AVFilterContext *filter_in, *filter_sobel;
> +    AVFilterGraph *filter_graph;
> +    AVFilterLink *inlink;
> +    ConvolutionContext *s;
> +    char args[255];
> +
> +    filter_graph = avfilter_graph_alloc();
> +    snprintf(args, sizeof(args), "video_size=%dx%d:pix_fmt=%d:time_base=%d/%d:pixel_aspect=%d/%d", WIDTH, HEIGHT, AV_PIX_FMT_YUV420P, 1, 1, 1, 1);
> +    avfilter_graph_create_filter(&filter_in, avfilter_get_by_name("buffer"), "in", args, NULL, filter_graph);
> +    avfilter_graph_create_filter(&filter_sobel, avfilter_get_by_name(report_name), NULL, NULL, NULL, filter_graph);
> +    avfilter_link(filter_in, 0, filter_sobel, 0);
> +
> +    inlink = filter_sobel->inputs[0];
> +    inlink->format = AV_PIX_FMT_YUV420P;
> +    ff_filter_param_init(filter_sobel);
> +    s = filter_sobel->priv;
> +
> +    declare_func(void, uint8_t *dst, int width, float scale, float delta, const int *const matrix,
> +                 const uint8_t *c[], int peak, int radius, int dstride, int stride, int size);
> +
> +    float scale = 2;
> +    float delta = 10;
> +
> +    memset(dst_ref, 0, PIXELS);
> +    memset(dst_new, 0, PIXELS);
> +    randomize_buffers(src, PIXELS);
> +
> +    if (check_func(s->filter[0], "%s", report_name)) {
> +        for (y = slice_start; y < slice_end; y += step) {
> +            const int xoff = mode == MATRIX_COLUMN ? (y - slice_start) * bpc : radius * bpc;
> +            const int yoff = mode == MATRIX_COLUMN ? radius * dstride : 0;
> +
> +            s->setup[0](radius, c, src, stride, radius, width, y, height, bpc);
> +            call_ref(dst_ref + yoff + xoff, sizew - 2 * radius,
> +                     scale, delta, NULL, c, 0, radius,
> +                     dstride, stride, slice_end - step);
> +            call_new(dst_new + yoff + xoff, sizew - 2 * radius,
> +                     scale, delta, NULL, c, 0, radius,
> +                     dstride, stride, slice_end - step);
> +            if (memcmp(dst_ref + yoff + xoff, dst_new + yoff + xoff, slice_end - step))
> +                fail();
> +            bench_new(dst_new + yoff + xoff, sizew - 2 * radius,
> +                      scale, delta, NULL, c, 0, radius,
> +                      dstride, stride, slice_end - step);
> +            if (mode != MATRIX_COLUMN)
> +                dst_ref += dstride;
> +        }
> +    }
> +
> +    avfilter_free(filter_in);
> +    avfilter_free(filter_sobel);
> +    avfilter_graph_free(&filter_graph);

checkasm tests typically test the asm functions directly; they don't
allocate filtergraphs to do so.

> +}
> +
> +void checkasm_check_vf_convolution(void)
> +{
> +    check_sobel("sobel");
> +    report("convolution:sobel");
> +}

I have submitted patch v2, which avoids allocating filtergraphs in the asm unit test:
https://patchwork.ffmpeg.org/project/ffmpeg/patch/20220815053844.30298-1-bin.wang@intel.com/

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".