From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
From: Kieran Kunhya
Date: Fri, 21 Oct 2022 04:41:17 +0100
To: FFmpeg development discussions and patches
Content-Type: multipart/mixed; boundary="0000000000008932dd05eb8337c8"
Subject: [FFmpeg-devel] [PATCH] RFC: v210enc optimisations and initial AVX-512
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

--0000000000008932dd05eb8337c8
Content-Type: text/plain; charset="UTF-8"

Hi,

Please find attached an attempt to optimise the 8-bit input path of v210enc by reducing the number of shuffles. This comes at the cost of having to extract the middle element, perform a DWORD shift on it, and then reinsert it. I have added a few comments, but any other ideas are welcome.

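For anyone who would rather not read the asm first, here is a rough scalar C sketch of what one 32-bit lane of the new sequence does. It assumes the usual v210 layout (three 10-bit components per little-endian DWORD, with 8-bit samples scaled by << 2) and takes the mask and multiplier constants from the attached patch. The function names are made up for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Reference: three 8-bit samples, each scaled to 10 bits, in one v210 DWORD. */
static uint32_t v210_pack_ref(uint8_t lo, uint8_t mid, uint8_t hi)
{
    return ((uint32_t)lo << 2) | ((uint32_t)mid << 12) | ((uint32_t)hi << 22);
}

/* Scalar model of one 32-bit lane of the SIMD sequence (illustrative only). */
static uint32_t v210_pack_model(uint8_t lo, uint8_t mid, uint8_t hi)
{
    /* pshufb + por: one sample per byte, top byte zero. */
    uint32_t v = (uint32_t)lo | ((uint32_t)mid << 8) | ((uint32_t)hi << 16);

    /* pand with 0x0000ff00, then pslld 4: middle sample lands on bits 12..19. */
    uint32_t m = (v & 0x0000ff00u) << 4;

    /* pmullw by {4, 64} (vpsllvw by {2, 6} on AVX-512): per-word shifts,
     * each result truncated to 16 bits as the SIMD word lanes would be. */
    uint16_t w0 = (uint16_t)((v & 0xffffu) << 2);
    uint16_t w1 = (uint16_t)((v >> 16) << 6);
    uint32_t r  = (uint32_t)w0 | ((uint32_t)w1 << 16);

    /* pand with 0xffff03ff: clear the stray middle-sample bits in the low word. */
    r &= 0xffff03ffu;

    /* Final por: reinsert the shifted middle sample. */
    return r | m;
}

/* Exhaustive check of the model against the reference; returns 1 on success. */
static int v210_pack_model_matches_ref(void)
{
    for (unsigned lo = 0; lo < 256; lo++)
        for (unsigned mid = 0; mid < 256; mid++)
            for (unsigned hi = 0; hi < 256; hi++)
                if (v210_pack_model((uint8_t)lo, (uint8_t)mid, (uint8_t)hi) !=
                    v210_pack_ref((uint8_t)lo, (uint8_t)mid, (uint8_t)hi))
                    return 0;
    return 1;
}
```

The middle sample is the awkward one because it straddles the two 16-bit words of the DWORD, so a per-word multiply cannot place it; hence the separate mask, DWORD shift and OR.
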
Crude benchmarks on Intel(R) Xeon(R) D-2123IT:

Before:
v210_planar_pack_8_ssse3: 316.5
v210_planar_pack_8_avx: 319.0
v210_planar_pack_8_avx2: 223.0

After:
v210_planar_pack_8_ssse3: 321.0
v210_planar_pack_8_avx: 326.0
v210_planar_pack_8_avx2: 217.0
v210_planar_pack_8_avx512: 211.0

Regards,
Kieran Kunhya

--0000000000008932dd05eb8337c8
Content-Type: application/octet-stream; name="0001-RFC-v210enc-optimisations-and-initial-AVX-512.patch"
Content-Disposition: attachment; filename="0001-RFC-v210enc-optimisations-and-initial-AVX-512.patch"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_l9hxwn5v0

RnJvbSBhMWI2YmQ0N2JjYmFkM2YxODhhNzg2MTZjYThjMWI2MTM0YTExM2I1IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBLaWVyYW4gS3VuaHlhIDxraWVyYW5rQG9iZS50dj4KRGF0ZTog RnJpLCAyMSBPY3QgMjAyMiAwNDoxODoxMSArMDEwMApTdWJqZWN0OiBbUEFUQ0hdIFJGQzogdjIx MGVuYyBvcHRpbWlzYXRpb25zIGFuZCBpbml0aWFsIEFWWC01MTIKCi0tLQogbGliYXZjb2RlYy94 ODYvdjIxMGVuYy5hc20gICAgfCA1OSArKysrKysrKysrKysrKysrKysrKysrLS0tLS0tLS0tLS0t LQogbGliYXZjb2RlYy94ODYvdjIxMGVuY19pbml0LmMgfCAgNyArKysrKwogMiBmaWxlcyBjaGFu Z2VkLCA0NSBpbnNlcnRpb25zKCspLCAyMSBkZWxldGlvbnMoLSkKCmRpZmYgLS1naXQgYS9saWJh dmNvZGVjL3g4Ni92MjEwZW5jLmFzbSBiL2xpYmF2Y29kZWMveDg2L3YyMTBlbmMuYXNtCmluZGV4 IDk2NWYyYmVhM2MuLjJkMDgyN2JiZDAgMTAwNjQ0Ci0tLSBhL2xpYmF2Y29kZWMveDg2L3YyMTBl bmMuYXNtCisrKyBiL2xpYmF2Y29kZWMveDg2L3YyMTBlbmMuYXNtCkBAIC0zOCwxMyArMzgsMTYg QEAgY2V4dGVybiBwYl8xCiBjZXh0ZXJuIHBiX0ZFCiAlZGVmaW5lIHYyMTBfZW5jX21heF84IHBi X0ZFCiAKLXYyMTBfZW5jX2x1bWFfc2h1Zl84OiB0aW1lcyAyIGRiIDYsLTEsNywtMSw4LC0xLDks LTEsMTAsLTEsMTEsLTEsLTEsLTEsLTEsLTEKLXYyMTBfZW5jX2x1bWFfbXVsdF84OiB0aW1lcyAy IGR3IDE2LDQsNjQsMTYsNCw2NCwwLDAKK3YyMTBfZW5jX211bHRfODogdGltZXMgOCBkdyA0LDY0 Cit2MjEwX2VuY19zaGlmdF84OiB0aW1lcyA4IGR3IDIsNgordjIxMF9lbmNfbWFza184OiB0aW1l cyA4IGRiIDB4MDAsMHhmZiwweDAwLDB4MDAKK3YyMTBfZW5jX21hc2syXzg6IHRpbWVzIDggZGIg MHhmZiwweDAzLDB4ZmYsMHhmZgogCi12MjEwX2VuY19jaHJvbWFfc2h1ZjFfODogdGltZXMgMiBk 
YiAwLC0xLDEsLTEsMiwtMSwzLC0xLDgsLTEsOSwtMSwxMCwtMSwxMSwtMQotdjIxMF9lbmNfY2hy b21hX3NodWYyXzg6IHRpbWVzIDIgZGIgMywtMSw0LC0xLDUsLTEsNywtMSwxMSwtMSwxMiwtMSwx MywtMSwxNSwtMQordjIxMF9lbmNfbHVtYV9zaHVmMV84OiB0aW1lcyAyIGRiIC0xLDAsLTEsLTEs MSwtMSwyLC0xLC0xLDMsLTEsLTEsNCwtMSw1LC0xCit2MjEwX2VuY19sdW1hX3NodWYyXzg6IHRp bWVzIDIgZGIgLTEsNiwtMSwtMSw3LC0xLDgsLTEsLTEsOSwtMSwtMSwxMCwtMSwxMSwtMQogCi12 MjEwX2VuY19jaHJvbWFfbXVsdF84OiB0aW1lcyAyIGR3IDQsMTYsNjQsMCw2NCw0LDE2LDAKK3Yy MTBfZW5jX2Nocm9tYV9zaHVmMV84OiB0aW1lcyAyIGRiIDAsLTEsOCwtMSwtMSwxLC0xLC0xLDks LTEsMiwtMSwtMSwxMCwtMSwtMQordjIxMF9lbmNfY2hyb21hX3NodWYyXzg6IHRpbWVzIDIgZGIg MywtMSwxMSwtMSwtMSw0LC0xLC0xLDEyLC0xLDUsLTEsLTEsMTMsLTEsLTEKIAogU0VDVElPTiAu dGV4dAogCkBAIC0xMTUsNyArMTE4LDcgQEAgY2dsb2JhbCB2MjEwX3BsYW5hcl9wYWNrXzgsIDUs IDUsIDcsIHksIHUsIHYsIGRzdCwgd2lkdGgKIAogICAgIG1vdmEgICAgbTQsIFt2MjEwX2VuY19t aW5fOF0KICAgICBtb3ZhICAgIG01LCBbdjIxMF9lbmNfbWF4XzhdCi0gICAgcHhvciAgICBtNiwg bTYKKyAgICBtb3ZhICAgIG02LCBbdjIxMF9lbmNfbWFza184XQogCiAubG9vcDoKICAgICBtb3Z1 ICAgICAgICB4bTEsIFt5cSt3aWR0aHEqMl0KQEAgLTEyNCwxNiArMTI3LDYgQEAgY2dsb2JhbCB2 MjEwX3BsYW5hcl9wYWNrXzgsIDUsIDUsIDcsIHksIHUsIHYsIGRzdCwgd2lkdGgKICVlbmRpZgog ICAgIENMSVBVQiAgbTEsIG00LCBtNQogCi0gICAgcHVucGNrbGJ3IG0wLCBtMSwgbTYKLSAgICA7 IGNhbid0IHVucGFjayBoaWdoIGJ5dGVzIGluIHRoZSBzYW1lIHdheSBiZWNhdXNlIHdlIHByb2Nl c3MKLSAgICA7IG9ubHkgc2l4IGJ5dGVzIGF0IGEgdGltZQotICAgIHBzaHVmYiAgbTEsIFt2MjEw X2VuY19sdW1hX3NodWZfOF0KLQotICAgIHBtdWxsdyAgbTAsIFt2MjEwX2VuY19sdW1hX211bHRf OF0KLSAgICBwbXVsbHcgIG0xLCBbdjIxMF9lbmNfbHVtYV9tdWx0XzhdCi0gICAgcHNodWZiICBt MCwgW3YyMTBfZW5jX2x1bWFfc2h1Zl8xMF0KLSAgICBwc2h1ZmIgIG0xLCBbdjIxMF9lbmNfbHVt YV9zaHVmXzEwXQotCiAgICAgbW92cSAgICAgICAgIHhtMywgW3VxK3dpZHRocV0KICAgICBtb3Zo cHMgICAgICAgeG0zLCBbdnErd2lkdGhxXQogJWlmIGNwdWZsYWcoYXZ4MikKQEAgLTE0MywxNCAr MTM2LDMzIEBAIGNnbG9iYWwgdjIxMF9wbGFuYXJfcGFja184LCA1LCA1LCA3LCB5LCB1LCB2LCBk c3QsIHdpZHRoCiAlZW5kaWYKICAgICBDTElQVUIgIG0zLCBtNCwgbTUKIAotICAgIDsgc2h1ZmZs 
ZSBhbmQgbXVsdGlwbHkgdG8gZ2V0IHRoZSBzYW1lIHBhY2tpbmcgYXMgaW4gMTAtYml0CisgICAg OyB2cGVybWkyYiBpcyBvYnZpb3VzIGNob2ljZSBidXQgdG9vIHNsb3cKKyAgICBwc2h1ZmIgIG0w LCBtMSwgW3YyMTBfZW5jX2x1bWFfc2h1ZjFfOF0KKyAgICBwc2h1ZmIgIG0xLCBbdjIxMF9lbmNf bHVtYV9zaHVmMl84XQorCiAgICAgcHNodWZiICBtMiwgbTMsIFt2MjEwX2VuY19jaHJvbWFfc2h1 ZjFfOF0KICAgICBwc2h1ZmIgIG0zLCBbdjIxMF9lbmNfY2hyb21hX3NodWYyXzhdCiAKLSAgICBw bXVsbHcgIG0yLCBbdjIxMF9lbmNfY2hyb21hX211bHRfOF0KLSAgICBwbXVsbHcgIG0zLCBbdjIx MF9lbmNfY2hyb21hX211bHRfOF0KLSAgICBwc2h1ZmIgIG0yLCBbdjIxMF9lbmNfY2hyb21hX3No dWZfMTBdCi0gICAgcHNodWZiICBtMywgW3YyMTBfZW5jX2Nocm9tYV9zaHVmXzEwXQorICAgIHBv ciAgICAgbTAsIG0yCisgICAgcG9yICAgICBtMSwgbTMKKworICAgIDsgVE9ETzogYXZ4LTUxMiBt YXNrZWQgbW92PworICAgIHBhbmQgICAgbTIsIG02LCBtMAorICAgIHBhbmQgICAgbTMsIG02LCBt MQorCisgICAgcHNsbGQgICBtMiwgNAorICAgIHBzbGxkICAgbTMsIDQKKworJWlmIGNwdWZsYWco YXZ4NTEyKQorICAgIHZwc2xsdncgIG0wLCBbdjIxMF9lbmNfc2hpZnRfOF0KKyAgICB2cHNsbHZ3 ICBtMSwgW3YyMTBfZW5jX3NoaWZ0XzhdCislZWxzZQorICAgIHBtdWxsdyAgbTAsIFt2MjEwX2Vu Y19tdWx0XzhdCisgICAgcG11bGx3ICBtMSwgW3YyMTBfZW5jX211bHRfOF0KKyVlbmRpZgorCisg ICAgcGFuZCBtMCwgW3YyMTBfZW5jX21hc2syXzhdCisgICAgcGFuZCBtMSwgW3YyMTBfZW5jX21h c2syXzhdCiAKICAgICBwb3IgICAgIG0wLCBtMgogICAgIHBvciAgICAgbTEsIG0zCkBAIC0xODIs MyArMTk0LDggQEAgdjIxMF9wbGFuYXJfcGFja184CiBJTklUX1lNTSBhdngyCiB2MjEwX3BsYW5h cl9wYWNrXzgKICVlbmRpZgorCislaWYgSEFWRV9BVlg1MTJfRVhURVJOQUwKK0lOSVRfWU1NIGF2 eDUxMgordjIxMF9wbGFuYXJfcGFja184CislZW5kaWYKZGlmZiAtLWdpdCBhL2xpYmF2Y29kZWMv eDg2L3YyMTBlbmNfaW5pdC5jIGIvbGliYXZjb2RlYy94ODYvdjIxMGVuY19pbml0LmMKaW5kZXgg MTNhMzUxZGQxZC4uMDk1ZWQ1ZTkxMyAxMDA2NDQKLS0tIGEvbGliYXZjb2RlYy94ODYvdjIxMGVu Y19pbml0LmMKKysrIGIvbGliYXZjb2RlYy94ODYvdjIxMGVuY19pbml0LmMKQEAgLTI3LDYgKzI3 LDggQEAgdm9pZCBmZl92MjEwX3BsYW5hcl9wYWNrXzhfYXZ4KGNvbnN0IHVpbnQ4X3QgKnksIGNv bnN0IHVpbnQ4X3QgKnUsCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgY29uc3QgdWlu dDhfdCAqdiwgdWludDhfdCAqZHN0LCBwdHJkaWZmX3Qgd2lkdGgpOwogdm9pZCBmZl92MjEwX3Bs 
YW5hcl9wYWNrXzhfYXZ4Mihjb25zdCB1aW50OF90ICp5LCBjb25zdCB1aW50OF90ICp1LAogICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICBjb25zdCB1aW50OF90ICp2LCB1aW50OF90ICpk c3QsIHB0cmRpZmZfdCB3aWR0aCk7Cit2b2lkIGZmX3YyMTBfcGxhbmFyX3BhY2tfOF9hdng1MTIo Y29uc3QgdWludDhfdCAqeSwgY29uc3QgdWludDhfdCAqdSwKKyAgICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgY29uc3QgdWludDhfdCAqdiwgdWludDhfdCAqZHN0LCBwdHJkaWZmX3Qgd2lk dGgpOwogdm9pZCBmZl92MjEwX3BsYW5hcl9wYWNrXzEwX3Nzc2UzKGNvbnN0IHVpbnQxNl90ICp5 LCBjb25zdCB1aW50MTZfdCAqdSwKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBj b25zdCB1aW50MTZfdCAqdiwgdWludDhfdCAqZHN0LAogICAgICAgICAgICAgICAgICAgICAgICAg ICAgICAgICAgIHB0cmRpZmZfdCB3aWR0aCk7CkBAIC01Miw0ICs1NCw5IEBAIGF2X2NvbGQgdm9p ZCBmZl92MjEwZW5jX2luaXRfeDg2KFYyMTBFbmNDb250ZXh0ICpzKQogICAgICAgICBzLT5zYW1w bGVfZmFjdG9yXzEwID0gMjsKICAgICAgICAgcy0+cGFja19saW5lXzEwICAgICA9IGZmX3YyMTBf cGxhbmFyX3BhY2tfMTBfYXZ4MjsKICAgICB9CisKKyAgICBpZiAoRVhURVJOQUxfQVZYNTEyKGNw dV9mbGFncykpIHsKKyAgICAgICAgcy0+c2FtcGxlX2ZhY3Rvcl84ICA9IDI7CisgICAgICAgIHMt PnBhY2tfbGluZV84ICAgICAgPSBmZl92MjEwX3BsYW5hcl9wYWNrXzhfYXZ4NTEyOworICAgIH0K IH0KLS0gCjIuMjQuMS53aW5kb3dzLjIKCg== --0000000000008932dd05eb8337c8 Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". --0000000000008932dd05eb8337c8--