From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100]) by master.gitmailbox.com (Postfix) with ESMTP id F2ECE408DE for ; Thu, 28 Sep 2023 05:46:06 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id C53CB68CA55; Thu, 28 Sep 2023 08:46:03 +0300 (EEST)
Received: from mail-qv1-f49.google.com (mail-qv1-f49.google.com [209.85.219.49]) by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 838A368CA03 for ; Thu, 28 Sep 2023 08:45:57 +0300 (EEST)
Received: by mail-qv1-f49.google.com with SMTP id 6a1803df08f44-65afae9e544so48898966d6.2 for ; Wed, 27 Sep 2023 22:45:57 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1695879955; x=1696484755; darn=ffmpeg.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=Dmv4OBEp18TUiBAHewLjv67K0WeErz8Q7zUGJa5aptg=; b=S5O+Ks2a/BmfyfbNz+0LDppB5rv9wRLEZ4c1p+SSuoBoNaIYZfu5DFBGDL/PvhMKyM WPmbZRIrnD/NH0rPJSWNsoV3stNivupuCSfc/1zkUWqmGxDZVXsprFj7ruJYUtdtoyVB 7G//tWTK++4/V4s1st3SopJC83xi3a0vSdhD8fZ6lysqqOT84XrdPYxkCATMKkqV84+i eKjCjBJP6KfYh7aQy/b12fU1hqFbNS2ZdOSHzcXrHZ+N2gGl6D/Ph/a0fbrm9VqS0jf8 gxEedSMAXK5InXDdhTWOx+SK9IZII1HtDrm00si3Y/RdaLOWH0RuGLkCLGYmB+Z/fpLW cPIQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1695879955; x=1696484755; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Dmv4OBEp18TUiBAHewLjv67K0WeErz8Q7zUGJa5aptg=; b=RWfqAOt23Dv/Y+nDw3d9jlZWY/pjw6Wv3sMzrljHpPHH0xiWQ8UEZ1rDnsv1XgwnRA DEovOE9f4JR3k/Yj9bSMZqlPSj5nUOgiLQAcggh7siSP2l5Jgu9vzxyl2OCDLt5Lo2W0 etFndNju/W4ayQC6h24kjpi2qo4JBvJejgVr3wPJhzKRYo9atV1Bp7tbrFYwMYVf6ahT y0kWH08wh3YqiyIvWhuj4t1mzJXX2OnUPbLgpCB8JPvIijX9ia3Teq43j2oseHvQOpac cZqR9WNrpQF4OBz+o26i3aZouQX+sajJ1bmGVVDy9JltF4VNx4NZiShNEWh7e8hZo1A4
lHuQ==
X-Gm-Message-State: AOJu0Yynv70PhX6klulKBcNvDrexMNgpD69qv3R+rRLK/hUnmhuieEvk vBOVAM3Ml41uQ+gzNOdHsBngNw5bSGsysqz9/ksusI3ZlT0=
X-Google-Smtp-Source: AGHT+IGhmXokjFh032XOvqYopK2y+sEmUupfnqtoH4KyKaKer3d49rKmBXyXpMXw5ZK8k6c/5+soAkMQrdYKKbl7wqk=
X-Received: by 2002:a0c:a9d4:0:b0:65b:d75:129d with SMTP id c20-20020a0ca9d4000000b0065b0d75129dmr226102qvb.34.1695879955132; Wed, 27 Sep 2023 22:45:55 -0700 (PDT)
MIME-Version: 1.0
References: <2238622.iZASKD2KPV@basile.remlab.net>
In-Reply-To: <2238622.iZASKD2KPV@basile.remlab.net>
From: flow gg
Date: Thu, 28 Sep 2023 13:45:44 +0800
Message-ID:
To: FFmpeg development discussions and patches
Content-Type: multipart/mixed; boundary="0000000000004f2b2b060664d2d6"
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
Subject: Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: FFmpeg development discussions and patches
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel"
Archived-At:
List-Archive:
List-Post:

--0000000000004f2b2b060664d2d6
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Okay, I reverted the volatile in ff_read_time.

How about this version? It uses vls (strided loads) instead of vlseg, and uses vfmacc.

The benchmark is sometimes better, sometimes the same:

fcmul_add_c: 3.5
fcmul_add_rvv_f32: 3.5
 - af_afir.fcmul_add [OK]
fcmul_add_c: 4.5
fcmul_add_rvv_f32: 4.2
 - af_afir.fcmul_add [OK]
fcmul_add_c: 4.2
fcmul_add_rvv_f32: 4.2
 - af_afir.fcmul_add [OK]
fcmul_add_c: 4.5
fcmul_add_rvv_f32: 4.2
 - af_afir.fcmul_add [OK]
fcmul_add_c: 4.7
fcmul_add_rvv_f32: 3.5

On Thu, 28 Sep 2023 at 00:41, Rémi Denis-Courmont wrote:

> On Tuesday, 26 September 2023, 12.24.58 EEST, flow gg wrote:
> > benchmark:
> > fcmul_add_c: 19.7
> > fcmul_add_rvv_f32: 6.7
>
> With optimisations enabled and the benchmarking fix, I get this (on the
> same hardware, I believe):
>
> fcmul_add_c: 3.5
> fcmul_add_rvv_f32: 6.7
>
> For sure unfortunate design limitations of T-Head C910 are to blame to no
> small extent. It is not the first occurrence of an RVV optimisation that
> turns out worse than scalar due to those, and I still have honest hopes
> that newer (and conformant) IP would give saner results, but... I also
> believe that the code could be improved regardless.
>
> --
> Rémi Denis-Courmont
> http://www.remlab.net/
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
>

--0000000000004f2b2b060664d2d6
Content-Type: text/x-patch; charset="UTF-8"; name="0001-af_afir-RISC-V-V-fcmul_add.patch"
Content-Disposition: attachment; filename="0001-af_afir-RISC-V-V-fcmul_add.patch"
Content-Transfer-Encoding: base64
Content-ID:
X-Attachment-Id: f_ln2qz4c90

RnJvbSAzZTk1Njk1OGIwMWU3ODBiNzM2MGRhY2U1OWI2MjQ4ZjYxYTZmMTJjIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBoIDxobGVmdGhsZWZ0QGdtYWlsLmNvbT4KRGF0ZTogVHVlLCAy NiBTZXAgMjAyMyAxNTowMzoxMiArMDgwMApTdWJqZWN0OiBbUEFUQ0hdIGFmX2FmaXI6IFJJU0Mt ViBWIGZjbXVsX2FkZAoKLS0tCiBsaWJhdmZpbHRlci9hZl9hZmlyZHNwLmggICAgICAgICB8ICAz ICsrCiBsaWJhdmZpbHRlci9yaXNjdi9NYWtlZmlsZSAgICAgICB8ICAyICsrCiBsaWJhdmZpbHRl ci9yaXNjdi9hZl9hZmlyX2luaXQuYyB8IDM5ICsrKysrKysrKysrKysrKysrKysrCiBsaWJhdmZp bHRlci9yaXNjdi9hZl9hZmlyX3J2di5TICB8IDYxICsrKysrKysrKysrKysrKysrKysrKysrKysr KysrKysrCiA0IGZpbGVzIGNoYW5nZWQsIDEwNSBpbnNlcnRpb25zKCspCiBjcmVhdGUgbW9kZSAx MDA2NDQgbGliYXZmaWx0ZXIvcmlzY3YvTWFrZWZpbGUKIGNyZWF0ZSBtb2RlIDEwMDY0NCBsaWJh 
dmZpbHRlci9yaXNjdi9hZl9hZmlyX2luaXQuYwogY3JlYXRlIG1vZGUgMTAwNjQ0IGxpYmF2Zmls dGVyL3Jpc2N2L2FmX2FmaXJfcnZ2LlMKCmRpZmYgLS1naXQgYS9saWJhdmZpbHRlci9hZl9hZmly ZHNwLmggYi9saWJhdmZpbHRlci9hZl9hZmlyZHNwLmgKaW5kZXggNDIwODUwMTM5My4uZDJkMWU5 MDljMSAxMDA2NDQKLS0tIGEvbGliYXZmaWx0ZXIvYWZfYWZpcmRzcC5oCisrKyBiL2xpYmF2Zmls dGVyL2FmX2FmaXJkc3AuaApAQCAtMzQsNiArMzQsNyBAQCB0eXBlZGVmIHN0cnVjdCBBdWRpb0ZJ UkRTUENvbnRleHQgewogfSBBdWRpb0ZJUkRTUENvbnRleHQ7CiAKIHZvaWQgZmZfYWZpcl9pbml0 X3g4NihBdWRpb0ZJUkRTUENvbnRleHQgKnMpOwordm9pZCBmZl9hZmlyX2luaXRfcmlzY3YoQXVk aW9GSVJEU1BDb250ZXh0ICpzKTsKIAogc3RhdGljIHZvaWQgZmNtdWxfYWRkX2MoZmxvYXQgKnN1 bSwgY29uc3QgZmxvYXQgKnQsIGNvbnN0IGZsb2F0ICpjLCBwdHJkaWZmX3QgbGVuKQogewpAQCAt NzYsNiArNzcsOCBAQCBzdGF0aWMgYXZfdW51c2VkIHZvaWQgZmZfYWZpcl9pbml0KEF1ZGlvRklS RFNQQ29udGV4dCAqZHNwKQogCiAjaWYgQVJDSF9YODYKICAgICBmZl9hZmlyX2luaXRfeDg2KGRz cCk7CisjZWxpZiBBUkNIX1JJU0NWCisgICAgZmZfYWZpcl9pbml0X3Jpc2N2KGRzcCk7CiAjZW5k aWYKIH0KIApkaWZmIC0tZ2l0IGEvbGliYXZmaWx0ZXIvcmlzY3YvTWFrZWZpbGUgYi9saWJhdmZp bHRlci9yaXNjdi9NYWtlZmlsZQpuZXcgZmlsZSBtb2RlIDEwMDY0NAppbmRleCAwMDAwMDAwMDAw Li4wYjk2OGE5YzBkCi0tLSAvZGV2L251bGwKKysrIGIvbGliYXZmaWx0ZXIvcmlzY3YvTWFrZWZp bGUKQEAgLTAsMCArMSwyIEBACitPQkpTICs9IHJpc2N2L2FmX2FmaXJfaW5pdC5vCitSVlYtT0JK UyArPSByaXNjdi9hZl9hZmlyX3J2di5vCmRpZmYgLS1naXQgYS9saWJhdmZpbHRlci9yaXNjdi9h Zl9hZmlyX2luaXQuYyBiL2xpYmF2ZmlsdGVyL3Jpc2N2L2FmX2FmaXJfaW5pdC5jCm5ldyBmaWxl IG1vZGUgMTAwNjQ0CmluZGV4IDAwMDAwMDAwMDAuLmZmYTE3NmFiZDIKLS0tIC9kZXYvbnVsbAor KysgYi9saWJhdmZpbHRlci9yaXNjdi9hZl9hZmlyX2luaXQuYwpAQCAtMCwwICsxLDM5IEBACisv KgorICogQ29weXJpZ2h0IMKpIDIwMjMgaGxlZnQKKyAqCisgKiBUaGlzIGZpbGUgaXMgcGFydCBv ZiBGRm1wZWcuCisgKgorICogRkZtcGVnIGlzIGZyZWUgc29mdHdhcmU7IHlvdSBjYW4gcmVkaXN0 cmlidXRlIGl0IGFuZC9vcgorICogbW9kaWZ5IGl0IHVuZGVyIHRoZSB0ZXJtcyBvZiB0aGUgR05V IExlc3NlciBHZW5lcmFsIFB1YmxpYworICogTGljZW5zZSBhcyBwdWJsaXNoZWQgYnkgdGhlIEZy ZWUgU29mdHdhcmUgRm91bmRhdGlvbjsgZWl0aGVyCisgKiB2ZXJzaW9uIDIuMSBvZiB0aGUgTGlj 
ZW5zZSwgb3IgKGF0IHlvdXIgb3B0aW9uKSBhbnkgbGF0ZXIgdmVyc2lvbi4KKyAqCisgKiBGRm1w ZWcgaXMgZGlzdHJpYnV0ZWQgaW4gdGhlIGhvcGUgdGhhdCBpdCB3aWxsIGJlIHVzZWZ1bCwKKyAq IGJ1dCBXSVRIT1VUIEFOWSBXQVJSQU5UWTsgd2l0aG91dCBldmVuIHRoZSBpbXBsaWVkIHdhcnJh bnR5IG9mCisgKiBNRVJDSEFOVEFCSUxJVFkgb3IgRklUTkVTUyBGT1IgQSBQQVJUSUNVTEFSIFBV UlBPU0UuICBTZWUgdGhlIEdOVQorICogTGVzc2VyIEdlbmVyYWwgUHVibGljIExpY2Vuc2UgZm9y IG1vcmUgZGV0YWlscy4KKyAqCisgKiBZb3Ugc2hvdWxkIGhhdmUgcmVjZWl2ZWQgYSBjb3B5IG9m IHRoZSBHTlUgTGVzc2VyIEdlbmVyYWwgUHVibGljCisgKiBMaWNlbnNlIGFsb25nIHdpdGggRkZt cGVnOyBpZiBub3QsIHdyaXRlIHRvIHRoZSBGcmVlIFNvZnR3YXJlCisgKiBGb3VuZGF0aW9uLCBJ bmMuLCA1MSBGcmFua2xpbiBTdHJlZXQsIEZpZnRoIEZsb29yLCBCb3N0b24sIE1BIDAyMTEwLTEz MDEgVVNBCisgKi8KKworI2luY2x1ZGUgPHN0ZGludC5oPgorCisjaW5jbHVkZSAiY29uZmlnLmgi CisjaW5jbHVkZSAibGliYXZ1dGlsL2F0dHJpYnV0ZXMuaCIKKyNpbmNsdWRlICJsaWJhdnV0aWwv Y3B1LmgiCisjaW5jbHVkZSAibGliYXZmaWx0ZXIvYWZfYWZpcmRzcC5oIgorCit2b2lkIGZmX2Zj bXVsX2FkZF9ydnYoZmxvYXQgKnN1bSwgY29uc3QgZmxvYXQgKnQsIGNvbnN0IGZsb2F0ICpjLAor ICAgICAgICAgICAgICAgICAgICAgICBwdHJkaWZmX3QgbGVuKTsKKworYXZfY29sZCB2b2lkIGZm X2FmaXJfaW5pdF9yaXNjdihBdWRpb0ZJUkRTUENvbnRleHQgKnMpCit7CisjaWYgSEFWRV9SVlYK KyAgICBpbnQgZmxhZ3MgPSBhdl9nZXRfY3B1X2ZsYWdzKCk7CisKKyAgICBpZiAoZmxhZ3MgJiBB Vl9DUFVfRkxBR19SVlZfRjMyKQorICAgICAgICBzLT5mY211bF9hZGQgPSBmZl9mY211bF9hZGRf cnZ2OworI2VuZGlmCit9CmRpZmYgLS1naXQgYS9saWJhdmZpbHRlci9yaXNjdi9hZl9hZmlyX3J2 di5TIGIvbGliYXZmaWx0ZXIvcmlzY3YvYWZfYWZpcl9ydnYuUwpuZXcgZmlsZSBtb2RlIDEwMDY0 NAppbmRleCAwMDAwMDAwMDAwLi42YzE1NTg2MDA3Ci0tLSAvZGV2L251bGwKKysrIGIvbGliYXZm aWx0ZXIvcmlzY3YvYWZfYWZpcl9ydnYuUwpAQCAtMCwwICsxLDYxIEBACisvKgorICogQ29weXJp Z2h0IMKpIDIwMjMgaGxlZnQKKyAqCisgKiBUaGlzIGZpbGUgaXMgcGFydCBvZiBGRm1wZWcuCisg KgorICogRkZtcGVnIGlzIGZyZWUgc29mdHdhcmU7IHlvdSBjYW4gcmVkaXN0cmlidXRlIGl0IGFu ZC9vcgorICogbW9kaWZ5IGl0IHVuZGVyIHRoZSB0ZXJtcyBvZiB0aGUgR05VIExlc3NlciBHZW5l cmFsIFB1YmxpYworICogTGljZW5zZSBhcyBwdWJsaXNoZWQgYnkgdGhlIEZyZWUgU29mdHdhcmUg 
Rm91bmRhdGlvbjsgZWl0aGVyCisgKiB2ZXJzaW9uIDIuMSBvZiB0aGUgTGljZW5zZSwgb3IgKGF0 IHlvdXIgb3B0aW9uKSBhbnkgbGF0ZXIgdmVyc2lvbi4KKyAqCisgKiBGRm1wZWcgaXMgZGlzdHJp YnV0ZWQgaW4gdGhlIGhvcGUgdGhhdCBpdCB3aWxsIGJlIHVzZWZ1bCwKKyAqIGJ1dCBXSVRIT1VU IEFOWSBXQVJSQU5UWTsgd2l0aG91dCBldmVuIHRoZSBpbXBsaWVkIHdhcnJhbnR5IG9mCisgKiBN RVJDSEFOVEFCSUxJVFkgb3IgRklUTkVTUyBGT1IgQSBQQVJUSUNVTEFSIFBVUlBPU0UuICBTZWUg dGhlIEdOVQorICogTGVzc2VyIEdlbmVyYWwgUHVibGljIExpY2Vuc2UgZm9yIG1vcmUgZGV0YWls cy4KKyAqCisgKiBZb3Ugc2hvdWxkIGhhdmUgcmVjZWl2ZWQgYSBjb3B5IG9mIHRoZSBHTlUgTGVz c2VyIEdlbmVyYWwgUHVibGljCisgKiBMaWNlbnNlIGFsb25nIHdpdGggRkZtcGVnOyBpZiBub3Qs IHdyaXRlIHRvIHRoZSBGcmVlIFNvZnR3YXJlCisgKiBGb3VuZGF0aW9uLCBJbmMuLCA1MSBGcmFu a2xpbiBTdHJlZXQsIEZpZnRoIEZsb29yLCBCb3N0b24sIE1BIDAyMTEwLTEzMDEgVVNBCisgKi8K KworI2luY2x1ZGUgImxpYmF2dXRpbC9yaXNjdi9hc20uUyIKKworLy8gIHZvaWQgZmZfZmNtdWxf YWRkKGZsb2F0ICpzdW0sIGNvbnN0IGZsb2F0ICp0LCBjb25zdCBmbG9hdCAqYywgaW50IGxlbikK K2Z1bmMgZmZfZmNtdWxfYWRkX3J2diwgenZlMzJmCisxOgorICAgICAgICB2c2V0dmxpICAgICB0 MCwgYTMsIGUzMiwgbTEsIHRhLCBtYQorICAgICAgICBsaSAgICAgICAgICB0MSwgOAorICAgICAg ICBsaSAgICAgICAgICB0MiwgNAorICAgICAgICB2bHNlMzIudiAgICB2MCwgKGExKSwgdDEKKyAg ICAgICAgYWRkICAgICAgICAgYTEsIGExLCB0MgorICAgICAgICB2bHNlMzIudiAgICB2MSwgKGEx KSwgdDEKKyAgICAgICAgc3ViICAgICAgICAgYTEsIGExLCB0MgorICAgICAgICB2bHNlMzIudiAg ICB2MiwgKGEyKSwgdDEKKyAgICAgICAgYWRkICAgICAgICAgYTIsIGEyLCB0MgorICAgICAgICB2 bHNlMzIudiAgICB2MywgKGEyKSwgdDEKKyAgICAgICAgc3ViICAgICAgICAgYTIsIGEyLCB0Mgor ICAgICAgICB2bHNlMzIudiAgICB2NCwgKGEwKSwgdDEKKyAgICAgICAgYWRkICAgICAgICAgYTAs IGEwLCB0MgorICAgICAgICB2bHNlMzIudiAgICB2NSwgKGEwKSwgdDEKKyAgICAgICAgc3ViICAg ICAgICAgYTAsIGEwLCB0MgorICAgICAgICB2Zm1hY2MudnYgICB2NCwgdjAsIHYyCisgICAgICAg IHZmbm1zYWMudnYgIHY0LCB2MSwgdjMKKyAgICAgICAgdmZtYWNjLnZ2ICAgdjUsIHYwLCB2Mwor ICAgICAgICB2Zm1hY2MudnYgICB2NSwgdjEsIHYyCisgICAgICAgIHZzc2VnMmUzMi52IHY0LCAo YTApCisgICAgICAgIG11bCAgICAgICAgIHQzLCB0MSwgdDAKKyAgICAgICAgYWRkICAgICAgICAg 
YTAsIGEwLCB0MworICAgICAgICBhZGQgICAgICAgICBhMSwgYTEsIHQzCisgICAgICAgIGFkZCAg ICAgICAgIGEyLCBhMiwgdDMKKyAgICAgICAgc3ViICAgICAgICAgYTMsIGEzLCB0MAorICAgICAg ICBiZ3R6ICAgICAgICBhMywgMWIKKworICAgICAgICBmbHcgICAgICAgICBmYTAsIDAoYTEpCisg ICAgICAgIGZsdyAgICAgICAgIGZhMSwgMChhMikKKyAgICAgICAgZmx3ICAgICAgICAgZmEyLCAw KGEwKQorICAgICAgICBmbXVsLnMgICAgICBmYTAsIGZhMCwgZmExCisgICAgICAgIGZhZGQucyAg ICAgIGZhMiwgZmEyLCBmYTAKKyAgICAgICAgZnN3ICAgICAgICAgZmEyLCAwKGEwKQorCisgICAg ICAgIHJldAorZW5kZnVuYwotLSAKMi40Mi4wCgo=

--0000000000004f2b2b060664d2d6
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

--0000000000004f2b2b060664d2d6--