Date: Thu, 14 Jul 2022 12:48:06 +0000 (UTC)
From: Marco Vianini
To: "ffmpeg-devel@ffmpeg.org"
Message-ID: <223849145.334536.1657802886620@mail.yahoo.com>
In-Reply-To: <3dd32061-2748-8223-eeac-b9a90780cdb1@gmail.com>
References: <632087708.1175797.1657705107285.ref@mail.yahoo.com>
 <632087708.1175797.1657705107285@mail.yahoo.com>
 <1557769752.1457983.1657724008350@mail.yahoo.com>
 <606778079.1512531.1657727656800@mail.yahoo.com>
 <3dd32061-2748-8223-eeac-b9a90780cdb1@gmail.com>
Subject: Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
List-Id: FFmpeg development discussions and patches
Reply-To: FFmpeg development discussions and patches

On Wednesday, July 13, 2022 at 06:16:15 PM GMT+2, James Almer wrote:

> On 7/13/2022 12:54 PM, Marco Vianini wrote:
> > Sorry, my mail client was using HTML format.
> > I hope the mail will now be sent correctly.
> >
> >
> > You can get a large performance improvement in the special (but very
> > likely) case of: "(dst_linesize == bytewidth && src_linesize == bytewidth)"
> >
> > In this case we can "coalesce rows", that is, use ONLY ONE MEMCPY instead
> > of a smaller memcpy for every row (that is, looping height times).
> >
> > Code:
> > "
> > static void image_copy_plane(uint8_t       *dst, ptrdiff_t dst_linesize,
> >                              const uint8_t *src, ptrdiff_t src_linesize,
> >                              ptrdiff_t bytewidth, int height)
> > {
> >     if (!dst || !src)
> >         return;
> >     av_assert0(abs(src_linesize) >= bytewidth);
> >     av_assert0(abs(dst_linesize) >= bytewidth);
> >
> >     /// MY PATCH START
> >     /// Coalesce rows.
> >     if (dst_linesize == bytewidth && src_linesize == bytewidth) {
> >         bytewidth *= height;
> >         height = 1;
> >         src_linesize = dst_linesize = 0;
> >     }
> >     /// MY PATCH STOP
> >
> >     for (;height > 0; height--) {
> >         memcpy(dst, src, bytewidth);
> >         dst += dst_linesize;
> >         src += src_linesize;
> >     }
> > }
> > "
> >
> >
> > I ran the following tests on Windows 10 64-bit.
> > I compiled the code in Release mode.
> > I copied my PC camera frames 1000 times (resolution 1920x1080):
> >
> > With Coalesce:
> > copy_cnt=100  size=1920x1080 tot_time_copy(us)=36574 (average=365.74)
> > copy_cnt=200  size=1920x1080 tot_time_copy(us)=78207 (average=391.035)
> > copy_cnt=300  size=1920x1080 tot_time_copy(us)=122170(average=407.233)
> > copy_cnt=400  size=1920x1080 tot_time_copy(us)=163678(average=409.195)
> > copy_cnt=500  size=1920x1080 tot_time_copy(us)=201872(average=403.744)
> > copy_cnt=600  size=1920x1080 tot_time_copy(us)=246174(average=410.29)
> > copy_cnt=700  size=1920x1080 tot_time_copy(us)=287043(average=410.061)
> > copy_cnt=800  size=1920x1080 tot_time_copy(us)=326462(average=408.077)
> > copy_cnt=900  size=1920x1080 tot_time_copy(us)=356882(average=396.536)
> > copy_cnt=1000 size=1920x1080 tot_time_copy(us)=394566(average=394.566)
> >
> > Without Coalesce:
> > copy_cnt=100  size=1920x1080 tot_time_copy(us)=44303 (average=443.03)
> > copy_cnt=200  size=1920x1080 tot_time_copy(us)=100501(average=502.505)
> > copy_cnt=300  size=1920x1080 tot_time_copy(us)=150097(average=500.323)
> > copy_cnt=400  size=1920x1080 tot_time_copy(us)=201010(average=502.525)
> > copy_cnt=500  size=1920x1080 tot_time_copy(us)=256818(average=513.636)
> > copy_cnt=600  size=1920x1080 tot_time_copy(us)=303273(average=505.455)
> > copy_cnt=700  size=1920x1080 tot_time_copy(us)=359152(average=513.074)
> > copy_cnt=800  size=1920x1080 tot_time_copy(us)=414413(average=518.016)
> > copy_cnt=900  size=1920x1080 tot_time_copy(us)=465315(average=517.017)
> > copy_cnt=1000 size=1920x1080 tot_time_copy(us)=520381(average=520.381)
> >
> >
> > I think the results are very good.
> > What do you think?
>
> It looks like a good speed up, but we need a patch created with git
> format-patch that can be applied to the source tree to properly review
> this. Can you send that?
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

I generated the .eml file with "git format-patch" (see attachment).
Is that OK?
Thanks

Attachment: 0001-image_copy_plane-improve-performance-by-coalesce-row-i.eml

From: Marco Vianini
Date: Thu, 14 Jul 2022 14:39:13 +0200
Subject: [PATCH] image_copy_plane: improve performance by coalesce row, if
 possible

Signed-off-by: Marco Vianini
---
 libavutil/imgutils.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
index 9ab5757cf6..9ccb398a3b 100644
--- a/libavutil/imgutils.c
+++ b/libavutil/imgutils.c
@@ -349,6 +349,14 @@ static void image_copy_plane(uint8_t *dst, ptrdiff_t dst_linesize,
         return;
     av_assert0(FFABS(src_linesize) >= bytewidth);
     av_assert0(FFABS(dst_linesize) >= bytewidth);
+
+    if (dst_linesize == bytewidth && src_linesize == bytewidth) {
+        /** Coalesce rows in this specific case, for perfomances improvement */
+        bytewidth *= height;
+        height = 1;
+        src_linesize = dst_linesize = 0;
+    }
+
     for (;height > 0; height--) {
         memcpy(dst, src, bytewidth);
         dst += dst_linesize;
-- 
2.30.0.windows.2