* [FFmpeg-devel] Performances improvement in "image_copy_plane"
From: Marco Vianini @ 2022-07-13 9:38 UTC
To: ffmpeg-devel

You can get a large performance improvement in the special (but very
likely) case where

    dst_linesize == bytewidth && src_linesize == bytewidth

In this case the rows are contiguous in memory, so we can "coalesce
rows": use a single memcpy for the whole plane instead of a smaller
memcpy for every row (that is, looping height times).

Code:

    static void image_copy_plane(uint8_t *dst, ptrdiff_t dst_linesize,
                                 const uint8_t *src, ptrdiff_t src_linesize,
                                 ptrdiff_t bytewidth, int height)
    {
        if (!dst || !src)
            return;
        av_assert0(abs(src_linesize) >= bytewidth);
        av_assert0(abs(dst_linesize) >= bytewidth);

        // MY PATCH START
        // Coalesce rows.
        if (dst_linesize == bytewidth && src_linesize == bytewidth) {
            bytewidth *= height;
            height = 1;
            src_linesize = dst_linesize = 0;
        }
        // MY PATCH STOP

        for (;height > 0; height--) {
            memcpy(dst, src, bytewidth);
            dst += dst_linesize;
            src += src_linesize;
        }
    }

What do you think?

Thank you,
Marco Vianini

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
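A minimal standalone sketch of the coalescing idea described in the message above, written outside FFmpeg so it can be compiled on its own. The function names copy_plane_rows, copy_plane_coalesced and coalesced_matches_rows are hypothetical, and the FFmpeg asserts are replaced by a plain assert; this is an illustration under those assumptions, not the library's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference behaviour: one memcpy per row. */
static void copy_plane_rows(uint8_t *dst, ptrdiff_t dst_linesize,
                            const uint8_t *src, ptrdiff_t src_linesize,
                            ptrdiff_t bytewidth, int height)
{
    assert(bytewidth >= 0);
    for (; height > 0; height--) {
        memcpy(dst, src, (size_t)bytewidth);
        dst += dst_linesize;
        src += src_linesize;
    }
}

/* Proposed variant: when neither buffer has padding between rows,
 * the plane is one contiguous block and a single memcpy is enough. */
static void copy_plane_coalesced(uint8_t *dst, ptrdiff_t dst_linesize,
                                 const uint8_t *src, ptrdiff_t src_linesize,
                                 ptrdiff_t bytewidth, int height)
{
    assert(bytewidth >= 0);
    if (height > 0 && dst_linesize == bytewidth && src_linesize == bytewidth) {
        memcpy(dst, src, (size_t)bytewidth * (size_t)height);
        return;
    }
    /* Padded (or otherwise non-contiguous) planes keep the row loop. */
    copy_plane_rows(dst, dst_linesize, src, src_linesize, bytewidth, height);
}

/* Self-check: returns 1 when both variants give identical output for a
 * tightly packed source and for a source with padded rows. */
static int coalesced_matches_rows(void)
{
    enum { W = 8, H = 4, PAD = 16 };
    uint8_t packed[W * H], padded[PAD * H], a[W * H], b[W * H];

    for (int i = 0; i < W * H; i++)
        packed[i] = (uint8_t)i;
    for (int i = 0; i < PAD * H; i++)
        padded[i] = (uint8_t)(255 - i);

    copy_plane_rows(a, W, packed, W, W, H);
    copy_plane_coalesced(b, W, packed, W, W, H);
    if (memcmp(a, b, sizeof a) != 0)
        return 0;

    copy_plane_rows(a, W, padded, PAD, W, H);
    copy_plane_coalesced(b, W, padded, PAD, W, H);
    return memcmp(a, b, sizeof a) == 0;
}
```

The fast path is taken only when both linesizes equal bytewidth, which is the condition stated in the proposal; every other layout falls through to the unchanged row loop.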
* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
From: Paul B Mahol @ 2022-07-13 9:54 UTC
To: FFmpeg development discussions and patches

On Wed, Jul 13, 2022 at 11:38 AM Marco Vianini
<marco_vianini-at-yahoo.it@ffmpeg.org> wrote:

> You can get a very big improvement of performances in the special (but
> very likely) case of: "(dst_linesize == bytewidth && src_linesize ==
> bytewidth)"
> [...]
> What do You think about? Thank You

Show the benchmark numbers.
* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
From: Marco Vianini @ 2022-07-13 14:53 UTC
To: FFmpeg development discussions and patches

I ran the following tests on Windows 10 64-bit, with the code compiled
in Release mode. I copied frames from my PC camera 1000 times
(resolution 1920x1080).

With coalescing (my patch):

copy_cnt=100  size=1920x1080 tot_time_copy(us)=36574  (average=365.74)
copy_cnt=200  size=1920x1080 tot_time_copy(us)=78207  (average=391.035)
copy_cnt=300  size=1920x1080 tot_time_copy(us)=122170 (average=407.233)
copy_cnt=400  size=1920x1080 tot_time_copy(us)=163678 (average=409.195)
copy_cnt=500  size=1920x1080 tot_time_copy(us)=201872 (average=403.744)
copy_cnt=600  size=1920x1080 tot_time_copy(us)=246174 (average=410.29)
copy_cnt=700  size=1920x1080 tot_time_copy(us)=287043 (average=410.061)
copy_cnt=800  size=1920x1080 tot_time_copy(us)=326462 (average=408.077)
copy_cnt=900  size=1920x1080 tot_time_copy(us)=356882 (average=396.536)
copy_cnt=1000 size=1920x1080 tot_time_copy(us)=394566 (average=394.566)

Without coalescing:

copy_cnt=100  size=1920x1080 tot_time_copy(us)=44303  (average=443.03)
copy_cnt=200  size=1920x1080 tot_time_copy(us)=100501 (average=502.505)
copy_cnt=300  size=1920x1080 tot_time_copy(us)=150097 (average=500.323)
copy_cnt=400  size=1920x1080 tot_time_copy(us)=201010 (average=502.525)
copy_cnt=500  size=1920x1080 tot_time_copy(us)=256818 (average=513.636)
copy_cnt=600  size=1920x1080 tot_time_copy(us)=303273 (average=505.455)
copy_cnt=700  size=1920x1080 tot_time_copy(us)=359152 (average=513.074)
copy_cnt=800  size=1920x1080 tot_time_copy(us)=414413 (average=518.016)
copy_cnt=900  size=1920x1080 tot_time_copy(us)=465315 (average=517.017)
copy_cnt=1000 size=1920x1080 tot_time_copy(us)=520381 (average=520.381)

I think the results are very good. What do you think?

Thank you

On Wednesday, 13 July 2022 at 11:52:23 CEST, Paul B Mahol
<onemda@gmail.com> wrote:

> Show the benchmark numbers.
* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
From: Paul B Mahol @ 2022-07-13 15:10 UTC
To: FFmpeg development discussions and patches

On Wed, Jul 13, 2022 at 5:02 PM Marco Vianini
<marco_vianini-at-yahoo.it@ffmpeg.org> wrote:

> I did following tests on Windows 10 64bit. I compiled code in Release.
> I copied my pc camera frames 1000 times (resolution 1920x1080):
> [...]
> I think the results are very good. What do you think about?
> Thank You

First, stop top-posting. Where is the patch?
* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
From: Marco Vianini @ 2022-07-13 15:54 UTC
To: FFmpeg development discussions and patches

On Wednesday, July 13, 2022 at 05:08:27 PM GMT+2, Paul B Mahol
<onemda@gmail.com> wrote:

> First stop top posting. Where is patch?

Sorry, my mail client was using HTML format.
I hope the mail is sent correctly now.

You can get a large performance improvement in the special (but very
likely) case where

    dst_linesize == bytewidth && src_linesize == bytewidth

In this case we can "coalesce rows": use a single memcpy instead of a
smaller memcpy for every row (that is, looping height times).

Code:

    static void image_copy_plane(uint8_t *dst, ptrdiff_t dst_linesize,
                                 const uint8_t *src, ptrdiff_t src_linesize,
                                 ptrdiff_t bytewidth, int height)
    {
        if (!dst || !src)
            return;
        av_assert0(abs(src_linesize) >= bytewidth);
        av_assert0(abs(dst_linesize) >= bytewidth);

        /// MY PATCH START
        /// Coalesce rows.
        if (dst_linesize == bytewidth && src_linesize == bytewidth) {
            bytewidth *= height;
            height = 1;
            src_linesize = dst_linesize = 0;
        }
        /// MY PATCH STOP

        for (;height > 0; height--) {
            memcpy(dst, src, bytewidth);
            dst += dst_linesize;
            src += src_linesize;
        }
    }

I ran the following tests on Windows 10 64-bit, with the code compiled
in Release mode. I copied frames from my PC camera 1000 times
(resolution 1920x1080).

With coalescing:

copy_cnt=100  size=1920x1080 tot_time_copy(us)=36574  (average=365.74)
copy_cnt=200  size=1920x1080 tot_time_copy(us)=78207  (average=391.035)
copy_cnt=300  size=1920x1080 tot_time_copy(us)=122170 (average=407.233)
copy_cnt=400  size=1920x1080 tot_time_copy(us)=163678 (average=409.195)
copy_cnt=500  size=1920x1080 tot_time_copy(us)=201872 (average=403.744)
copy_cnt=600  size=1920x1080 tot_time_copy(us)=246174 (average=410.29)
copy_cnt=700  size=1920x1080 tot_time_copy(us)=287043 (average=410.061)
copy_cnt=800  size=1920x1080 tot_time_copy(us)=326462 (average=408.077)
copy_cnt=900  size=1920x1080 tot_time_copy(us)=356882 (average=396.536)
copy_cnt=1000 size=1920x1080 tot_time_copy(us)=394566 (average=394.566)

Without coalescing:

copy_cnt=100  size=1920x1080 tot_time_copy(us)=44303  (average=443.03)
copy_cnt=200  size=1920x1080 tot_time_copy(us)=100501 (average=502.505)
copy_cnt=300  size=1920x1080 tot_time_copy(us)=150097 (average=500.323)
copy_cnt=400  size=1920x1080 tot_time_copy(us)=201010 (average=502.525)
copy_cnt=500  size=1920x1080 tot_time_copy(us)=256818 (average=513.636)
copy_cnt=600  size=1920x1080 tot_time_copy(us)=303273 (average=505.455)
copy_cnt=700  size=1920x1080 tot_time_copy(us)=359152 (average=513.074)
copy_cnt=800  size=1920x1080 tot_time_copy(us)=414413 (average=518.016)
copy_cnt=900  size=1920x1080 tot_time_copy(us)=465315 (average=517.017)
copy_cnt=1000 size=1920x1080 tot_time_copy(us)=520381 (average=520.381)

I think the results are very good.
What do you think?

Thank you
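For readers who want to reproduce numbers like the ones above, here is a rough timing sketch. It is not the harness used in the message (that was not posted); it assumes one byte per pixel for the 1920x1080 plane, uses the portable but coarse clock() timer, and the names copy_plane, time_copies_us and relative_saving are hypothetical. An optimising compiler may elide some of the repeated copies, so treat any output as indicative only. For scale, the reported averages at 1000 copies (520.381 us without coalescing, 394.566 us with it) correspond to roughly 24% less time per copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Same loop as in the proposal; `coalesce` switches the fast path on or off. */
static void copy_plane(uint8_t *dst, ptrdiff_t dst_linesize,
                       const uint8_t *src, ptrdiff_t src_linesize,
                       ptrdiff_t bytewidth, int height, int coalesce)
{
    if (coalesce && dst_linesize == bytewidth && src_linesize == bytewidth) {
        bytewidth *= height;
        height = 1;
        src_linesize = dst_linesize = 0;
    }
    for (; height > 0; height--) {
        memcpy(dst, src, (size_t)bytewidth);
        dst += dst_linesize;
        src += src_linesize;
    }
}

/* Average time per copy in microseconds, or -1.0 on allocation failure
 * or if the destination does not match the source afterwards. */
static double time_copies_us(int coalesce, int copies)
{
    const ptrdiff_t w = 1920;   /* assumed: 1 byte per pixel, no padding */
    const int h = 1080;
    const size_t size = (size_t)w * (size_t)h;
    uint8_t *src = malloc(size);
    uint8_t *dst = malloc(size);
    double total = -1.0;

    assert(copies > 0);
    if (src && dst) {
        memset(src, 0x5a, size);
        clock_t t0 = clock();
        for (int i = 0; i < copies; i++)
            copy_plane(dst, w, src, w, w, h, coalesce);
        clock_t t1 = clock();
        total = (double)(t1 - t0) * 1e6 / CLOCKS_PER_SEC;
        if (memcmp(src, dst, size) != 0)
            total = -1.0;
    }
    free(src);
    free(dst);
    return total < 0.0 ? total : total / copies;
}

/* Fraction of time saved when going from before_us to after_us. */
static double relative_saving(double before_us, double after_us)
{
    assert(before_us > 0.0);
    return 1.0 - after_us / before_us;
}
```

Calling time_copies_us(0, 1000) and time_copies_us(1, 1000) and passing the two results to relative_saving gives a figure comparable to the one derived from the posted averages.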
* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
From: James Almer @ 2022-07-13 16:15 UTC
To: ffmpeg-devel

On 7/13/2022 12:54 PM, Marco Vianini wrote:
> Sorry, my mail client was using html format.
> I hope now the mail will be sent correctly.
> [...]
> I think the results are very good.
> What do you think about?

It looks like a good speed up, but we need a patch created with git
format-patch that can be applied to the source tree to properly review
this. Can you send that?
* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
From: Marco Vianini @ 2022-07-14 12:48 UTC
To: ffmpeg-devel

[-- Attachment #1: Type: text/plain, Size: 3802 bytes --]

On Wednesday, July 13, 2022 at 06:16:15 PM GMT+2, James Almer
<jamrial@gmail.com> wrote:

> It looks like a good speed up, but we need a patch created with git
> format-patch that can be applied to the source tree to properly review
> this. Can you send that?

I generated the .eml file with "git format-patch" (see attachment).
Is it OK for you?

Thanks

[-- Attachment #2: 0001-image_copy_plane-improve-performance-by-coalesce-row-i.eml --]
[-- Type: message/rfc822, Size: 1045 bytes --]

From: Marco Vianini <marco_vianini@yahoo.it>
Subject: [PATCH] image_copy_plane: improve performance by coalesce row, if possible
Date: Thu, 14 Jul 2022 14:39:13 +0200

Signed-off-by: Marco Vianini <marco_vianini@yahoo.it>
---
 libavutil/imgutils.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
index 9ab5757cf6..9ccb398a3b 100644
--- a/libavutil/imgutils.c
+++ b/libavutil/imgutils.c
@@ -349,6 +349,14 @@ static void image_copy_plane(uint8_t *dst, ptrdiff_t dst_linesize,
         return;
     av_assert0(FFABS(src_linesize) >= bytewidth);
     av_assert0(FFABS(dst_linesize) >= bytewidth);
+
+    if (dst_linesize == bytewidth && src_linesize == bytewidth) {
+        /** Coalesce rows in this specific case, for perfomances improvement */
+        bytewidth *= height;
+        height = 1;
+        src_linesize = dst_linesize = 0;
+    }
+
     for (;height > 0; height--) {
         memcpy(dst, src, bytewidth);
         dst += dst_linesize;
--
2.30.0.windows.2
* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
From: Timo Rothenpieler @ 2022-07-13 15:12 UTC
To: ffmpeg-devel

On 13.07.2022 11:38, Marco Vianini wrote:
> You can get a very big improvement of performances in the special (but
> very likely) case of: "(dst_linesize == bytewidth && src_linesize ==
> bytewidth)"

Isn't all that matters dst_linesize == src_linesize, and then you can
memcpy() the whole plane?

> In this case in fact We can "Coalesce rows", that is using ONLY ONE
> MEMCPY, instead of a smaller memcpy for every row (that is looping for
> height times).
>
> Code:"static void image_copy_plane(uint8_t *dst, ptrdiff_t dst_linesize, const uint8_t *src, ptrdiff_t src_linesize, ptrdiff_t bytewidth, int height){ if (!dst || !src) return; av_assert0(abs(src_linesize) >= bytewidth); av_assert0(abs(dst_linesize) >= bytewidth); // MY PATCH START // Coalesce rows. if (dst_linesize == bytewidth && src_linesize == bytewidth) { bytewidth *= height; height = 1; src_linesize = dst_linesize = 0; }// MY PATCH STOP
> for (;height > 0; height--) { memcpy(dst, src, bytewidth); dst += dst_linesize; src += src_linesize; }}"
> What do You think about? Thank You
> Marco Vianini

That code is mangled by your mail client and practically unreadable.
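On the question of whether dst_linesize == src_linesize alone would be enough, the following is a hedged sketch rather than FFmpeg code. It assumes a plane whose stride (8 bytes) is larger than its visible byte width (4 bytes): a single memcpy over stride * height bytes then also writes the padding bytes of the destination, while the row loop touches only bytewidth bytes per row. Whether that side effect matters depends on the caller. Note too that the asserts in the quoted function take the absolute value of the linesizes, which suggests negative strides are possible, and those cannot be served by one forward memcpy from the given pointers. WIDTH, STRIDE, HEIGHT and padding_bytes_overwritten are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { WIDTH = 4, STRIDE = 8, HEIGHT = 3 };

/* Copies a padded plane either row by row (whole_plane == 0) or with one
 * memcpy over the full stride * height (whole_plane != 0), and returns
 * how many destination padding bytes were changed by the copy. */
static int padding_bytes_overwritten(int whole_plane)
{
    uint8_t src[STRIDE * HEIGHT], dst[STRIDE * HEIGHT];
    int changed = 0;

    memset(src, 0x11, sizeof src);   /* source pixels and padding */
    memset(dst, 0xEE, sizeof dst);   /* marker value in the destination */

    if (whole_plane) {
        /* Equal strides: copy everything, padding included. */
        memcpy(dst, src, sizeof src);
    } else {
        /* Row loop: only WIDTH bytes of each row are written. */
        for (int y = 0; y < HEIGHT; y++)
            memcpy(dst + y * STRIDE, src + y * STRIDE, WIDTH);
    }

    /* Count padding bytes that no longer hold the marker. */
    for (int y = 0; y < HEIGHT; y++)
        for (int x = WIDTH; x < STRIDE; x++)
            changed += dst[y * STRIDE + x] != 0xEE;

    assert(changed >= 0 && changed <= (STRIDE - WIDTH) * HEIGHT);
    return changed;
}
```

With these sizes the row loop leaves all 12 padding bytes untouched and the whole-plane copy overwrites all of them, which is one reason the stricter condition (both linesizes equal to bytewidth) is the conservative choice.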
Thread overview: 8+ messages (newest: 2022-07-14 12:48 UTC)

     [not found] <632087708.1175797.1657705107285.ref@mail.yahoo.com>
2022-07-13  9:38 ` [FFmpeg-devel] Performances improvement in "image_copy_plane" Marco Vianini
2022-07-13  9:54   ` Paul B Mahol
2022-07-13 14:53     ` Marco Vianini
2022-07-13 15:10       ` Paul B Mahol
2022-07-13 15:54         ` Marco Vianini
2022-07-13 16:15           ` James Almer
2022-07-14 12:48             ` Marco Vianini
2022-07-13 15:12   ` Timo Rothenpieler