On Wednesday, July 13, 2022 at 06:16:15 PM GMT+2, James Almer wrote:

On 7/13/2022 12:54 PM, Marco Vianini wrote:
> Sorry, my mail client was using HTML format.
> I hope the mail will now be sent correctly.
>
>
> You can get a big performance improvement in the special (but very
> common) case of "(dst_linesize == bytewidth && src_linesize == bytewidth)".
>
> In this case we can "coalesce rows", that is, use ONLY ONE memcpy for the
> whole plane instead of a smaller memcpy for every row (that is, looping
> height times).
>
> Code:
> "
> static void image_copy_plane(uint8_t       *dst, ptrdiff_t dst_linesize,
>                              const uint8_t *src, ptrdiff_t src_linesize,
>                              ptrdiff_t bytewidth, int height)
> {
>     if (!dst || !src)
>         return;
>     av_assert0(abs(src_linesize) >= bytewidth);
>     av_assert0(abs(dst_linesize) >= bytewidth);
>
>     /// MY PATCH START
>     /// Coalesce rows.
>     if (dst_linesize == bytewidth && src_linesize == bytewidth) {
>         bytewidth *= height;
>         height = 1;
>         src_linesize = dst_linesize = 0;
>     }
>     /// MY PATCH STOP
>
>     for (;height > 0; height--) {
>         memcpy(dst, src, bytewidth);
>         dst += dst_linesize;
>         src += src_linesize;
>     }
> }
> "
>
>
> I ran the following tests on Windows 10 64-bit.
> I compiled the code in Release mode.
> I copied frames from my PC camera 1000 times (resolution 1920x1080):
>
> With Coalesce:
> copy_cnt=100  size=1920x1080 tot_time_copy(us)=36574  (average=365.74)
> copy_cnt=200  size=1920x1080 tot_time_copy(us)=78207  (average=391.035)
> copy_cnt=300  size=1920x1080 tot_time_copy(us)=122170 (average=407.233)
> copy_cnt=400  size=1920x1080 tot_time_copy(us)=163678 (average=409.195)
> copy_cnt=500  size=1920x1080 tot_time_copy(us)=201872 (average=403.744)
> copy_cnt=600  size=1920x1080 tot_time_copy(us)=246174 (average=410.29)
> copy_cnt=700  size=1920x1080 tot_time_copy(us)=287043 (average=410.061)
> copy_cnt=800  size=1920x1080 tot_time_copy(us)=326462 (average=408.077)
> copy_cnt=900  size=1920x1080 tot_time_copy(us)=356882 (average=396.536)
> copy_cnt=1000 size=1920x1080 tot_time_copy(us)=394566 (average=394.566)
>
> Without Coalesce:
> copy_cnt=100  size=1920x1080 tot_time_copy(us)=44303  (average=443.03)
> copy_cnt=200  size=1920x1080 tot_time_copy(us)=100501 (average=502.505)
> copy_cnt=300  size=1920x1080 tot_time_copy(us)=150097 (average=500.323)
> copy_cnt=400  size=1920x1080 tot_time_copy(us)=201010 (average=502.525)
> copy_cnt=500  size=1920x1080 tot_time_copy(us)=256818 (average=513.636)
> copy_cnt=600  size=1920x1080 tot_time_copy(us)=303273 (average=505.455)
> copy_cnt=700  size=1920x1080 tot_time_copy(us)=359152 (average=513.074)
> copy_cnt=800  size=1920x1080 tot_time_copy(us)=414413 (average=518.016)
> copy_cnt=900  size=1920x1080 tot_time_copy(us)=465315 (average=517.017)
> copy_cnt=1000 size=1920x1080 tot_time_copy(us)=520381 (average=520.381)
>
>
> I think the results are very good.
> What do you think?

It looks like a good speedup, but we need a patch created with git
format-patch that can be applied to the source tree to properly review
this. Can you send that?
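To check the row-coalescing idea from the quoted proposal outside the FFmpeg tree, here is a minimal C sketch under stated assumptions. `copy_plane_coalesced`, `copy_plane_rows` and `coalesced_matches_rows` are hypothetical names, not FFmpeg API; `av_assert0()` is replaced by `assert()` so the file builds standalone. One deliberate deviation from the posted patch: the coalescing branch also requires `height > 0`, because with a negative height `bytewidth * height` would become a negative size that `memcpy` would interpret as a huge `size_t`. The check compares the coalesced path with a plain row-by-row copy on a packed plane and on a padded one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical standalone variant of the patched image_copy_plane();
 * assert() stands in for av_assert0(). */
static void copy_plane_coalesced(uint8_t *dst, ptrdiff_t dst_linesize,
                                 const uint8_t *src, ptrdiff_t src_linesize,
                                 ptrdiff_t bytewidth, int height)
{
    if (!dst || !src)
        return;
    assert((src_linesize < 0 ? -src_linesize : src_linesize) >= bytewidth);
    assert((dst_linesize < 0 ? -dst_linesize : dst_linesize) >= bytewidth);

    /* Both planes are tightly packed, so all rows form one contiguous block
     * of bytewidth * height bytes. The height > 0 test is an addition to the
     * posted patch that keeps the product from becoming negative. */
    if (height > 0 && dst_linesize == bytewidth && src_linesize == bytewidth) {
        bytewidth *= height;
        height = 1;
        src_linesize = dst_linesize = 0;
    }

    for (; height > 0; height--) {
        memcpy(dst, src, bytewidth);
        dst += dst_linesize;
        src += src_linesize;
    }
}

/* Reference behaviour: always one memcpy per row. */
static void copy_plane_rows(uint8_t *dst, ptrdiff_t dst_linesize,
                            const uint8_t *src, ptrdiff_t src_linesize,
                            ptrdiff_t bytewidth, int height)
{
    for (; height > 0; height--) {
        memcpy(dst, src, bytewidth);
        dst += dst_linesize;
        src += src_linesize;
    }
}

/* Returns 1 if both copies give identical buffers for a packed plane
 * (linesize == bytewidth, single-memcpy path) and a padded plane
 * (linesize > bytewidth, per-row path that must skip the padding). */
int coalesced_matches_rows(void)
{
    enum { W = 16, H = 8, PAD = 4, SIZE = (W + PAD) * H };
    static const ptrdiff_t linesizes[2] = { W, W + PAD };
    uint8_t src[SIZE], a[SIZE], b[SIZE];

    for (size_t i = 0; i < SIZE; i++)
        src[i] = (uint8_t)(i * 7 + 3);

    for (int k = 0; k < 2; k++) {
        memset(a, 0, SIZE);
        memset(b, 0, SIZE);
        copy_plane_coalesced(a, linesizes[k], src, linesizes[k], W, H);
        copy_plane_rows(b, linesizes[k], src, linesizes[k], W, H);
        if (memcmp(a, b, SIZE) != 0)
            return 0;
    }
    return 1;
}
```

Calling `coalesced_matches_rows()` should return 1. Because the condition requires each linesize to equal a positive `bytewidth` exactly, planes with negative linesizes (bottom-up images) or with row padding never take the single-`memcpy` path and keep the per-row loop.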
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".

I generated the .eml file with "git format-patch" (see attachment).
Is it OK for you?
Thanks
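Each benchmark line quoted above reports a total time for `copy_cnt` copies plus the per-copy average (total / copy_cnt, e.g. 36574 / 100 = 365.74 us). The original message does not show the measurement code, so the following is only a hypothetical C11 harness in the same spirit. It assumes a single packed 8-bit plane whose `bytewidth` equals the width in pixels (which may not match the camera's actual pixel format), uses wall-clock time from `timespec_get()`, and compares one `memcpy` per row against one `memcpy` over the whole plane. All function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Wall-clock time in microseconds via C11 timespec_get(). */
static double now_us(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec * 1e6 + (double)ts.tv_nsec / 1e3;
}

/* One memcpy per row: what the loop does without coalescing. */
static void copy_per_row(uint8_t *dst, const uint8_t *src, size_t width, int height)
{
    for (int y = 0; y < height; y++)
        memcpy(dst + (size_t)y * width, src + (size_t)y * width, width);
}

/* One memcpy over the whole packed plane: the coalesced path. */
static void copy_single(uint8_t *dst, const uint8_t *src, size_t width, int height)
{
    if (height > 0)
        memcpy(dst, src, width * (size_t)height);
}

typedef void (*copy_fn)(uint8_t *dst, const uint8_t *src, size_t width, int height);

/* Runs fn `copies` times and returns the total elapsed time in microseconds. */
static double bench_copy(copy_fn fn, uint8_t *dst, const uint8_t *src,
                         size_t width, int height, int copies)
{
    double start = now_us();
    for (int i = 0; i < copies; i++)
        fn(dst, src, width, height);
    return now_us() - start;
}

/* Prints total and average times for both strategies. Returns 1 if they
 * produced identical buffers, 0 if not, -1 if allocation failed. */
int run_bench(int width, int height, int copies)
{
    size_t size = (size_t)width * (size_t)height;
    uint8_t *src = malloc(size);
    uint8_t *a = malloc(size);
    uint8_t *b = malloc(size);
    int same = -1;

    if (src && a && b) {
        for (size_t i = 0; i < size; i++)
            src[i] = (uint8_t)i;
        double t_rows = bench_copy(copy_per_row, a, src, (size_t)width, height, copies);
        double t_single = bench_copy(copy_single, b, src, (size_t)width, height, copies);
        printf("per-row: copy_cnt=%d tot_time_copy(us)=%.0f (average=%.3f)\n",
               copies, t_rows, t_rows / copies);
        printf("single:  copy_cnt=%d tot_time_copy(us)=%.0f (average=%.3f)\n",
               copies, t_single, t_single / copies);
        same = memcmp(a, b, size) == 0;
    }
    free(src);
    free(a);
    free(b);
    return same;
}
```

For example, `run_bench(1920, 1080, 1000)` prints both totals and averages and returns 1 when the two strategies produced identical buffers. Absolute figures depend on the CPU, compiler and C library `memcpy`, so this sketch is not expected to reproduce the quoted numbers exactly.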