Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
* [FFmpeg-devel] Performances improvement in "image_copy_plane"
       [not found] <632087708.1175797.1657705107285.ref@mail.yahoo.com>
@ 2022-07-13  9:38 ` Marco Vianini
  2022-07-13  9:54   ` Paul B Mahol
  2022-07-13 15:12   ` Timo Rothenpieler
  0 siblings, 2 replies; 8+ messages in thread
From: Marco Vianini @ 2022-07-13  9:38 UTC (permalink / raw)
  To: ffmpeg-devel

A substantial performance improvement is possible in the special (but very common) case where "dst_linesize == bytewidth && src_linesize == bytewidth".

In that case the rows are contiguous in memory and can be coalesced: a single memcpy copies the whole plane, instead of one smaller memcpy per row (a loop of "height" iterations).

Code:
"
static void image_copy_plane(uint8_t       *dst, ptrdiff_t dst_linesize,
                             const uint8_t *src, ptrdiff_t src_linesize,
                             ptrdiff_t bytewidth, int height)
{
    if (!dst || !src)
        return;
    av_assert0(abs(src_linesize) >= bytewidth);
    av_assert0(abs(dst_linesize) >= bytewidth);

    // MY PATCH START
    // Coalesce rows.
    if (dst_linesize == bytewidth && src_linesize == bytewidth) {
      bytewidth *= height;
      height = 1;
      src_linesize = dst_linesize = 0;
    }
    // MY PATCH STOP

    for (;height > 0; height--) {
        memcpy(dst, src, bytewidth);
        dst += dst_linesize;
        src += src_linesize;
    }
}
"

What do you think?

Thank you,
Marco Vianini
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".


* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
  2022-07-13  9:38 ` [FFmpeg-devel] Performances improvement in "image_copy_plane" Marco Vianini
@ 2022-07-13  9:54   ` Paul B Mahol
  2022-07-13 14:53     ` Marco Vianini
  2022-07-13 15:12   ` Timo Rothenpieler
  1 sibling, 1 reply; 8+ messages in thread
From: Paul B Mahol @ 2022-07-13  9:54 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

On Wed, Jul 13, 2022 at 11:38 AM Marco Vianini <
marco_vianini-at-yahoo.it@ffmpeg.org> wrote:


Show the benchmark numbers.



* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
  2022-07-13  9:54   ` Paul B Mahol
@ 2022-07-13 14:53     ` Marco Vianini
  2022-07-13 15:10       ` Paul B Mahol
  0 siblings, 1 reply; 8+ messages in thread
From: Marco Vianini @ 2022-07-13 14:53 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

I ran the following tests on 64-bit Windows 10, with the code compiled in Release mode.
I copied frames from my PC camera 1000 times (resolution 1920x1080):

With Coalesce (MY PATCH):
copy_cnt=100  size=1920x1080 tot_time_copy(us)=36574  (average=365.74)
copy_cnt=200  size=1920x1080 tot_time_copy(us)=78207  (average=391.035)
copy_cnt=300  size=1920x1080 tot_time_copy(us)=122170 (average=407.233)
copy_cnt=400  size=1920x1080 tot_time_copy(us)=163678 (average=409.195)
copy_cnt=500  size=1920x1080 tot_time_copy(us)=201872 (average=403.744)
copy_cnt=600  size=1920x1080 tot_time_copy(us)=246174 (average=410.29)
copy_cnt=700  size=1920x1080 tot_time_copy(us)=287043 (average=410.061)
copy_cnt=800  size=1920x1080 tot_time_copy(us)=326462 (average=408.077)
copy_cnt=900  size=1920x1080 tot_time_copy(us)=356882 (average=396.536)
copy_cnt=1000 size=1920x1080 tot_time_copy(us)=394566 (average=394.566)

Without Coalesce:
copy_cnt=100  size=1920x1080 tot_time_copy(us)=44303  (average=443.03)
copy_cnt=200  size=1920x1080 tot_time_copy(us)=100501 (average=502.505)
copy_cnt=300  size=1920x1080 tot_time_copy(us)=150097 (average=500.323)
copy_cnt=400  size=1920x1080 tot_time_copy(us)=201010 (average=502.525)
copy_cnt=500  size=1920x1080 tot_time_copy(us)=256818 (average=513.636)
copy_cnt=600  size=1920x1080 tot_time_copy(us)=303273 (average=505.455)
copy_cnt=700  size=1920x1080 tot_time_copy(us)=359152 (average=513.074)
copy_cnt=800  size=1920x1080 tot_time_copy(us)=414413 (average=518.016)
copy_cnt=900  size=1920x1080 tot_time_copy(us)=465315 (average=517.017)
copy_cnt=1000 size=1920x1080 tot_time_copy(us)=520381 (average=520.381)

I think the results are very good.
What do you think?

Thank you



* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
  2022-07-13 14:53     ` Marco Vianini
@ 2022-07-13 15:10       ` Paul B Mahol
  2022-07-13 15:54         ` Marco Vianini
  0 siblings, 1 reply; 8+ messages in thread
From: Paul B Mahol @ 2022-07-13 15:10 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

On Wed, Jul 13, 2022 at 5:02 PM Marco Vianini <
marco_vianini-at-yahoo.it@ffmpeg.org> wrote:

First, stop top-posting.

Where is the patch?



* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
  2022-07-13  9:38 ` [FFmpeg-devel] Performances improvement in "image_copy_plane" Marco Vianini
  2022-07-13  9:54   ` Paul B Mahol
@ 2022-07-13 15:12   ` Timo Rothenpieler
  1 sibling, 0 replies; 8+ messages in thread
From: Timo Rothenpieler @ 2022-07-13 15:12 UTC (permalink / raw)
  To: ffmpeg-devel

On 13.07.2022 11:38, Marco Vianini wrote:
>   You can get a very big improvement of performances in the special (but very likely) case of: "(dst_linesize == bytewidth && src_linesize == bytewidth)"

Isn't all that matters dst_linesize == src_linesize, and then you can 
memcpy() the whole plane?

> In this case in fact We can "Coalesce rows", that is using ONLY ONE MEMCPY, instead of a smaller memcpy for every row (that is looping for height times).
> 
> Code:"static void image_copy_plane(uint8_t       *dst, ptrdiff_t dst_linesize,                             const uint8_t *src, ptrdiff_t src_linesize,                             ptrdiff_t bytewidth, int height){    if (!dst || !src)        return;    av_assert0(abs(src_linesize) >= bytewidth);    av_assert0(abs(dst_linesize) >= bytewidth); // MY PATCH START    // Coalesce rows.    if (dst_linesize == bytewidth && src_linesize == bytewidth) {      bytewidth *= height;      height = 1;      src_linesize = dst_linesize = 0;    }// MY PATCH STOP
>      for (;height > 0; height--) {        memcpy(dst, src, bytewidth);        dst += dst_linesize;        src += src_linesize;    }}"
> What do You think about?Thank You
> Marco Vianini

That code is mangled by your mail client and practically unreadable.
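
On the question above of whether equal linesizes alone are enough: when the linesize is larger than bytewidth, a single memcpy also copies the bytes that lie between rows. In the destination those bytes may be plain padding, or (an assumption about how callers use the function) part of a larger image when the plane is a cropped view. The toy sketch below is not FFmpeg code; the names and sizes are hypothetical. It counts how many destination gap bytes survive each strategy, and it does not cover negative linesizes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy geometry: 4 payload bytes per row inside a 6-byte stride. */
enum { WIDTH = 4, STRIDE = 6, HEIGHT = 3, SIZE = STRIDE * HEIGHT };

/* Row-by-row copy: touches only the first WIDTH bytes of each row. */
static void copy_rows(uint8_t *dst, const uint8_t *src)
{
    for (int y = 0; y < HEIGHT; y++)
        memcpy(dst + y * STRIDE, src + y * STRIDE, WIDTH);
}

/* Single memcpy that relies only on the strides being equal. It stops
 * after the last row's payload, so it stays inside both buffers, but it
 * still overwrites the gaps between rows. */
static void copy_whole(uint8_t *dst, const uint8_t *src)
{
    memcpy(dst, src, STRIDE * (HEIGHT - 1) + WIDTH);
}

/* Counts destination gap bytes (columns WIDTH..STRIDE-1) still equal to
 * the sentinel value written before the copy. */
static int surviving_gap_bytes(const uint8_t *dst, uint8_t sentinel)
{
    int n = 0;
    for (int y = 0; y < HEIGHT; y++)
        for (int x = WIDTH; x < STRIDE; x++)
            n += dst[y * STRIDE + x] == sentinel;
    return n;
}

/* Runs one strategy on fresh buffers and reports the surviving gap bytes. */
static int run_variant(int use_single_memcpy)
{
    uint8_t src[SIZE], dst[SIZE];

    assert(WIDTH <= STRIDE);
    memset(src, 0x11, SIZE);   /* source content, gaps included */
    memset(dst, 0xEE, SIZE);   /* sentinel marks untouched destination bytes */
    if (use_single_memcpy)
        copy_whole(dst, src);
    else
        copy_rows(dst, src);
    return surviving_gap_bytes(dst, 0xEE);
}
```

With this geometry, run_variant(0) leaves all 6 gap bytes intact, while run_variant(1) leaves only the 2 that follow the last row. Requiring linesize == bytewidth on both sides, as the proposed check does, avoids the question because there are no gap bytes at all.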

* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
  2022-07-13 15:10       ` Paul B Mahol
@ 2022-07-13 15:54         ` Marco Vianini
  2022-07-13 16:15           ` James Almer
  0 siblings, 1 reply; 8+ messages in thread
From: Marco Vianini @ 2022-07-13 15:54 UTC (permalink / raw)
  To: FFmpeg development discussions and patches

Sorry, my mail client was sending HTML.
I hope this mail comes through correctly.


A substantial performance improvement is possible in the special (but very common) case where "dst_linesize == bytewidth && src_linesize == bytewidth".

In that case the rows are contiguous in memory and can be coalesced: a single memcpy copies the whole plane, instead of one smaller memcpy per row (a loop of "height" iterations).

Code:
"
static void image_copy_plane(uint8_t       *dst, ptrdiff_t dst_linesize,
                             const uint8_t *src, ptrdiff_t src_linesize,
                             ptrdiff_t bytewidth, int height)
{
    if (!dst || !src)
        return;
    av_assert0(abs(src_linesize) >= bytewidth);
    av_assert0(abs(dst_linesize) >= bytewidth);
    
    /// MY PATCH START
    /// Coalesce rows.
    if (dst_linesize == bytewidth && src_linesize == bytewidth) {
      bytewidth *= height;
      height = 1;
      src_linesize = dst_linesize = 0;
    }
    /// MY PATCH STOP

    for (;height > 0; height--) {
        memcpy(dst, src, bytewidth);
        dst += dst_linesize;
        src += src_linesize;
    }
}
"
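
The equivalence the patch relies on can be checked outside FFmpeg. The sketch below is not FFmpeg code: copy_plane_rows, copy_plane_coalesced and coalesced_matches_rowwise are hypothetical names, and plain assert stands in for av_assert0. The "height > 0" guard is an added assumption. The posted code has no such guard, so a negative height would produce a negative byte count; whether any caller can pass one is not established in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Reference behaviour: one memcpy per row, as in the unpatched loop. */
static void copy_plane_rows(uint8_t *dst, ptrdiff_t dst_linesize,
                            const uint8_t *src, ptrdiff_t src_linesize,
                            ptrdiff_t bytewidth, int height)
{
    for (; height > 0; height--) {
        memcpy(dst, src, (size_t)bytewidth);
        dst += dst_linesize;
        src += src_linesize;
    }
}

/* Patched behaviour: collapse the loop into one memcpy when both planes
 * are tightly packed (linesize == bytewidth on both sides). */
static void copy_plane_coalesced(uint8_t *dst, ptrdiff_t dst_linesize,
                                 const uint8_t *src, ptrdiff_t src_linesize,
                                 ptrdiff_t bytewidth, int height)
{
    assert(bytewidth >= 0);
    /* height > 0 is an added guard (assumption), not part of the patch. */
    if (height > 0 && dst_linesize == bytewidth && src_linesize == bytewidth) {
        bytewidth *= height;
        height = 1;
        src_linesize = dst_linesize = 0;
    }
    copy_plane_rows(dst, dst_linesize, src, src_linesize, bytewidth, height);
}

/* Copies a test plane with both variants; returns 1 if the two
 * destination buffers end up byte-identical, 0 otherwise. */
static int coalesced_matches_rowwise(int bytewidth, int height, int linesize)
{
    assert(bytewidth > 0 && height > 0 && linesize >= bytewidth);
    size_t size = (size_t)linesize * (size_t)height;
    uint8_t *src = malloc(size);
    uint8_t *a   = malloc(size);
    uint8_t *b   = malloc(size);
    int ok = 0;

    if (src && a && b) {
        for (size_t i = 0; i < size; i++)
            src[i] = (uint8_t)(i * 31u + 7u);   /* arbitrary test pattern */
        memset(a, 0xAA, size);                  /* sentinel for padding bytes */
        memset(b, 0xAA, size);
        copy_plane_rows(a, linesize, src, linesize, bytewidth, height);
        copy_plane_coalesced(b, linesize, src, linesize, bytewidth, height);
        ok = memcmp(a, b, size) == 0;
    }
    free(src);
    free(a);
    free(b);
    return ok;
}
```

Calling coalesced_matches_rowwise(1920, 1080, 1920) exercises the coalesced path, and a padded linesize such as 2048 exercises the per-row fallback; both should report identical output.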


I ran the following tests on 64-bit Windows 10.
I compiled the code in Release mode.
I copied frames from my PC camera 1000 times (resolution 1920x1080):

With Coalesce:
copy_cnt=100  size=1920x1080 tot_time_copy(us)=36574  (average=365.74)
copy_cnt=200  size=1920x1080 tot_time_copy(us)=78207  (average=391.035)
copy_cnt=300  size=1920x1080 tot_time_copy(us)=122170 (average=407.233)
copy_cnt=400  size=1920x1080 tot_time_copy(us)=163678 (average=409.195)
copy_cnt=500  size=1920x1080 tot_time_copy(us)=201872 (average=403.744)
copy_cnt=600  size=1920x1080 tot_time_copy(us)=246174 (average=410.29)
copy_cnt=700  size=1920x1080 tot_time_copy(us)=287043 (average=410.061)
copy_cnt=800  size=1920x1080 tot_time_copy(us)=326462 (average=408.077)
copy_cnt=900  size=1920x1080 tot_time_copy(us)=356882 (average=396.536)
copy_cnt=1000 size=1920x1080 tot_time_copy(us)=394566 (average=394.566)

Without Coalesce:
copy_cnt=100  size=1920x1080 tot_time_copy(us)=44303  (average=443.03)
copy_cnt=200  size=1920x1080 tot_time_copy(us)=100501 (average=502.505)
copy_cnt=300  size=1920x1080 tot_time_copy(us)=150097 (average=500.323)
copy_cnt=400  size=1920x1080 tot_time_copy(us)=201010 (average=502.525)
copy_cnt=500  size=1920x1080 tot_time_copy(us)=256818 (average=513.636)
copy_cnt=600  size=1920x1080 tot_time_copy(us)=303273 (average=505.455)
copy_cnt=700  size=1920x1080 tot_time_copy(us)=359152 (average=513.074)
copy_cnt=800  size=1920x1080 tot_time_copy(us)=414413 (average=518.016)
copy_cnt=900  size=1920x1080 tot_time_copy(us)=465315 (average=517.017)
copy_cnt=1000 size=1920x1080 tot_time_copy(us)=520381 (average=520.381)
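
To put the two runs side by side, the totals for 1000 copies can be turned into a speedup factor and a percentage of time saved. The helper names below are hypothetical; the two constants are the totals reported above. This is arithmetic on a single run on one machine, not a general claim about the patch.

```c
#include <assert.h>

/* Totals reported in this thread for 1000 copies of a 1920x1080 frame. */
static const double TOTAL_US_COALESCED = 394566.0;
static const double TOTAL_US_ROWWISE   = 520381.0;

/* How many times faster the patched run is than the baseline run. */
static double speedup(double baseline_us, double patched_us)
{
    assert(baseline_us > 0.0 && patched_us > 0.0);
    return baseline_us / patched_us;
}

/* Share of the baseline time that the patched run no longer spends. */
static double time_saved_percent(double baseline_us, double patched_us)
{
    assert(baseline_us > 0.0);
    return 100.0 * (baseline_us - patched_us) / baseline_us;
}
```

For these totals, speedup(TOTAL_US_ROWWISE, TOTAL_US_COALESCED) comes to roughly 1.32, and time_saved_percent for the same arguments to roughly 24 percent.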


I think the results are very good.
What do you think?


Thank you



* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
  2022-07-13 15:54         ` Marco Vianini
@ 2022-07-13 16:15           ` James Almer
  2022-07-14 12:48             ` Marco Vianini
  0 siblings, 1 reply; 8+ messages in thread
From: James Almer @ 2022-07-13 16:15 UTC (permalink / raw)
  To: ffmpeg-devel

On 7/13/2022 12:54 PM, Marco Vianini wrote:
> I think the results are very good.
> What do you think about?

It looks like a good speed up, but we need a patch created with git 
format-patch that can be applied to the source tree to properly review 
this. Can you send that?

* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
  2022-07-13 16:15           ` James Almer
@ 2022-07-14 12:48             ` Marco Vianini
  0 siblings, 0 replies; 8+ messages in thread
From: Marco Vianini @ 2022-07-14 12:48 UTC (permalink / raw)
  To: ffmpeg-devel

[-- Attachment #1: Type: text/plain, Size: 3802 bytes --]

I generated the .eml file with "git format-patch" (see attachment).
Is that OK for you?
Thanks

[-- Attachment #2: 0001-image_copy_plane-improve-performance-by-coalesce-row-i.eml --]
[-- Type: message/rfc822, Size: 1045 bytes --]

From: Marco Vianini <marco_vianini@yahoo.it>
Subject: [PATCH] image_copy_plane: improve performance by coalesce row, if possible
Date: Thu, 14 Jul 2022 14:39:13 +0200

Signed-off-by: Marco Vianini <marco_vianini@yahoo.it>
---
 libavutil/imgutils.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
index 9ab5757cf6..9ccb398a3b 100644
--- a/libavutil/imgutils.c
+++ b/libavutil/imgutils.c
@@ -349,6 +349,14 @@ static void image_copy_plane(uint8_t       *dst, ptrdiff_t dst_linesize,
         return;
     av_assert0(FFABS(src_linesize) >= bytewidth);
     av_assert0(FFABS(dst_linesize) >= bytewidth);
+
+    if (dst_linesize == bytewidth && src_linesize == bytewidth) {
+        /** Coalesce rows in this specific case, for performance improvement */
+        bytewidth *= height;
+        height = 1;
+        src_linesize = dst_linesize = 0;
+    }
+
     for (;height > 0; height--) {
         memcpy(dst, src, bytewidth);
         dst += dst_linesize;
-- 
2.30.0.windows.2


