* [FFmpeg-devel] Performances improvement in "image_copy_plane"
[not found] <632087708.1175797.1657705107285.ref@mail.yahoo.com>
@ 2022-07-13 9:38 ` Marco Vianini
2022-07-13 9:54 ` Paul B Mahol
2022-07-13 15:12 ` Timo Rothenpieler
0 siblings, 2 replies; 8+ messages in thread
From: Marco Vianini @ 2022-07-13 9:38 UTC (permalink / raw)
To: ffmpeg-devel
You can get a very large performance improvement in the special (but very likely) case where "dst_linesize == bytewidth && src_linesize == bytewidth".
In this case we can "coalesce rows", that is, use a single memcpy instead of a smaller memcpy for every row (looping height times).
Code:
"
static void image_copy_plane(uint8_t *dst, ptrdiff_t dst_linesize,
                             const uint8_t *src, ptrdiff_t src_linesize,
                             ptrdiff_t bytewidth, int height)
{
    if (!dst || !src)
        return;
    av_assert0(abs(src_linesize) >= bytewidth);
    av_assert0(abs(dst_linesize) >= bytewidth);

    // MY PATCH START
    // Coalesce rows.
    if (dst_linesize == bytewidth && src_linesize == bytewidth) {
        bytewidth *= height;
        height = 1;
        src_linesize = dst_linesize = 0;
    }
    // MY PATCH STOP

    for (;height > 0; height--) {
        memcpy(dst, src, bytewidth);
        dst += dst_linesize;
        src += src_linesize;
    }
}
"
What do you think?
Thank you
Marco Vianini
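
Editor's sketch (not part of the original message): the coalescing idea described above can be checked in isolation. The snippet below is a minimal, standalone illustration, assuming a tightly packed 8x4 byte plane. The names copy_plane_rows, copy_plane_coalesced and check_coalesce_matches_rows are hypothetical and do not exist in FFmpeg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Row-by-row copy, mirroring the loop at the end of image_copy_plane(). */
void copy_plane_rows(uint8_t *dst, ptrdiff_t dst_linesize,
                     const uint8_t *src, ptrdiff_t src_linesize,
                     ptrdiff_t bytewidth, int height)
{
    for (; height > 0; height--) {
        memcpy(dst, src, (size_t)bytewidth);
        dst += dst_linesize;
        src += src_linesize;
    }
}

/* Same copy with the proposed coalescing step placed in front of the loop. */
void copy_plane_coalesced(uint8_t *dst, ptrdiff_t dst_linesize,
                          const uint8_t *src, ptrdiff_t src_linesize,
                          ptrdiff_t bytewidth, int height)
{
    if (dst_linesize == bytewidth && src_linesize == bytewidth) {
        /* No padding between rows: the plane is one contiguous block. */
        bytewidth *= height;
        height = 1;
        src_linesize = dst_linesize = 0;
    }
    copy_plane_rows(dst, dst_linesize, src, src_linesize, bytewidth, height);
}

/* Hypothetical self-check: both variants must produce identical output. */
void check_coalesce_matches_rows(void)
{
    enum { W = 8, H = 4 };
    uint8_t src[W * H], by_rows[W * H], coalesced[W * H];

    for (int i = 0; i < W * H; i++)
        src[i] = (uint8_t)(i * 7 + 3);      /* arbitrary test pattern */
    memset(by_rows, 0x00, sizeof by_rows);
    memset(coalesced, 0xFF, sizeof coalesced);

    copy_plane_rows(by_rows, W, src, W, W, H);
    copy_plane_coalesced(coalesced, W, src, W, W, H);

    assert(memcmp(by_rows, src, sizeof src) == 0);
    assert(memcmp(coalesced, by_rows, sizeof by_rows) == 0);
}
```

Calling check_coalesce_matches_rows() from a test program should pass silently. It only covers the case where both linesizes equal bytewidth, which is the case the proposal targets.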
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
2022-07-13 9:38 ` [FFmpeg-devel] Performances improvement in "image_copy_plane" Marco Vianini
@ 2022-07-13 9:54 ` Paul B Mahol
2022-07-13 14:53 ` Marco Vianini
2022-07-13 15:12 ` Timo Rothenpieler
1 sibling, 1 reply; 8+ messages in thread
From: Paul B Mahol @ 2022-07-13 9:54 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Wed, Jul 13, 2022 at 11:38 AM Marco Vianini <
marco_vianini-at-yahoo.it@ffmpeg.org> wrote:
> You can get a very big improvement of performances in the special (but
> very likely) case of: "(dst_linesize == bytewidth && src_linesize ==
> bytewidth)"
>
> In this case in fact We can "Coalesce rows", that is using ONLY ONE
> MEMCPY, instead of a smaller memcpy for every row (that is looping for
> height times).
>
> Code:"static void image_copy_plane(uint8_t *dst, ptrdiff_t
> dst_linesize, const uint8_t *src, ptrdiff_t
> src_linesize, ptrdiff_t bytewidth, int
> height){ if (!dst || !src) return;
> av_assert0(abs(src_linesize) >= bytewidth); av_assert0(abs(dst_linesize)
> >= bytewidth); // MY PATCH START // Coalesce rows. if (dst_linesize
> == bytewidth && src_linesize == bytewidth) { bytewidth *= height;
> height = 1; src_linesize = dst_linesize = 0; }// MY PATCH STOP
> for (;height > 0; height--) { memcpy(dst, src, bytewidth);
> dst += dst_linesize; src += src_linesize; }}"
> What do You think about?Thank You
>
Show the benchmark numbers.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
2022-07-13 9:54 ` Paul B Mahol
@ 2022-07-13 14:53 ` Marco Vianini
2022-07-13 15:10 ` Paul B Mahol
0 siblings, 1 reply; 8+ messages in thread
From: Marco Vianini @ 2022-07-13 14:53 UTC (permalink / raw)
To: FFmpeg development discussions and patches
I ran the following tests on Windows 10 64-bit, with the code compiled in Release mode.
I copied frames from my PC camera 1000 times (resolution 1920x1080):
With Coalesce (MY PATCH):
copy_cnt=100 size=1920x1080 tot_time_copy(us)=36574 (average=365.74)
copy_cnt=200 size=1920x1080 tot_time_copy(us)=78207 (average=391.035)
copy_cnt=300 size=1920x1080 tot_time_copy(us)=122170(average=407.233)
copy_cnt=400 size=1920x1080 tot_time_copy(us)=163678(average=409.195)
copy_cnt=500 size=1920x1080 tot_time_copy(us)=201872(average=403.744)
copy_cnt=600 size=1920x1080 tot_time_copy(us)=246174(average=410.29)
copy_cnt=700 size=1920x1080 tot_time_copy(us)=287043(average=410.061)
copy_cnt=800 size=1920x1080 tot_time_copy(us)=326462(average=408.077)
copy_cnt=900 size=1920x1080 tot_time_copy(us)=356882(average=396.536)
copy_cnt=1000 size=1920x1080 tot_time_copy(us)=394566(average=394.566)
Without Coalesce:
copy_cnt=100 size=1920x1080 tot_time_copy(us)=44303 (average=443.03)
copy_cnt=200 size=1920x1080 tot_time_copy(us)=100501(average=502.505)
copy_cnt=300 size=1920x1080 tot_time_copy(us)=150097(average=500.323)
copy_cnt=400 size=1920x1080 tot_time_copy(us)=201010(average=502.525)
copy_cnt=500 size=1920x1080 tot_time_copy(us)=256818(average=513.636)
copy_cnt=600 size=1920x1080 tot_time_copy(us)=303273(average=505.455)
copy_cnt=700 size=1920x1080 tot_time_copy(us)=359152(average=513.074)
copy_cnt=800 size=1920x1080 tot_time_copy(us)=414413(average=518.016)
copy_cnt=900 size=1920x1080 tot_time_copy(us)=465315(average=517.017)
copy_cnt=1000 size=1920x1080 tot_time_copy(us)=520381(average=520.381)
I think the results are very good. What do you think?
Thank you
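
Editor's sketch (not part of the original message): the reported totals at copy_cnt=1000 (520381 us without coalescing, 394566 us with it) correspond to a speedup of roughly 1.32x, or about 24% less copy time. The small program fragment below reproduces that arithmetic. The function names are hypothetical, and only the two totals come from the figures in this message.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of baseline time to patched time (above 1 means the patch is faster). */
double speedup(double baseline_us, double patched_us)
{
    return baseline_us / patched_us;
}

/* Percentage of the baseline time that the patch saves. */
double time_saved_percent(double baseline_us, double patched_us)
{
    return 100.0 * (baseline_us - patched_us) / baseline_us;
}

/* Prints both figures for the copy_cnt=1000 totals reported in this message. */
void print_reported_speedup(void)
{
    const double without_us = 520381.0; /* total without coalescing */
    const double with_us    = 394566.0; /* total with coalescing */

    assert(without_us > 0.0 && with_us > 0.0);
    printf("speedup: %.2fx, time saved: %.1f%%\n",
           speedup(without_us, with_us),
           time_saved_percent(without_us, with_us));
}
```

These numbers describe a single machine and a single frame size, so they indicate the size of the effect rather than a general guarantee.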
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
2022-07-13 14:53 ` Marco Vianini
@ 2022-07-13 15:10 ` Paul B Mahol
2022-07-13 15:54 ` Marco Vianini
0 siblings, 1 reply; 8+ messages in thread
From: Paul B Mahol @ 2022-07-13 15:10 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Wed, Jul 13, 2022 at 5:02 PM Marco Vianini <
marco_vianini-at-yahoo.it@ffmpeg.org> wrote:
> I did following tests on Windows 10 64bit.I compiled code in Release.
> I copied my pc camera frames 1000 times (resolution 1920x1080):
> With Coalesce (MY PATCH):copy_cnt=100 size=1920x1080
> tot_time_copy(us)=36574 (average=365.74)copy_cnt=200 size=1920x1080
> tot_time_copy(us)=78207 (average=391.035)copy_cnt=300 size=1920x1080
> tot_time_copy(us)=122170(average=407.233)copy_cnt=400 size=1920x1080
> tot_time_copy(us)=163678(average=409.195)copy_cnt=500 size=1920x1080
> tot_time_copy(us)=201872(average=403.744)copy_cnt=600 size=1920x1080
> tot_time_copy(us)=246174(average=410.29)copy_cnt=700 size=1920x1080
> tot_time_copy(us)=287043(average=410.061)copy_cnt=800 size=1920x1080
> tot_time_copy(us)=326462(average=408.077)copy_cnt=900 size=1920x1080
> tot_time_copy(us)=356882(average=396.536)copy_cnt=1000 size=1920x1080
> tot_time_copy(us)=394566(average=394.566)
> Without Coalesce:copy_cnt=100 size=1920x1080 tot_time_copy(us)=44303
> (average=443.03)copy_cnt=200 size=1920x1080
> tot_time_copy(us)=100501(average=502.505)copy_cnt=300 size=1920x1080
> tot_time_copy(us)=150097(average=500.323)copy_cnt=400 size=1920x1080
> tot_time_copy(us)=201010(average=502.525)copy_cnt=500 size=1920x1080
> tot_time_copy(us)=256818(average=513.636)copy_cnt=600 size=1920x1080
> tot_time_copy(us)=303273(average=505.455)copy_cnt=700 size=1920x1080
> tot_time_copy(us)=359152(average=513.074)copy_cnt=800 size=1920x1080
> tot_time_copy(us)=414413(average=518.016)copy_cnt=900 size=1920x1080
> tot_time_copy(us)=465315(average=517.017)copy_cnt=1000 size=1920x1080
> tot_time_copy(us)=520381(average=520.381)
> I think the results are very good.What do you think about?
> Thank You
>
>
First, stop top-posting.
Where is the patch?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
2022-07-13 9:38 ` [FFmpeg-devel] Performances improvement in "image_copy_plane" Marco Vianini
2022-07-13 9:54 ` Paul B Mahol
@ 2022-07-13 15:12 ` Timo Rothenpieler
1 sibling, 0 replies; 8+ messages in thread
From: Timo Rothenpieler @ 2022-07-13 15:12 UTC (permalink / raw)
To: ffmpeg-devel
On 13.07.2022 11:38, Marco Vianini wrote:
> You can get a very big improvement of performances in the special (but very likely) case of: "(dst_linesize == bytewidth && src_linesize == bytewidth)"
Isn't all that matters dst_linesize == src_linesize, and then you can
memcpy() the whole plane?
> In this case in fact We can "Coalesce rows", that is using ONLY ONE MEMCPY, instead of a smaller memcpy for every row (that is looping for height times).
>
> Code:"static void image_copy_plane(uint8_t *dst, ptrdiff_t dst_linesize, const uint8_t *src, ptrdiff_t src_linesize, ptrdiff_t bytewidth, int height){ if (!dst || !src) return; av_assert0(abs(src_linesize) >= bytewidth); av_assert0(abs(dst_linesize) >= bytewidth); // MY PATCH START // Coalesce rows. if (dst_linesize == bytewidth && src_linesize == bytewidth) { bytewidth *= height; height = 1; src_linesize = dst_linesize = 0; }// MY PATCH STOP
> for (;height > 0; height--) { memcpy(dst, src, bytewidth); dst += dst_linesize; src += src_linesize; }}"
> What do You think about?Thank You
> Marco Vianini
That code is mangled by your mail client and practically unreadable.
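
Editor's sketch (not part of the original message): the question above, whether dst_linesize == src_linesize alone would be enough, is not answered in this thread. One difference worth illustrating is that a single memcpy() over rows that carry padding also copies the padding bytes, whereas the per-row loop leaves them untouched. That could matter if the destination is, for example, a window into a wider image, which is an assumption made here for illustration. All names and sizes below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { LINESIZE = 8, BYTEWIDTH = 4, HEIGHT = 3 };

/* Per-row copy: touches only the first BYTEWIDTH bytes of each row. */
void copy_rows(uint8_t *dst, const uint8_t *src)
{
    for (int y = 0; y < HEIGHT; y++)
        memcpy(dst + y * LINESIZE, src + y * LINESIZE, BYTEWIDTH);
}

/* Whole-span copy relying only on equal linesizes. The length stops at the
 * end of the last row's pixels so it never reads past the final row. */
void copy_span(uint8_t *dst, const uint8_t *src)
{
    memcpy(dst, src, (size_t)LINESIZE * (HEIGHT - 1) + BYTEWIDTH);
}

/* Shows that the pixels agree but the destination padding does not. */
void padding_demo(void)
{
    uint8_t src[LINESIZE * HEIGHT];
    uint8_t by_rows[LINESIZE * HEIGHT];
    uint8_t by_span[LINESIZE * HEIGHT];

    memset(src, 0x55, sizeof src);                   /* source padding marker */
    for (int y = 0; y < HEIGHT; y++)
        memset(src + y * LINESIZE, y + 1, BYTEWIDTH); /* pixel bytes */
    memset(by_rows, 0xAA, sizeof by_rows);           /* destination marker */
    memset(by_span, 0xAA, sizeof by_span);

    copy_rows(by_rows, src);
    copy_span(by_span, src);

    for (int y = 0; y < HEIGHT; y++)
        assert(memcmp(by_rows + y * LINESIZE,
                      by_span + y * LINESIZE, BYTEWIDTH) == 0);

    assert(by_rows[BYTEWIDTH] == 0xAA); /* row 0 padding preserved */
    assert(by_span[BYTEWIDTH] == 0x55); /* row 0 padding overwritten */
}
```

Whether overwriting padding is acceptable depends on how the caller allocated the buffers. Requiring both linesizes to equal bytewidth, as the proposed code does, avoids the question because no padding exists in that case.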
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
2022-07-13 15:10 ` Paul B Mahol
@ 2022-07-13 15:54 ` Marco Vianini
2022-07-13 16:15 ` James Almer
0 siblings, 1 reply; 8+ messages in thread
From: Marco Vianini @ 2022-07-13 15:54 UTC (permalink / raw)
To: FFmpeg development discussions and patches
Sorry, my mail client was using HTML format.
I hope the mail is sent correctly now.
You can get a very large performance improvement in the special (but very likely) case where "dst_linesize == bytewidth && src_linesize == bytewidth".
In this case we can "coalesce rows", that is, use a single memcpy instead of a smaller memcpy for every row (looping height times).
Code:
"
static void image_copy_plane(uint8_t *dst, ptrdiff_t dst_linesize,
                             const uint8_t *src, ptrdiff_t src_linesize,
                             ptrdiff_t bytewidth, int height)
{
    if (!dst || !src)
        return;
    av_assert0(abs(src_linesize) >= bytewidth);
    av_assert0(abs(dst_linesize) >= bytewidth);

    /// MY PATCH START
    /// Coalesce rows.
    if (dst_linesize == bytewidth && src_linesize == bytewidth) {
        bytewidth *= height;
        height = 1;
        src_linesize = dst_linesize = 0;
    }
    /// MY PATCH STOP

    for (;height > 0; height--) {
        memcpy(dst, src, bytewidth);
        dst += dst_linesize;
        src += src_linesize;
    }
}
"
I ran the following tests on Windows 10 64-bit.
I compiled the code in Release mode.
I copied frames from my PC camera 1000 times (resolution 1920x1080):
With Coalesce:
copy_cnt=100 size=1920x1080 tot_time_copy(us)=36574 (average=365.74)
copy_cnt=200 size=1920x1080 tot_time_copy(us)=78207 (average=391.035)
copy_cnt=300 size=1920x1080 tot_time_copy(us)=122170(average=407.233)
copy_cnt=400 size=1920x1080 tot_time_copy(us)=163678(average=409.195)
copy_cnt=500 size=1920x1080 tot_time_copy(us)=201872(average=403.744)
copy_cnt=600 size=1920x1080 tot_time_copy(us)=246174(average=410.29)
copy_cnt=700 size=1920x1080 tot_time_copy(us)=287043(average=410.061)
copy_cnt=800 size=1920x1080 tot_time_copy(us)=326462(average=408.077)
copy_cnt=900 size=1920x1080 tot_time_copy(us)=356882(average=396.536)
copy_cnt=1000 size=1920x1080 tot_time_copy(us)=394566(average=394.566)
Without Coalesce:
copy_cnt=100 size=1920x1080 tot_time_copy(us)=44303 (average=443.03)
copy_cnt=200 size=1920x1080 tot_time_copy(us)=100501(average=502.505)
copy_cnt=300 size=1920x1080 tot_time_copy(us)=150097(average=500.323)
copy_cnt=400 size=1920x1080 tot_time_copy(us)=201010(average=502.525)
copy_cnt=500 size=1920x1080 tot_time_copy(us)=256818(average=513.636)
copy_cnt=600 size=1920x1080 tot_time_copy(us)=303273(average=505.455)
copy_cnt=700 size=1920x1080 tot_time_copy(us)=359152(average=513.074)
copy_cnt=800 size=1920x1080 tot_time_copy(us)=414413(average=518.016)
copy_cnt=900 size=1920x1080 tot_time_copy(us)=465315(average=517.017)
copy_cnt=1000 size=1920x1080 tot_time_copy(us)=520381(average=520.381)
I think the results are very good.
What do you think?
Thank you
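
Editor's sketch (not part of the original message): the test program behind the figures above is not shown in the thread. The fragment below is one way such a measurement could be set up, assuming a single tightly packed 8-bit plane and using the standard clock() function. The names plane_copy and time_plane_copies, and the coalesce switch, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Plane copy with the coalescing step switchable, for A/B timing. */
void plane_copy(uint8_t *dst, ptrdiff_t dst_linesize,
                const uint8_t *src, ptrdiff_t src_linesize,
                ptrdiff_t bytewidth, int height, int coalesce)
{
    if (coalesce && dst_linesize == bytewidth && src_linesize == bytewidth) {
        bytewidth *= height;
        height = 1;
        src_linesize = dst_linesize = 0;
    }
    for (; height > 0; height--) {
        memcpy(dst, src, (size_t)bytewidth);
        dst += dst_linesize;
        src += src_linesize;
    }
}

/* Returns the clock() time in microseconds spent on `count` copies of a
 * width x height byte plane, or -1.0 if allocation fails. */
double time_plane_copies(int width, int height, int count, int coalesce)
{
    size_t size = (size_t)width * (size_t)height;
    uint8_t *src = malloc(size);
    uint8_t *dst = malloc(size);
    double total_us = -1.0;

    if (src && dst) {
        memset(src, 0x5A, size);
        clock_t start = clock();
        for (int i = 0; i < count; i++)
            plane_copy(dst, width, src, width, width, height, coalesce);
        clock_t stop = clock();
        if (count > 0)
            assert(memcmp(dst, src, size) == 0); /* the copy must be exact */
        total_us = (double)(stop - start) * 1e6 / CLOCKS_PER_SEC;
    }
    free(src);
    free(dst);
    return total_us;
}
```

For example, comparing time_plane_copies(1920, 1080, 1000, 1) with time_plane_copies(1920, 1080, 1000, 0) gives two totals for the same workload. The resolution of clock() and compiler optimizations both affect the result, so several runs are advisable before drawing conclusions.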
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
2022-07-13 15:54 ` Marco Vianini
@ 2022-07-13 16:15 ` James Almer
2022-07-14 12:48 ` Marco Vianini
0 siblings, 1 reply; 8+ messages in thread
From: James Almer @ 2022-07-13 16:15 UTC (permalink / raw)
To: ffmpeg-devel
On 7/13/2022 12:54 PM, Marco Vianini wrote:
> I think the results are very good.
> What do you think about?
It looks like a good speed up, but we need a patch created with git
format-patch that can be applied to the source tree to properly review
this. Can you send that?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [FFmpeg-devel] Performances improvement in "image_copy_plane"
2022-07-13 16:15 ` James Almer
@ 2022-07-14 12:48 ` Marco Vianini
0 siblings, 0 replies; 8+ messages in thread
From: Marco Vianini @ 2022-07-14 12:48 UTC (permalink / raw)
To: ffmpeg-devel
[-- Attachment #1: Type: text/plain, Size: 3802 bytes --]
I generated the .eml file with "git format-patch" (see attachment).
Is that OK for you?
Thanks
[-- Attachment #2: 0001-image_copy_plane-improve-performance-by-coalesce-row-i.eml --]
[-- Type: message/rfc822, Size: 1045 bytes --]
From: Marco Vianini <marco_vianini@yahoo.it>
Subject: [PATCH] image_copy_plane: improve performance by coalesce row, if possible
Date: Thu, 14 Jul 2022 14:39:13 +0200
Signed-off-by: Marco Vianini <marco_vianini@yahoo.it>
---
libavutil/imgutils.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
index 9ab5757cf6..9ccb398a3b 100644
--- a/libavutil/imgutils.c
+++ b/libavutil/imgutils.c
@@ -349,6 +349,14 @@ static void image_copy_plane(uint8_t *dst, ptrdiff_t dst_linesize,
         return;
     av_assert0(FFABS(src_linesize) >= bytewidth);
     av_assert0(FFABS(dst_linesize) >= bytewidth);
+
+    if (dst_linesize == bytewidth && src_linesize == bytewidth) {
+        /** Coalesce rows in this specific case to improve performance */
+        bytewidth *= height;
+        height = 1;
+        src_linesize = dst_linesize = 0;
+    }
+
     for (;height > 0; height--) {
         memcpy(dst, src, bytewidth);
         dst += dst_linesize;
--
2.30.0.windows.2
^ permalink raw reply [flat|nested] 8+ messages in thread