* [FFmpeg-devel] [PATCH v3] avcodec/mathops: Optimize generic mid_pred function
@ 2023-03-06 9:10 Junxian Zhu
0 siblings, 0 replies; 5+ messages in thread
From: Junxian Zhu @ 2023-03-06 9:10 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Junxian Zhu, jiaxun.yang
From: Junxian Zhu <zhujunxian@oss.cipunited.com>
Rewrite the mid_pred function in the generic mathops.h to reduce branching and improve performance. Because recent compilers can now generate assembly for these functions that is as short as the handwritten version, also remove the MIPS-specific inline-assembly mathops.h.
Signed-off-by: Junxian Zhu <zhujunxian@oss.cipunited.com>
---
libavcodec/mathops.h | 20 ++++--------
libavcodec/mips/mathops.h | 67 ---------------------------------------
2 files changed, 6 insertions(+), 81 deletions(-)
delete mode 100644 libavcodec/mips/mathops.h
diff --git a/libavcodec/mathops.h b/libavcodec/mathops.h
index c89054d6ed..526ffe0eec 100644
--- a/libavcodec/mathops.h
+++ b/libavcodec/mathops.h
@@ -41,8 +41,6 @@ extern const uint8_t ff_zigzag_scan[16+1];
# include "arm/mathops.h"
#elif ARCH_AVR32
# include "avr32/mathops.h"
-#elif ARCH_MIPS
-# include "mips/mathops.h"
#elif ARCH_PPC
# include "ppc/mathops.h"
#elif ARCH_X86
@@ -98,18 +96,12 @@ static av_always_inline unsigned UMULH(unsigned a, unsigned b){
#define mid_pred mid_pred
static inline av_const int mid_pred(int a, int b, int c)
{
- if(a>b){
- if(c>b){
- if(c>a) b=a;
- else b=c;
- }
- }else{
- if(b>c){
- if(c>a) b=c;
- else b=a;
- }
- }
- return b;
+ int t0,t1,t2,t3;
+ t0 = (a > b) ? b : a ;
+ t1 = (a > b) ? a : b ;
+ t2 = (t0 > c) ? t0 : c;
+ t3 = (t1 > t2) ? t2 : t1;
+ return t3;
}
#endif
diff --git a/libavcodec/mips/mathops.h b/libavcodec/mips/mathops.h
deleted file mode 100644
index bb9dc8375a..0000000000
--- a/libavcodec/mips/mathops.h
+++ /dev/null
@@ -1,67 +0,0 @@
-/*
- * Copyright (c) 2009 Mans Rullgard <mans@mansr.com>
- * Copyright (c) 2015 Zhou Xiaoyong <zhouxiaoyong@loongson.cn>
- *
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#ifndef AVCODEC_MIPS_MATHOPS_H
-#define AVCODEC_MIPS_MATHOPS_H
-
-#include <stdint.h>
-#include "config.h"
-#include "libavutil/common.h"
-
-#if HAVE_INLINE_ASM
-
-#if HAVE_LOONGSON3
-
-#define MULH MULH
-static inline av_const int MULH(int a, int b)
-{
- int c;
- __asm__ ("dmult %1, %2 \n\t"
- "mflo %0 \n\t"
- "dsrl %0, %0, 32 \n\t"
- : "=r"(c)
- : "r"(a),"r"(b)
- : "hi", "lo");
- return c;
-}
-
-#define mid_pred mid_pred
-static inline av_const int mid_pred(int a, int b, int c)
-{
- int t = b;
- __asm__ ("sgt $8, %1, %2 \n\t"
- "movn %0, %1, $8 \n\t"
- "movn %1, %2, $8 \n\t"
- "sgt $8, %1, %3 \n\t"
- "movz %1, %3, $8 \n\t"
- "sgt $8, %0, %1 \n\t"
- "movn %0, %1, $8 \n\t"
- : "+&r"(t),"+&r"(a)
- : "r"(b),"r"(c)
- : "$8");
- return t;
-}
-
-#endif /* HAVE_LOONGSON3 */
-
-#endif /* HAVE_INLINE_ASM */
-
-#endif /* AVCODEC_MIPS_MATHOPS_H */
--
2.39.2.windows.1
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
* Re: [FFmpeg-devel] [PATCH v3] avcodec/mathops: Optimize generic mid_pred function
2023-03-15 10:09 ` YunQiang Su
@ 2023-03-16 21:56 ` Michael Niedermayer
0 siblings, 0 replies; 5+ messages in thread
From: Michael Niedermayer @ 2023-03-16 21:56 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Wed, Mar 15, 2023 at 06:09:13PM +0800, YunQiang Su wrote:
> Michael Niedermayer <michael@niedermayer.cc> wrote on Wed, Mar 8, 2023 at 04:45:
> >
> > On Tue, Mar 07, 2023 at 05:08:27PM +0800, Junxian Zhu wrote:
> > > From: Junxian Zhu <zhujunxian@oss.cipunited.com>
> > >
> > > Rewrite the mid_pred function in the generic mathops.h to reduce branching and improve performance. Because recent compilers can now generate assembly for these functions that is as short as the handwritten version, also remove the MIPS-specific inline-assembly mathops.h.
> >
> > as you write, that it improves performance
> > what speed effect does this have exactly?
> > thx
> >
>
> I tested the performance, using this code
[...]
> On MacOS 13.2 with Apple M1:
> The old code the new code
> 2.1s 2.3s
>
> On Cavium ThunderX / arm64 (GCC 10.2.1 -O3)
> The old code the new code
> 52.7s 37.8s
>
> On Loongson 3A4000/mips64el (GCC 10.2.1 -O3)
> The old code the new code
> 90s 5s
>
> On Intel(R) Xeon(R) CPU E7-4820 v4 @ 2.00GHz (GCC 10.2.1 -O3)
> The old code the new code
> 14.4s 15.4s
>
> On SF19A2890/MIPS interAptiv (GCC 10.2.1 -O3)
> The old code the new code
> 314s 39.3s
>
> On Intel(R) Xeon(R) CPU E7-4820 v4 @ 2.00GHz (GCC 12.2.0 -O3)
> The old code the new code
> 14.4s 8.8s
>
> On sifive,bullet0/rv64imafdc (GCC 12.2.0 -O3, 1e6 times instead of 1e7)
> The old code the new code
> 11.9s 15.2s
>
> On Freescale i.MX53/ARMv7 Processor rev 5 (v7l) (GCC 12.2.0 -O3, 1e6
> times instead of 1e7)
> The old code the new code
> 24.1s 15.7s
>
> On POWER8 (architected), altivec supported, BIG ENDIAN, ppc64 (GCC 12.2.0 -O3)
> The old code the new code
> 43.1s 50.8s
>
> On POWER8 (architected), altivec supported, LITTLE ENDIAN, ppc64el
> (GCC 12.2.0 -O3)
> The old code the new code
> 7.8s 4.7s
>
> On PA8900 (Shortfin) PA-RISC (GCC 12.2.0 -O3 1e6 times instead of 1e7)
> The old code the new code
> 39.9s 47.2s
>
> On IBM/S390 aka s390x (GCC 12.2.0 -O3)
> The old code the new code
> 82.2s 30.8s
>
> On Intel(R) Itanium(R) Processor 9320 (GCC 12.2.0 -O3)
> The old code the new code
> 89.5s 78.1s
>
> Cavium Octeon III V0.2 FPU V0.0 /mipsel (GCC 12.2.0 -O3)
> The old code the new code
> 117.5s 118.5s
These cover a quite extensive set of hw, impressive
thx
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Dictatorship: All citizens are under surveillance, all their steps and
actions recorded, for the politicians to enforce control.
Democracy: All politicians are under surveillance, all their steps and
actions recorded, for the citizens to enforce control.
* Re: [FFmpeg-devel] [PATCH v3] avcodec/mathops: Optimize generic mid_pred function
2023-03-07 20:45 ` Michael Niedermayer
@ 2023-03-15 10:09 ` YunQiang Su
2023-03-16 21:56 ` Michael Niedermayer
0 siblings, 1 reply; 5+ messages in thread
From: YunQiang Su @ 2023-03-15 10:09 UTC (permalink / raw)
To: FFmpeg development discussions and patches
Michael Niedermayer <michael@niedermayer.cc> wrote on Wed, Mar 8, 2023 at 04:45:
>
> On Tue, Mar 07, 2023 at 05:08:27PM +0800, Junxian Zhu wrote:
> > From: Junxian Zhu <zhujunxian@oss.cipunited.com>
> >
> > Rewrite the mid_pred function in the generic mathops.h to reduce branching and improve performance. Because recent compilers can now generate assembly for these functions that is as short as the handwritten version, also remove the MIPS-specific inline-assembly mathops.h.
>
> as you write, that it improves performance
> what speed effect does this have exactly?
> thx
>
I tested the performance, using this code
```
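#include <stdio.h>
#include <time.h>
#include <stdlib.h>

/* Arguments are parenthesized so the macros stay safe with expressions. */
#define FFMIN(a, b) ((a) > (b) ? (b) : (a))
#define FFMAX(a, b) ((a) > (b) ? (a) : (b))

/* OLD selects the implementation under test (e.g. compile with -DOLD=1
 * for the original code); when it is undefined, #if OLD evaluates to 0. */
int mid_pred(int a, int b, int c)
{
#if OLD
    if(a>b){
        if(c>b){
            if(c>a) b=a;
            else b=c;
        }
    }else{
        if(b>c){
            if(c>a) b=c;
            else b=a;
        }
    }
    return b;
#else
    int t0,t1,t2,t3;
    t0 = (a > b) ? b : a ;
    t1 = (a > b) ? a : b ;
    t2 = (t0 > c) ? t0 : c;
    t3 = (t1 > t2) ? t2 : t1;
    return t3;
#endif
}

int main(void) {
    int a[1024], b[1024], c[1024], d[1024];
    int j; /* declared here so it is still in scope for the final printf */
    srand(time(NULL));
    for(int i=0; i<1024; i++) {
        a[i] = rand();
        b[i] = rand();
        c[i] = rand();
    }
    /* The random loop bound and the printf below keep the compiler from
     * discarding the loops as dead code. */
    for (j=0; j<1e7+rand()%2; j++)
        for(int i=0; i<1024; i++)
            d[i] = mid_pred(a[i], b[i], c[i]);
    printf("%d, %d\n", d[rand()%1024], j);
    return 0;
}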
```
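Before comparing timings, it helps to confirm that the two bodies of mid_pred above return the same value. The following is only a sketch, not part of the patch: the names mid_pred_branchy, mid_pred_minmax, count_mismatches and check_small_range are hypothetical, and the exhaustive range is an arbitrary small one chosen so that ties and negative values are covered.

```c
#include <assert.h>

/* Original branch-based median of three, copied from the OLD path. */
static int mid_pred_branchy(int a, int b, int c)
{
    if (a > b) {
        if (c > b) {
            if (c > a) b = a;
            else       b = c;
        }
    } else {
        if (b > c) {
            if (c > a) b = c;
            else       b = a;
        }
    }
    return b;
}

/* Replacement: min(max(a, b), max(min(a, b), c)). */
static int mid_pred_minmax(int a, int b, int c)
{
    int t0 = (a > b) ? b : a;    /* min(a, b) */
    int t1 = (a > b) ? a : b;    /* max(a, b) */
    int t2 = (t0 > c) ? t0 : c;  /* max(min(a, b), c) */
    return (t1 > t2) ? t2 : t1;  /* min of the two maxima */
}

/* Hypothetical helper: counts the triples (a, b, c) with every component
 * in [lo, hi] for which the two implementations disagree. */
static long count_mismatches(int lo, int hi)
{
    long bad = 0;
    for (int a = lo; a <= hi; a++)
        for (int b = lo; b <= hi; b++)
            for (int c = lo; c <= hi; c++)
                if (mid_pred_branchy(a, b, c) != mid_pred_minmax(a, b, c))
                    bad++;
    return bad;
}

/* Hypothetical self-check over a small range that includes ties. */
static void check_small_range(void)
{
    assert(count_mismatches(-8, 8) == 0);
}
```

Calling check_small_range() from a test program's main() exercises all 17^3 triples. Both functions only compare and select among their inputs, so the result depends only on the relative order of a, b and c, and a small range already covers every ordering.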
Results (run time; 1e7 outer iterations unless marked [*]):

Platform                                               Compiler        Old code  New code
Apple M1, macOS 13.2                                   not stated      2.1s      2.3s
Cavium ThunderX / arm64                                GCC 10.2.1 -O3  52.7s     37.8s
Loongson 3A4000 / mips64el                             GCC 10.2.1 -O3  90s       5s
Intel(R) Xeon(R) CPU E7-4820 v4 @ 2.00GHz              GCC 10.2.1 -O3  14.4s     15.4s
SF19A2890 / MIPS interAptiv                            GCC 10.2.1 -O3  314s      39.3s
Intel(R) Xeon(R) CPU E7-4820 v4 @ 2.00GHz              GCC 12.2.0 -O3  14.4s     8.8s
sifive,bullet0 / rv64imafdc [*]                        GCC 12.2.0 -O3  11.9s     15.2s
Freescale i.MX53 / ARMv7 Processor rev 5 (v7l) [*]     GCC 12.2.0 -O3  24.1s     15.7s
POWER8 (architected), altivec, big endian, ppc64       GCC 12.2.0 -O3  43.1s     50.8s
POWER8 (architected), altivec, little endian, ppc64el  GCC 12.2.0 -O3  7.8s      4.7s
PA8900 (Shortfin) PA-RISC [*]                          GCC 12.2.0 -O3  39.9s     47.2s
IBM/S390 aka s390x                                     GCC 12.2.0 -O3  82.2s     30.8s
Intel(R) Itanium(R) Processor 9320                     GCC 12.2.0 -O3  89.5s     78.1s
Cavium Octeon III V0.2 FPU V0.0 / mipsel               GCC 12.2.0 -O3  117.5s    118.5s

[*] 1e6 iterations instead of 1e7.
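To make the mixed picture in these numbers easier to read, the timings can be turned into old/new ratios. The sketch below is illustrative only: the struct, the shortened platform labels and the helper names are hypothetical, while the times are copied from the results above. A ratio above 1 means the new code was faster on that machine.

```c
#include <assert.h>
#include <stddef.h>

/* One benchmark result; times are in seconds as reported in this message. */
struct bench_row {
    const char *platform;
    double old_s;
    double new_s;
};

static const struct bench_row rows[] = {
    { "Apple M1",                       2.1,   2.3 },
    { "Cavium ThunderX, GCC 10.2.1",   52.7,  37.8 },
    { "Loongson 3A4000, GCC 10.2.1",   90.0,   5.0 },
    { "Xeon E7-4820 v4, GCC 10.2.1",   14.4,  15.4 },
    { "MIPS interAptiv, GCC 10.2.1",  314.0,  39.3 },
    { "Xeon E7-4820 v4, GCC 12.2.0",   14.4,   8.8 },
    { "sifive,bullet0, GCC 12.2.0",    11.9,  15.2 },
    { "i.MX53 ARMv7, GCC 12.2.0",      24.1,  15.7 },
    { "POWER8 ppc64 BE, GCC 12.2.0",   43.1,  50.8 },
    { "POWER8 ppc64el, GCC 12.2.0",     7.8,   4.7 },
    { "PA8900 PA-RISC, GCC 12.2.0",    39.9,  47.2 },
    { "s390x, GCC 12.2.0",             82.2,  30.8 },
    { "Itanium 9320, GCC 12.2.0",      89.5,  78.1 },
    { "Octeon III mipsel, GCC 12.2.0", 117.5, 118.5 },
};

#define N_ROWS (sizeof(rows) / sizeof(rows[0]))

/* Old time divided by new time: > 1 means the new code ran faster. */
static double speedup(const struct bench_row *r)
{
    return r->old_s / r->new_s;
}

/* Number of platforms on which the new code took longer than the old. */
static size_t count_regressions(void)
{
    size_t n = 0;
    for (size_t i = 0; i < N_ROWS; i++)
        if (rows[i].new_s > rows[i].old_s)
            n++;
    return n;
}
```

Printing rows[i].platform together with speedup(&rows[i]) in a loop gives a per-platform summary. The rows measured with 1e6 iterations remain comparable within themselves, because both columns of a row used the same iteration count.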
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> It is dangerous to be right in matters on which the established authorities
> are wrong. -- Voltaire
* Re: [FFmpeg-devel] [PATCH v3] avcodec/mathops: Optimize generic mid_pred function
2023-03-07 9:08 Junxian Zhu
@ 2023-03-07 20:45 ` Michael Niedermayer
2023-03-15 10:09 ` YunQiang Su
0 siblings, 1 reply; 5+ messages in thread
From: Michael Niedermayer @ 2023-03-07 20:45 UTC (permalink / raw)
To: FFmpeg development discussions and patches
On Tue, Mar 07, 2023 at 05:08:27PM +0800, Junxian Zhu wrote:
> From: Junxian Zhu <zhujunxian@oss.cipunited.com>
>
> Rewrite the mid_pred function in the generic mathops.h to reduce branching and improve performance. Because recent compilers can now generate assembly for these functions that is as short as the handwritten version, also remove the MIPS-specific inline-assembly mathops.h.
as you write, that it improves performance
what speed effect does this have exactly?
thx
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
It is dangerous to be right in matters on which the established authorities
are wrong. -- Voltaire
* [FFmpeg-devel] [PATCH v3] avcodec/mathops: Optimize generic mid_pred function
@ 2023-03-07 9:08 Junxian Zhu
2023-03-07 20:45 ` Michael Niedermayer
0 siblings, 1 reply; 5+ messages in thread
From: Junxian Zhu @ 2023-03-07 9:08 UTC (permalink / raw)
To: ffmpeg-devel; +Cc: Junxian Zhu, jiaxun.yang
From: Junxian Zhu <zhujunxian@oss.cipunited.com>
Rewrite the mid_pred function in the generic mathops.h to reduce branching and improve performance. Because recent compilers can now generate assembly for these functions that is as short as the handwritten version, also remove the MIPS-specific inline-assembly mathops.h.
Signed-off-by: Junxian Zhu <zhujunxian@oss.cipunited.com>
---
libavcodec/mathops.h | 19 +++--------
libavcodec/mips/mathops.h | 67 ---------------------------------------
2 files changed, 5 insertions(+), 81 deletions(-)
delete mode 100644 libavcodec/mips/mathops.h
diff --git a/libavcodec/mathops.h b/libavcodec/mathops.h
index c89054d6ed..f2ba4fabce 100644
--- a/libavcodec/mathops.h
+++ b/libavcodec/mathops.h
@@ -41,8 +41,6 @@ extern const uint8_t ff_zigzag_scan[16+1];
# include "arm/mathops.h"
#elif ARCH_AVR32
# include "avr32/mathops.h"
-#elif ARCH_MIPS
-# include "mips/mathops.h"
#elif ARCH_PPC
# include "ppc/mathops.h"
#elif ARCH_X86
@@ -98,18 +96,11 @@ static av_always_inline unsigned UMULH(unsigned a, unsigned b){
#define mid_pred mid_pred
static inline av_const int mid_pred(int a, int b, int c)
{
- if(a>b){
- if(c>b){
- if(c>a) b=a;
- else b=c;
- }
- }else{
- if(b>c){
- if(c>a) b=c;
- else b=a;
- }
- }
- return b;
+ int t0,t1,t2;
+ t0 = FFMIN(a, b);
+ t1 = FFMAX(a, b);
+ t2 = FFMAX(t0, c);
+ return FFMIN(t1, t2);
}
#endif
diff --git a/libavcodec/mips/mathops.h b/libavcodec/mips/mathops.h
deleted file mode 100644
index bb9dc8375a..0000000000
--- a/libavcodec/mips/mathops.h
+++ /dev/null
@@ -1,67 +0,0 @@
-/*
- * Copyright (c) 2009 Mans Rullgard <mans@mansr.com>
- * Copyright (c) 2015 Zhou Xiaoyong <zhouxiaoyong@loongson.cn>
- *
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#ifndef AVCODEC_MIPS_MATHOPS_H
-#define AVCODEC_MIPS_MATHOPS_H
-
-#include <stdint.h>
-#include "config.h"
-#include "libavutil/common.h"
-
-#if HAVE_INLINE_ASM
-
-#if HAVE_LOONGSON3
-
-#define MULH MULH
-static inline av_const int MULH(int a, int b)
-{
- int c;
- __asm__ ("dmult %1, %2 \n\t"
- "mflo %0 \n\t"
- "dsrl %0, %0, 32 \n\t"
- : "=r"(c)
- : "r"(a),"r"(b)
- : "hi", "lo");
- return c;
-}
-
-#define mid_pred mid_pred
-static inline av_const int mid_pred(int a, int b, int c)
-{
- int t = b;
- __asm__ ("sgt $8, %1, %2 \n\t"
- "movn %0, %1, $8 \n\t"
- "movn %1, %2, $8 \n\t"
- "sgt $8, %1, %3 \n\t"
- "movz %1, %3, $8 \n\t"
- "sgt $8, %0, %1 \n\t"
- "movn %0, %1, $8 \n\t"
- : "+&r"(t),"+&r"(a)
- : "r"(b),"r"(c)
- : "$8");
- return t;
-}
-
-#endif /* HAVE_LOONGSON3 */
-
-#endif /* HAVE_INLINE_ASM */
-
-#endif /* AVCODEC_MIPS_MATHOPS_H */
--
2.39.2.windows.1