From: Marton Balint <cus@passwd.hu>
To: ffmpeg-devel@ffmpeg.org
Cc: Marton Balint <cus@passwd.hu>,
Devin Heitmueller <dheitmueller@ltnglobal.com>,
Devin Heitmueller <devin.heitmueller@ltnglobal.com>
Subject: [FFmpeg-devel] [PATCH] avcodec/bitpacked_dec: optimize bitpacked_decode_yuv422p10
Date: Wed, 13 Dec 2023 20:58:20 +0100
Message-ID: <20231213195820.21046-1-cus@passwd.hu>
In-Reply-To: <CAHGibzF0gg0Tu0bpdOs71=y=5Nmg+MHJZya315B93TyQJ6SqDQ@mail.gmail.com>
From: Devin Heitmueller <devin.heitmueller@ltnglobal.com>
Rework the code a bit to speed up the 10-bit bitpacked decoding
routine. This is probably about as fast as I can get it without
switching to assembly language.
Demonstrable with:
./ffmpeg -f lavfi -i "smptehdbars=size=3840x2160" -c bitpacked -f image2 -frames:v 1 source.yuv
./ffmpeg -f bitpacked -pix_fmt yuv422p10le -s 3840x2160 -c:v bitpacked -i source.yuv -pix_fmt yuv422p10le out.yuv
On my development system, it went from 80ms for a 2160p frame
down to 20ms (i.e. a 4X speedup). Good enough for now, I hope...
Comments from Marton:
Originally on my system better performance could be achieved by simply
switching to the cached bitstream reader, but for Devin it was slower than
his direct byte operations.
I changed the order of writing output from u/y/v/y to u/v/y/y, and that made
the code faster than the cached bitstream reader on my system as well.
TIMER measurement of the decode loop on Ryzen 5 3600 with command line:
./ffmpeg -stream_loop 256 -threads 1 -f bitpacked -pix_fmt yuv422p10le -s 3840x2160 -c:v bitpacked -i source.yuv -pix_fmt yuv422p10le -f null none -loglevel error
Before: 823204127 decicycles in YUV, 256 runs, 0 skips
After: 315070524 decicycles in YUV, 256 runs, 0 skips
Signed-off-by: Devin Heitmueller <dheitmueller@ltnglobal.com>
Signed-off-by: Marton Balint <cus@passwd.hu>
---
libavcodec/bitpacked_dec.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/libavcodec/bitpacked_dec.c b/libavcodec/bitpacked_dec.c
index c88f861993..54c008bd86 100644
--- a/libavcodec/bitpacked_dec.c
+++ b/libavcodec/bitpacked_dec.c
@@ -28,7 +28,6 @@
 #include "avcodec.h"
 #include "codec_internal.h"
-#include "get_bits.h"
 #include "libavutil/imgutils.h"
 #include "thread.h"
@@ -65,7 +64,7 @@ static int bitpacked_decode_yuv422p10(AVCodecContext *avctx, AVFrame *frame,
 {
     uint64_t frame_size = (uint64_t)avctx->width * (uint64_t)avctx->height * 20;
     uint64_t packet_size = (uint64_t)avpkt->size * 8;
-    GetBitContext bc;
+    uint8_t *src;
     uint16_t *y, *u, *v;
     int ret, i, j;
@@ -79,20 +78,18 @@ static int bitpacked_decode_yuv422p10(AVCodecContext *avctx, AVFrame *frame,
     if (avctx->width % 2)
         return AVERROR_PATCHWELCOME;
-    ret = init_get_bits(&bc, avpkt->data, avctx->width * avctx->height * 20);
-    if (ret)
-        return ret;
-
+    src = avpkt->data;
     for (i = 0; i < avctx->height; i++) {
         y = (uint16_t*)(frame->data[0] + i * frame->linesize[0]);
         u = (uint16_t*)(frame->data[1] + i * frame->linesize[1]);
         v = (uint16_t*)(frame->data[2] + i * frame->linesize[2]);
         for (j = 0; j < avctx->width; j += 2) {
-            *u++ = get_bits(&bc, 10);
-            *y++ = get_bits(&bc, 10);
-            *v++ = get_bits(&bc, 10);
-            *y++ = get_bits(&bc, 10);
+            *u++ = (src[0] << 2) | (src[1] >> 6);
+            *v++ = ((src[2] << 6) | (src[3] >> 2)) & 0x3ff;
+            *y++ = ((src[1] << 4) | (src[2] >> 4)) & 0x3ff;
+            *y++ = ((src[3] << 8) | (src[4])) & 0x3ff;
+            src += 5;
         }
     }
--
2.35.3
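
Illustrative note, not part of the patch: the new inner loop reads each
group of four 10-bit samples (U, Y0, V, Y1 in stream order, MSB first)
from 5 consecutive bytes. The sketch below is a standalone check of those
shift/mask expressions. pack_group() and unpack_group() are hypothetical
helper names made up for this example and do not exist in FFmpeg; only
the four expressions inside unpack_group() are taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not in the patch): pack one U/Y0/V/Y1 group of
 * 10-bit samples into 5 bytes, MSB first, in stream order U, Y0, V, Y1. */
static void pack_group(uint8_t dst[5], uint16_t u, uint16_t y0,
                       uint16_t v, uint16_t y1)
{
    /* Concatenate the four samples into one 40-bit big-endian word. */
    uint64_t bits = ((uint64_t)(u  & 0x3ff) << 30) |
                    ((uint64_t)(y0 & 0x3ff) << 20) |
                    ((uint64_t)(v  & 0x3ff) << 10) |
                     (uint64_t)(y1 & 0x3ff);
    for (int k = 0; k < 5; k++)
        dst[k] = (uint8_t)(bits >> (32 - 8 * k));
}

/* Hypothetical wrapper around the shift/mask expressions used by the
 * patched inner loop, written in the same u/v/y/y order. */
static void unpack_group(const uint8_t src[5], uint16_t *u, uint16_t *y0,
                         uint16_t *v, uint16_t *y1)
{
    /* U needs no mask: 8 bits shifted up by 2 plus the top 2 bits of src[1]. */
    *u  =  (src[0] << 2) | (src[1] >> 6);
    *v  = ((src[2] << 6) | (src[3] >> 2)) & 0x3ff;
    *y0 = ((src[1] << 4) | (src[2] >> 4)) & 0x3ff;
    *y1 = ((src[3] << 8) |  src[4])       & 0x3ff;
}
```

Packing arbitrary 10-bit values with pack_group() and reading them back
with unpack_group() should return the original values, which is a cheap
way to sanity-check the byte offsets before trusting timing numbers.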