From: Zhao Zhili
To: ffmpeg-devel@ffmpeg.org
Cc: Zhao Zhili
Date: Sat, 7 Jun 2025 18:18:16 +0800
Subject: [FFmpeg-devel] [PATCH 2/2] wasm/hevc: Add sao_edge_filter

From: Zhao Zhili

hevc_sao_edge_8_8_c:         124.5 ( 1.00x)
hevc_sao_edge_8_8_simd128:    18.1 ( 6.89x)
hevc_sao_edge_16_8_c:        478.6 ( 1.00x)
hevc_sao_edge_16_8_simd128:   48.9 ( 9.79x)
hevc_sao_edge_32_8_c:       2021.1 ( 1.00x)
hevc_sao_edge_32_8_simd128:  187.4 (10.79x)
hevc_sao_edge_48_8_c:       4295.5 ( 1.00x)
hevc_sao_edge_48_8_simd128:  397.4 (10.81x)
hevc_sao_edge_64_8_c:       7245.5 ( 1.00x)
hevc_sao_edge_64_8_simd128:  709.5 (10.21x)

Signed-off-by: Zhao Zhili
---
 libavcodec/wasm/hevc/dsp_init.c |   6 ++
 libavcodec/wasm/hevc/sao.c      | 140 ++++++++++++++++++++++++++++++++
 libavcodec/wasm/hevc/sao.h      |  10 +++
 3 files changed, 156 insertions(+)

diff --git a/libavcodec/wasm/hevc/dsp_init.c b/libavcodec/wasm/hevc/dsp_init.c
index 76a1031ff4..8672bbc2e1 100644
--- a/libavcodec/wasm/hevc/dsp_init.c
+++ b/libavcodec/wasm/hevc/dsp_init.c
@@ -42,6 +42,12 @@ av_cold void ff_hevc_dsp_init_wasm(HEVCDSPContext *c, const int bit_depth)
         c->sao_band_filter[2] =
         c->sao_band_filter[3] =
         c->sao_band_filter[4] = ff_hevc_sao_band_filter_16x16_8_simd128;
+
+        c->sao_edge_filter[0] = ff_hevc_sao_edge_filter_8x8_8_simd128;
+        c->sao_edge_filter[1] =
+        c->sao_edge_filter[2] =
+        c->sao_edge_filter[3] =
+        c->sao_edge_filter[4] = ff_hevc_sao_edge_filter_16x16_8_simd128;
     } else if (bit_depth == 10) {
         c->idct[0] = ff_hevc_idct_4x4_10_simd128;
         c->idct[1] = ff_hevc_idct_8x8_10_simd128;
diff --git a/libavcodec/wasm/hevc/sao.c b/libavcodec/wasm/hevc/sao.c
index 82134af7f3..a863b8e720 100644
--- a/libavcodec/wasm/hevc/sao.c
+++ b/libavcodec/wasm/hevc/sao.c
@@ -22,6 +22,10 @@
 
 #include
 
+#include "libavcodec/defs.h"
+
+#define HEVC_MAX_PB_SIZE 64
+
 void ff_hevc_sao_band_filter_8x8_8_simd128(uint8_t *dst, const uint8_t *src,
                                            ptrdiff_t stride_dst,
                                            ptrdiff_t stride_src,
@@ -111,3 +115,139 @@ void ff_hevc_sao_band_filter_16x16_8_simd128(uint8_t *dst, const uint8_t *src,
         src += stride_src;
     }
 }
+
+void ff_hevc_sao_edge_filter_8x8_8_simd128(uint8_t *dst, const uint8_t *src,
+                                           ptrdiff_t stride_dst,
+                                           const int16_t *sao_offset_val,
+                                           int eo, int width, int height)
+{
+    static const int8_t pos[4][2][2] = {
+        { { -1, 0 }, { 1, 0 } }, // horizontal
+        { { 0, -1 }, { 0, 1 } }, // vertical
+        { { -1, -1 }, { 1, 1 } }, // 45 degree
+        { { 1, -1 }, { -1, 1 } }, // 135 degree
+    };
+    int a_stride, b_stride;
+    ptrdiff_t stride_src = (2 * HEVC_MAX_PB_SIZE + AV_INPUT_BUFFER_PADDING_SIZE);
+    const v128_t edge_idx = wasm_u8x16_make(1, 2, 0, 3,
+                                            4, 0, 0, 0,
+                                            0, 0, 0, 0,
+                                            0, 0, 0, 0);
+    v128_t sao_offset = wasm_v128_load(sao_offset_val);
+    v128_t one = wasm_i8x16_const_splat(1);
+    v128_t two = wasm_i8x16_const_splat(2);
+
+    a_stride = pos[eo][0][0] + pos[eo][0][1] * stride_src;
+    b_stride = pos[eo][1][0] + pos[eo][1][1] * stride_src;
+    for (int y = height; y > 0; y -= 2) {
+        v128_t v0, v1, v2;
+        v128_t diff0, diff1;
+
+        v0 = wasm_v128_load64_zero(src);
+        v1 = wasm_v128_load64_zero(src + a_stride);
+        v2 = wasm_v128_load64_zero(src + b_stride);
+        src += stride_src;
+        v0 = wasm_v128_load64_lane(src, v0, 1);
+        v1 = wasm_v128_load64_lane(src + a_stride, v1, 1);
+        v2 = wasm_v128_load64_lane(src + b_stride, v2, 1);
+        src += stride_src;
+
+        diff0 = wasm_u8x16_gt(v0, v1);
+        v1 = wasm_u8x16_lt(v0, v1);
+        diff0 = wasm_i8x16_sub(v1, diff0);
+
+        diff1 = wasm_u8x16_gt(v0, v2);
+        v2 = wasm_u8x16_lt(v0, v2);
+        diff1 = wasm_i8x16_sub(v2, diff1);
+
+        v1 = wasm_i8x16_add(diff0, two);
+        v1 = wasm_i8x16_add(v1, diff1);
+
+        v2 = wasm_i8x16_swizzle(edge_idx, v1); // offset_val
+        v1 = wasm_i8x16_shl(v2, 1); // Access int16_t
+        v2 = wasm_i8x16_add(v1, one); // Access upper half of int16_t
+        diff0 = wasm_i8x16_shuffle(v1, v2, 0, 16, 1, 17, 2, 18, 3, 19, 4,
+                                   20, 5, 21, 6, 22, 7, 23);
+        diff1 = wasm_i8x16_shuffle(v1, v2, 8, 24, 9, 25, 10, 26, 11, 27,
+                                   12, 28, 13, 29, 14, 30, 15, 31);
+        v1 = wasm_u16x8_extend_high_u8x16(v0);
+        v0 = wasm_u16x8_extend_low_u8x16(v0);
+        diff0 = wasm_i8x16_swizzle(sao_offset, diff0);
+        diff1 = wasm_i8x16_swizzle(sao_offset, diff1);
+
+        v0 = wasm_i16x8_add_sat(v0, diff0);
+        v1 = wasm_i16x8_add_sat(v1, diff1);
+        v0 = wasm_u8x16_narrow_i16x8(v0, v1);
+
+        wasm_v128_store64_lane(dst, v0, 0);
+        dst += stride_dst;
+        wasm_v128_store64_lane(dst, v0, 1);
+        dst += stride_dst;
+    }
+}
+
+void ff_hevc_sao_edge_filter_16x16_8_simd128(uint8_t *dst, const uint8_t *src,
+                                             ptrdiff_t stride_dst,
+                                             const int16_t *sao_offset_val,
+                                             int eo, int width, int height)
+{
+    static const int8_t pos[4][2][2] = {
+        { { -1, 0 }, { 1, 0 } }, // horizontal
+        { { 0, -1 }, { 0, 1 } }, // vertical
+        { { -1, -1 }, { 1, 1 } }, // 45 degree
+        { { 1, -1 }, { -1, 1 } }, // 135 degree
+    };
+    int a_stride, b_stride;
+    ptrdiff_t stride_src = (2 * HEVC_MAX_PB_SIZE + AV_INPUT_BUFFER_PADDING_SIZE);
+    const v128_t edge_idx = wasm_u8x16_make(1, 2, 0, 3,
+                                            4, 0, 0, 0,
+                                            0, 0, 0, 0,
+                                            0, 0, 0, 0);
+    v128_t sao_offset = wasm_v128_load(sao_offset_val);
+    v128_t one = wasm_i8x16_const_splat(1);
+    v128_t two = wasm_i8x16_const_splat(2);
+
+    a_stride = pos[eo][0][0] + pos[eo][0][1] * stride_src;
+    b_stride = pos[eo][1][0] + pos[eo][1][1] * stride_src;
+    for (int y = height; y > 0; y--) {
+        for (int x = 0; x < width; x += 16) {
+            v128_t v0, v1, v2;
+            v128_t diff0, diff1;
+
+            v0 = wasm_v128_load(&src[x]);
+            v1 = wasm_v128_load(&src[x + a_stride]);
+            v2 = wasm_v128_load(&src[x + b_stride]);
+
+            diff0 = wasm_u8x16_gt(v0, v1);
+            v1 = wasm_u8x16_lt(v0, v1);
+            diff0 = wasm_i8x16_sub(v1, diff0);
+
+            diff1 = wasm_u8x16_gt(v0, v2);
+            v2 = wasm_u8x16_lt(v0, v2);
+            diff1 = wasm_i8x16_sub(v2, diff1);
+
+            v1 = wasm_i8x16_add(diff0, two);
+            v1 = wasm_i8x16_add(v1, diff1);
+
+            v2 = wasm_i8x16_swizzle(edge_idx, v1); // offset_val
+            v1 = wasm_i8x16_shl(v2, 1); // Access int16_t
+            v2 = wasm_i8x16_add(v1, one); // Access upper half of int16_t
+            diff0 = wasm_i8x16_shuffle(v1, v2, 0, 16, 1, 17, 2, 18, 3, 19, 4,
+                                       20, 5, 21, 6, 22, 7, 23);
+            diff1 = wasm_i8x16_shuffle(v1, v2, 8, 24, 9, 25, 10, 26, 11, 27,
+                                       12, 28, 13, 29, 14, 30, 15, 31);
+            v1 = wasm_u16x8_extend_high_u8x16(v0);
+            v0 = wasm_u16x8_extend_low_u8x16(v0);
+            diff0 = wasm_i8x16_swizzle(sao_offset, diff0);
+            diff1 = wasm_i8x16_swizzle(sao_offset, diff1);
+
+            v0 = wasm_i16x8_add_sat(v0, diff0);
+            v1 = wasm_i16x8_add_sat(v1, diff1);
+            v0 = wasm_u8x16_narrow_i16x8(v0, v1);
+            wasm_v128_store(&dst[x], v0);
+        }
+
+        src += stride_src;
+        dst += stride_dst;
+    }
+}
diff --git a/libavcodec/wasm/hevc/sao.h b/libavcodec/wasm/hevc/sao.h
index 6119ec90f1..052420d7de 100644
--- a/libavcodec/wasm/hevc/sao.h
+++ b/libavcodec/wasm/hevc/sao.h
@@ -38,4 +38,14 @@ void ff_hevc_sao_band_filter_16x16_8_simd128(uint8_t *_dst, const uint8_t *_src,
                                              int sao_left_class, int width,
                                              int height);
 
+void ff_hevc_sao_edge_filter_8x8_8_simd128(uint8_t *_dst, const uint8_t *_src,
+                                           ptrdiff_t stride_dst,
+                                           const int16_t *sao_offset_val,
+                                           int eo, int width, int height);
+
+void ff_hevc_sao_edge_filter_16x16_8_simd128(uint8_t *_dst, const uint8_t *_src,
+                                             ptrdiff_t stride_dst,
+                                             const int16_t *sao_offset_val,
+                                             int eo, int width, int height);
+
 #endif
\ No newline at end of file
-- 
2.43.0
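
For readers following the SIMD code in this patch, a scalar version of the same per-pixel computation may help. The sketch below is not part of the patch. The names `sao_edge_filter_ref`, `cmp_u8` and `clip_u8` are hypothetical, and `stride_src` is passed as a parameter here as an assumption, whereas the patch fixes it to `2 * HEVC_MAX_PB_SIZE + AV_INPUT_BUFFER_PADDING_SIZE`. It mirrors the steps visible in the patch: compare each pixel with its two neighbours along the `eo` direction, map `2 + diff0 + diff1` through the `{ 1, 2, 0, 3, 4 }` table, add the selected `sao_offset_val` entry, and clip to 8 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Sign of (a - b): 1 if a > b, -1 if a < b, 0 if equal. */
static int cmp_u8(uint8_t a, uint8_t b)
{
    return (a > b) - (a < b);
}

/* Clamp an int to the 0..255 range of an 8-bit sample. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar sketch of the edge-offset step (not from the patch).
 * src must have one valid pixel of border around the width x height block. */
static void sao_edge_filter_ref(uint8_t *dst, const uint8_t *src,
                                ptrdiff_t stride_dst, ptrdiff_t stride_src,
                                const int16_t *sao_offset_val,
                                int eo, int width, int height)
{
    /* Same tables as in the patch. */
    static const uint8_t edge_idx[5] = { 1, 2, 0, 3, 4 };
    static const int8_t pos[4][2][2] = {
        { { -1,  0 }, {  1, 0 } }, /* horizontal */
        { {  0, -1 }, {  0, 1 } }, /* vertical */
        { { -1, -1 }, {  1, 1 } }, /* 45 degree */
        { {  1, -1 }, { -1, 1 } }, /* 135 degree */
    };
    ptrdiff_t a_stride = pos[eo][0][0] + pos[eo][0][1] * stride_src;
    ptrdiff_t b_stride = pos[eo][1][0] + pos[eo][1][1] * stride_src;

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int diff0 = cmp_u8(src[x], src[x + a_stride]);
            int diff1 = cmp_u8(src[x], src[x + b_stride]);
            int idx   = edge_idx[2 + diff0 + diff1];

            dst[x] = clip_u8(src[x] + sao_offset_val[idx]);
        }
        src += stride_src;
        dst += stride_dst;
    }
}
```

A function of this shape could be used to cross-check the simd128 output on small hand-built blocks, in addition to the checkasm results quoted in the commit message.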