To: ffmpeg-devel@ffmpeg.org
Date: Sat, 07 Feb 2026 00:35:39 -0000
Message-ID: <177042453992.25.12408312386808369972@4457048688e7>
Reply-To: FFmpeg development discussions and patches
Subject: [FFmpeg-devel] [PR] avcodec/x86/mpegvideoencdsp_init: Port draw_edges to SSSE3 (PR #21670)
List-Id: FFmpeg development discussions and patches
From: mkver via ffmpeg-devel
Cc: mkver
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

PR #21670 opened by mkver
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21670
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21670.patch


From 1e2b709bf3b8ca96364d2a8724b2946969ad2728 Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt
Date: Sat, 7 Feb 2026 00:38:47 +0100
Subject: [PATCH] avcodec/x86/mpegvideoencdsp_init: Port draw_edges to SSSE3

Benchmarks:
draw_edges_8_1724_4_c:                2672.2 ( 1.00x)
draw_edges_8_1724_4_mmx:              3191.5 ( 0.84x)
draw_edges_8_1724_4_ssse3:            2179.6 ( 1.23x)
draw_edges_8_1724_8_c:                2852.3 ( 1.00x)
draw_edges_8_1724_8_mmx:              3683.0 ( 0.77x)
draw_edges_8_1724_8_ssse3:            2225.7 ( 1.28x)
draw_edges_8_1724_16_c:               4169.4 ( 1.00x)
draw_edges_8_1724_16_mmx:             4665.9 ( 0.89x)
draw_edges_8_1724_16_ssse3:           2765.8 ( 1.51x)
draw_edges_128_407_4_c:               1126.6 ( 1.00x)
draw_edges_128_407_4_mmx:              943.9 ( 1.19x)
draw_edges_128_407_4_ssse3:            925.7 ( 1.22x)
draw_edges_128_407_8_c:               1208.8 ( 1.00x)
draw_edges_128_407_8_mmx:             1119.1 ( 1.08x)
draw_edges_128_407_8_ssse3:            997.8 ( 1.21x)
draw_edges_128_407_16_c:              1352.4 ( 1.00x)
draw_edges_128_407_16_mmx:            1368.7 ( 0.99x)
draw_edges_128_407_16_ssse3:          1148.3 ( 1.18x)
draw_edges_1080_31_4_c:                228.5 ( 1.00x)
draw_edges_1080_31_4_mmx:              240.8 ( 0.95x)
draw_edges_1080_31_4_ssse3:            226.7 ( 1.01x)
draw_edges_1080_31_8_c:                411.1 ( 1.00x)
draw_edges_1080_31_8_mmx:              432.9 ( 0.95x)
draw_edges_1080_31_8_ssse3:            403.2 ( 1.02x)
draw_edges_1080_31_16_c:              1121.2 ( 1.00x)
draw_edges_1080_31_16_mmx:            1124.9 ( 1.00x)
draw_edges_1080_31_16_ssse3:          1125.4 ( 1.00x)
draw_edges_1920_4_4_c:                 310.8 ( 1.00x)
draw_edges_1920_4_4_mmx:               311.6 ( 1.00x)
draw_edges_1920_4_4_ssse3:             311.6 ( 1.00x)
draw_edges_1920_4_4_negstride_c:       307.0 ( 1.00x)
draw_edges_1920_4_4_negstride_mmx:     306.7 ( 1.00x)
draw_edges_1920_4_4_negstride_ssse3:   306.7 ( 1.00x)
draw_edges_1920_4_8_c:                 724.2 ( 1.00x)
draw_edges_1920_4_8_mmx:               724.9 ( 1.00x)
draw_edges_1920_4_8_ssse3:             717.3 ( 1.01x)
draw_edges_1920_4_8_negstride_c:       719.2 ( 1.00x)
draw_edges_1920_4_8_negstride_mmx:     717.1 ( 1.00x)
draw_edges_1920_4_8_negstride_ssse3:   710.9 ( 1.01x)
draw_edges_1920_4_16_c:               1752.9 ( 1.00x)
draw_edges_1920_4_16_mmx:             1754.6 ( 1.00x)
draw_edges_1920_4_16_ssse3:           1751.1 ( 1.00x)
draw_edges_1920_4_16_negstride_c:     1783.2 ( 1.00x)
draw_edges_1920_4_16_negstride_mmx:   1778.2 ( 1.00x)
draw_edges_1920_4_16_negstride_ssse3: 1768.3 ( 1.01x)

Signed-off-by: Andreas Rheinhardt
---
 libavcodec/mpegvideo_enc.c            |   3 -
 libavcodec/x86/mpegvideoencdsp_init.c | 131 +++++++++++---------------
 tests/checkasm/mpegvideoencdsp.c      |   4 +-
 3 files changed, 56 insertions(+), 82 deletions(-)

diff --git a/libavcodec/mpegvideo_enc.c b/libavcodec/mpegvideo_enc.c
index a4f78c25db..46c8863a14 100644
--- a/libavcodec/mpegvideo_enc.c
+++ b/libavcodec/mpegvideo_enc.c
@@ -1419,7 +1419,6 @@ static int load_input_picture(MPVMainEncContext *const m, const AVFrame *pic_arg
                                                     EDGE_BOTTOM);
                 }
             }
-            emms_c();
         }
 
         pic->display_picture_number = display_picture_number;
@@ -1886,8 +1885,6 @@ static void frame_end(MPVMainEncContext *const m)
                                 EDGE_TOP | EDGE_BOTTOM);
     }
 
-    emms_c();
-
     m->last_pict_type = s->c.pict_type;
     m->last_lambda_for[s->c.pict_type] = s->c.cur_pic.ptr->f->quality;
     if (s->c.pict_type != AV_PICTURE_TYPE_B)
diff --git a/libavcodec/x86/mpegvideoencdsp_init.c b/libavcodec/x86/mpegvideoencdsp_init.c
index 220c75785a..eba0997dec 100644
--- a/libavcodec/x86/mpegvideoencdsp_init.c
+++ b/libavcodec/x86/mpegvideoencdsp_init.c
@@ -34,7 +34,6 @@ int ff_pix_sum16_xop(const uint8_t *pix, ptrdiff_t line_size);
 int ff_pix_norm1_sse2(const uint8_t *pix, ptrdiff_t line_size);
 void ff_add_8x8basis_ssse3(int16_t rem[64], const int16_t basis[64], int scale);
 
-#if HAVE_INLINE_ASM
 
 #if HAVE_SSSE3_INLINE
 #define SCALE_OFFSET -1
@@ -84,77 +83,62 @@ static int try_8x8basis_ssse3(const int16_t rem[64], const int16_t weight[64], c
     );
     return i;
 }
-#endif /* HAVE_SSSE3_INLINE */
 
 /* Draw the edges of width 'w' of an image of size width, height */
-static void draw_edges_mmx(uint8_t *buf, ptrdiff_t wrap, int width, int height,
-                           int w, int h, int sides)
+static void draw_edges_ssse3(uint8_t *buf, ptrdiff_t wrap, int width, int height,
+                             int w, int h, int sides)
 {
-    uint8_t *ptr, *last_line;
+    uint8_t *ptr = buf, *last_line;
     int i;
 
+    av_assert1(w == 16 || w == 8 || w == 4);
+
     /* left and right */
-    ptr = buf;
-    if (w == 8) {
-        __asm__ volatile (
-            "1: \n\t"
-            "movd (%0), %%mm0 \n\t"
-            "punpcklbw %%mm0, %%mm0 \n\t"
-            "punpcklwd %%mm0, %%mm0 \n\t"
-            "punpckldq %%mm0, %%mm0 \n\t"
-            "movq %%mm0, -8(%0) \n\t"
-            "movq -8(%0, %2), %%mm1 \n\t"
-            "punpckhbw %%mm1, %%mm1 \n\t"
-            "punpckhwd %%mm1, %%mm1 \n\t"
-            "punpckhdq %%mm1, %%mm1 \n\t"
-            "movq %%mm1, (%0, %2) \n\t"
-            "add %1, %0 \n\t"
-            "cmp %3, %0 \n\t"
-            "jnz 1b \n\t"
-            : "+r" (ptr)
-            : "r" ((x86_reg) wrap), "r" ((x86_reg) width),
-              "r" (ptr + wrap * height));
-    } else if (w == 16) {
-        __asm__ volatile (
-            "1: \n\t"
-            "movd (%0), %%mm0 \n\t"
-            "punpcklbw %%mm0, %%mm0 \n\t"
-            "punpcklwd %%mm0, %%mm0 \n\t"
-            "punpckldq %%mm0, %%mm0 \n\t"
-            "movq %%mm0, -8(%0) \n\t"
-            "movq %%mm0, -16(%0) \n\t"
-            "movq -8(%0, %2), %%mm1 \n\t"
-            "punpckhbw %%mm1, %%mm1 \n\t"
-            "punpckhwd %%mm1, %%mm1 \n\t"
-            "punpckhdq %%mm1, %%mm1 \n\t"
-            "movq %%mm1, (%0, %2) \n\t"
-            "movq %%mm1, 8(%0, %2) \n\t"
-            "add %1, %0 \n\t"
-            "cmp %3, %0 \n\t"
-            "jnz 1b \n\t"
-            : "+r"(ptr)
-            : "r"((x86_reg)wrap), "r"((x86_reg)width), "r"(ptr + wrap * height)
-            );
-    } else {
-        av_assert1(w == 4);
-        __asm__ volatile (
-            "1: \n\t"
-            "movd (%0), %%mm0 \n\t"
-            "punpcklbw %%mm0, %%mm0 \n\t"
-            "punpcklwd %%mm0, %%mm0 \n\t"
-            "movd %%mm0, -4(%0) \n\t"
-            "movd -4(%0, %2), %%mm1 \n\t"
-            "punpcklbw %%mm1, %%mm1 \n\t"
-            "punpckhwd %%mm1, %%mm1 \n\t"
-            "punpckhdq %%mm1, %%mm1 \n\t"
-            "movd %%mm1, (%0, %2) \n\t"
-            "add %1, %0 \n\t"
-            "cmp %3, %0 \n\t"
-            "jnz 1b \n\t"
-            : "+r" (ptr)
-            : "r" ((x86_reg) wrap), "r" ((x86_reg) width),
-              "r" (ptr + wrap * height));
-    }
+    __asm__ volatile (
+        "pcmpeqw %%xmm3, %%xmm3 \n\t"
+        "pxor %%xmm2, %%xmm2 \n\t"
+        "psrlw $14, %%xmm3 \n\t" // pw_3
+        "pshufb %%xmm2, %%xmm3 \n\t" // pb_3
+        "cmp $8, %4 \n\t"
+        "jg 16f \n\t"
+        "jl 4f \n\t"
+        "8: \n\t"
+        "movd (%0), %%xmm0 \n\t"
+        "movd -4(%0, %2), %%xmm1 \n\t"
+        "pshufb %%xmm2, %%xmm0 \n\t"
+        "pshufb %%xmm3, %%xmm1 \n\t"
+        "movq %%xmm0, -8(%0) \n\t"
+        "movq %%xmm1, (%0, %2) \n\t"
+        "add %1, %0 \n\t"
+        "cmp %3, %0 \n\t"
+        "jnz 8b \n\t"
+        "jmp 1f \n\t"
+        "4: \n\t"
+        "movd (%0), %%xmm0 \n\t"
+        "movd -4(%0, %2), %%xmm1 \n\t"
+        "pshufb %%xmm2, %%xmm0 \n\t"
+        "pshufb %%xmm3, %%xmm1 \n\t"
+        "movd %%xmm0, -4(%0) \n\t"
+        "movd %%xmm1, (%0, %2) \n\t"
+        "add %1, %0 \n\t"
+        "cmp %3, %0 \n\t"
+        "jnz 4b \n\t"
+        "jmp 1f \n\t"
+        "16: \n\t"
+        "movd (%0), %%xmm0 \n\t"
+        "movd -4(%0, %2), %%xmm1 \n\t"
+        "pshufb %%xmm2, %%xmm0 \n\t"
+        "pshufb %%xmm3, %%xmm1 \n\t"
+        "movdqu %%xmm0, -16(%0) \n\t"
+        "movdqu %%xmm1, (%0, %2) \n\t"
+        "add %1, %0 \n\t"
+        "cmp %3, %0 \n\t"
+        "jnz 16b \n\t"
+        "1: \n\t"
+        : "+r" (ptr)
+        : "r" ((x86_reg) wrap), "r" ((x86_reg) width), "r"(ptr + wrap * height), "g" (w)
+          XMM_CLOBBERS_ONLY("%xmm0", "%xmm1", "%xmm2", "%xmm3")
+    );
 
     /* top and bottom + corners */
     buf -= w;
@@ -168,8 +152,7 @@ static void draw_edges_mmx(uint8_t *buf, ptrdiff_t wrap, int width, int height,
             // bottom
             memcpy(last_line + (i + 1) * wrap, last_line, width + w + w);
 }
-
-#endif /* HAVE_INLINE_ASM */
+#endif /* HAVE_SSSE3_INLINE */
 
 av_cold void ff_mpegvideoencdsp_init_x86(MpegvideoEncDSPContext *c,
                                          AVCodecContext *avctx)
@@ -186,20 +169,14 @@ av_cold void ff_mpegvideoencdsp_init_x86(MpegvideoEncDSPContext *c,
         c->pix_sum = ff_pix_sum16_xop;
     }
 
-#if HAVE_INLINE_ASM
-
-    if (INLINE_MMX(cpu_flags)) {
-        if (avctx->bits_per_raw_sample <= 8) {
-            c->draw_edges = draw_edges_mmx;
-        }
-    }
-#endif /* HAVE_INLINE_ASM */
-
     if (X86_SSSE3(cpu_flags)) {
 #if HAVE_SSSE3_INLINE
         if (!(avctx->flags & AV_CODEC_FLAG_BITEXACT)) {
             c->try_8x8basis = try_8x8basis_ssse3;
         }
+        if (avctx->bits_per_raw_sample <= 8) {
+            c->draw_edges = draw_edges_ssse3;
+        }
 #endif /* HAVE_SSSE3_INLINE */
 #if HAVE_SSSE3_EXTERNAL
         c->add_8x8basis = ff_add_8x8basis_ssse3;
diff --git a/tests/checkasm/mpegvideoencdsp.c b/tests/checkasm/mpegvideoencdsp.c
index 955cd9f5b7..5fad1d4bb4 100644
--- a/tests/checkasm/mpegvideoencdsp.c
+++ b/tests/checkasm/mpegvideoencdsp.c
@@ -147,8 +147,8 @@ static void check_draw_edges(MpegvideoEncDSPContext *c)
     LOCAL_ALIGNED_16(uint8_t, buf0, [BUFSIZE]);
     LOCAL_ALIGNED_16(uint8_t, buf1, [BUFSIZE]);
 
-    declare_func_emms(AV_CPU_FLAG_MMX, void, uint8_t *buf, ptrdiff_t wrap, int width, int height,
-                                       int w, int h, int sides);
+    declare_func(void, uint8_t *buf, ptrdiff_t wrap, int width, int height,
+                       int w, int h, int sides);
 
     for (int isi = 0; isi < FF_ARRAY_ELEMS(input_sizes); isi++) {
         int input_size = input_sizes[isi];
-- 
2.52.0
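
For readers who do not want to trace the pshufb sequence: the "left and right" loop in draw_edges_ssse3 appears to replicate the first pixel of each row into the w bytes before it and the last pixel into the w bytes after it. The scalar C below is only a sketch of that behaviour under that reading. The function name draw_lr_edges_ref is hypothetical, and this is neither FFmpeg's actual C fallback nor part of the patch. It assumes 8-bit samples and that the caller has allocated w bytes of padding on both sides of every row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference for the left/right edge step only.
 * buf points at the first pixel of the first row; wrap is the stride
 * in bytes and may be negative, as in the negstride benchmarks. */
static void draw_lr_edges_ref(uint8_t *buf, ptrdiff_t wrap,
                              int width, int height, int w)
{
    assert(width > 0 && w > 0);

    for (int y = 0; y < height; y++) {
        uint8_t *row = buf + y * wrap;

        /* left edge: broadcast the first pixel (pshufb with an all-zero mask) */
        memset(row - w, row[0], (size_t)w);
        /* right edge: broadcast the last pixel (pshufb with the pb_3 mask
         * applied to the last four bytes of the row) */
        memset(row + width, row[width - 1], (size_t)w);
    }
}
```

The top and bottom edges are not covered here; in the patch they are still filled by the memcpy loops that follow the asm block.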