Message-ID: <7c083426-dae6-40fd-8117-8ac243a59194@gmail.com>
Date: Tue, 7 May 2024 12:15:35 -0300
From: James Almer
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH 3/3] x86/blockdsp: add sse2 and avx2 versions of fill_block_tab
References: <20240507002723.1603-1-jamrial@gmail.com> <20240507150205.2039-1-jamrial@gmail.com> <20240507150205.2039-2-jamrial@gmail.com>

On 5/7/2024 12:10 PM, Andreas Rheinhardt wrote:
> James Almer:
>> Signed-off-by: James Almer
>> ---
>>  libavcodec/x86/blockdsp.asm    | 33 +++++++++++++++++++++++++++++++++
>>  libavcodec/x86/blockdsp_init.c | 13 +++++++++++++
>>  2 files changed, 46 insertions(+)
>>
>> diff --git a/libavcodec/x86/blockdsp.asm b/libavcodec/x86/blockdsp.asm
>> index e380308d4a..cccc9a801a 100644
>> --- a/libavcodec/x86/blockdsp.asm
>> +++ b/libavcodec/x86/blockdsp.asm
>> @@ -80,3 +80,36 @@ INIT_XMM sse
>>  CLEAR_BLOCKS 1
>>  INIT_YMM avx
>>  CLEAR_BLOCKS 1
>> +
>> +;-----------------------------------------
>> +; void ff_fill_block_tab_%1(uint8_t *block, uint8_t value,
>> +;                           ptrdiff_t line_size, int h);
>> +;-----------------------------------------
>> +%macro FILL_BLOCK_TAB 2
>> +cglobal fill_block_tab_%1, 4, 5, 1, block, value, stride, h, stride3
>> +    lea stride3q, [strideq + strideq * 2]
>> +%if cpuflag(avx2)
>> +    movd m0, valued
>> +    vpbroadcastb m0, m0
>> +%else
>> +    SPLATB_REG m0, value, x
>> +%endif
>> +.loop:
>> +    mov%2 [blockq], m0
>> +    mov%2 [blockq + strideq], m0
>> +    mov%2 [blockq + strideq * 2], m0
>> +    mov%2 [blockq + stride3q], m0
>> +    lea blockq, [blockq + strideq * 4]
>> +    sub hd, 4
>> +    jg .loop
>> +    RET
>> +%endmacro
>> +
>> +INIT_XMM sse2
>> +FILL_BLOCK_TAB 8, q
>> +FILL_BLOCK_TAB 16, a
>> +%if HAVE_AVX2_EXTERNAL
>> +INIT_XMM avx2
>> +FILL_BLOCK_TAB 8, q
>> +FILL_BLOCK_TAB 16, a
>> +%endif
>> diff --git a/libavcodec/x86/blockdsp_init.c b/libavcodec/x86/blockdsp_init.c
>> index 996124114f..37f3bb6a84 100644
>> --- a/libavcodec/x86/blockdsp_init.c
>> +++ b/libavcodec/x86/blockdsp_init.c
>> @@ -29,6 +29,11 @@ void ff_clear_block_avx(int16_t *block);
>>  void ff_clear_blocks_sse(int16_t *blocks);
>>  void ff_clear_blocks_avx(int16_t *blocks);
>>
>> +void ff_fill_block_tab_16_sse2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h);
>> +void ff_fill_block_tab_8_sse2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h);
>> +void ff_fill_block_tab_16_avx2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h);
>> +void ff_fill_block_tab_8_avx2(uint8_t *block, uint8_t value, ptrdiff_t line_size, int h);
>> +
>>  av_cold void ff_blockdsp_init_x86(BlockDSPContext *c)
>>  {
>>  #if HAVE_X86ASM
>> @@ -38,9 +43,17 @@ av_cold void ff_blockdsp_init_x86(BlockDSPContext *c)
>>          c->clear_block = ff_clear_block_sse;
>>          c->clear_blocks = ff_clear_blocks_sse;
>>      }
>> +    if (EXTERNAL_SSE2(cpu_flags)) {
>> +        c->fill_block_tab[0] = ff_fill_block_tab_16_sse2;
>> +        c->fill_block_tab[1] = ff_fill_block_tab_8_sse2;
>> +    }
>>      if (EXTERNAL_AVX_FAST(cpu_flags)) {
>>          c->clear_block = ff_clear_block_avx;
>>          c->clear_blocks = ff_clear_blocks_avx;
>>      }
>> +    if (EXTERNAL_AVX2(cpu_flags)) {
>> +        c->fill_block_tab[0] = ff_fill_block_tab_16_avx2;
>> +        c->fill_block_tab[1] = ff_fill_block_tab_8_avx2;
>> +    }
>>  #endif /* HAVE_X86ASM */
>>  }
>
> Benchmarks?

blockdsp.fill_block_tab[0]_c: 13.4
blockdsp.fill_block_tab[0]_sse2: 8.4
blockdsp.fill_block_tab[0]_avx2: 7.4
blockdsp.fill_block_tab[1]_c: 7.9
blockdsp.fill_block_tab[1]_sse2: 4.5
blockdsp.fill_block_tab[1]_avx2: 4.0

On an Alder Lake, using --cpu=alderlake. If I let GCC compile without CPU-specific optimizations, the C version is even slower.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
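
For anyone reading the quoted macro without an assembler at hand, here is a rough scalar C model of what the .loop body appears to do. It is only a sketch: the names fill_block_ref and fill_block_unrolled4 and the explicit width parameter (standing in for the macro's %1, i.e. 16 or 8) are hypothetical and not part of the patch. It also assumes h is a positive multiple of 4, because the loop stores four rows per pass before testing the counter; the thread does not say which heights callers actually pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Plain reference: fill h rows of `width` bytes each with `value`.
 * Hypothetical helper, used here only for comparison. */
static void fill_block_ref(uint8_t *block, uint8_t value,
                           ptrdiff_t line_size, int h, int width)
{
    for (int y = 0; y < h; y++) {
        memset(block, value, (size_t)width);
        block += line_size;
    }
}

/* Scalar model of the quoted .loop: four rows per iteration.
 * Each memset stands in for one mov%2 store of the splatted register. */
static void fill_block_unrolled4(uint8_t *block, uint8_t value,
                                 ptrdiff_t line_size, int h, int width)
{
    /* lea stride3q, [strideq + strideq * 2] */
    const ptrdiff_t stride3 = line_size + line_size * 2;

    /* Assumption of this sketch: any other h would write extra rows. */
    assert(h > 0 && h % 4 == 0);

    do {
        memset(block,                 value, (size_t)width);
        memset(block + line_size,     value, (size_t)width);
        memset(block + line_size * 2, value, (size_t)width);
        memset(block + stride3,       value, (size_t)width);
        block += line_size * 4;       /* lea blockq, [blockq + strideq * 4] */
        h -= 4;                       /* sub hd, 4 */
    } while (h > 0);                  /* jg .loop */
}
```

Calling both helpers on two zeroed buffers with the same arguments (for example line_size 32, h 8, width 16) and comparing the buffers with memcmp should show identical contents, including the untouched bytes between rows.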