From: Lynne
Date: Wed, 10 May 2023 13:41:45 +0200 (CEST)
To: FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [PATCH 0/5] RISC-V: Improve H264 decoding performance using RVV intrinsic
May 10, 2023, 10:47 by arnie.chang@sifive.com:

> Hi Lynne
>
> I fully respect the policy and understand the disadvantages of intrinsic
> code.
> Considering the benefits of an open ISA like RISC-V,
> intrinsic code should still have a better chance of being optimized by
> the compiler for hardware variants.
>

Whether an ISA is open or not is irrelevant. Power9 is open, and yet compilers
still fail to deliver consistent performance rather than thrashing vectors
to the stack.
Optimizing assembly code for new ISA features is simple with the much more
advanced templating systems present in assemblers. Plus, we can confirm that
it's a net gain rather than a compiler artifact.

As advanced as compilers are, we cannot even trust them to compile C code
correctly. GCC still has issues and miscompiles/misvectorizes our code, so we
have to disable tree vectorization. Not that it's a big issue:
performance-sensitive code is all assembly for us.

> At this moment, the intrinsic implementation is the only thing available.
> It would take a significant amount of time to rewrite it in assembly due to
> the large number of functions.
>

It's precisely because there isn't a lot of code written yet that this ought
to be done now. Rewriting intrinsics or inline assembly is a hard process once
they have been merged, and all sorts of bugs and weird behavior appear when
rewriting to assembly.
You could start by just disassembling the compiled version and cleaning it up.
We've had to do this in the past.

> I was wondering if we could treat the intrinsic code as an initial version
> for the RISC-V port with the following modification.
> - Add an option --enable-rvv-intrinsic to EXPLICITLY enable the
> intrinsic optimization, which is disabled by default.
> Given the current conditions (the state of vector support in GCC, and the
> dislike of and limits on intrinsics), disabling it by default seems a
> reasonable approach.
>
> Those who want to be involved in optimizing the H.264 decoder on
> RISC-V can work on the assembly and decide whether to refer to the
> intrinsic code.
> I believe this would be a good starting point for future optimization.
>

Well, sort of, no. No CPU supports RVV 1.0 at the moment. There's no reason
to hurry with this at all and merge less-than-desirable code, disabled by
default, which hasn't even been tested on actual hardware.

There's hardly any real hardware on the horizon either. The P670 was allegedly
released last year, but even you had to test your code on an FPGA.
Even then, the P670 only has 128-bit ALUs, which is suboptimal, as
variable-length vector code tends to be more latency-bound.
The XuanTie C908 is a better candidate: I heard it is getting released sooner,
and it has 256-bit ALUs.

I've been wanting to write RVV code for years now, but the hardware simply
hasn't been there yet.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".