From: Anton Khirnov
To: FFmpeg development discussions and patches
Date: Sat, 20 Jul 2024 11:22:43 +0200
Subject: Re: [FFmpeg-devel] [PATCH 13/39] lavc/ffv1: drop redundant PlaneContext.quant_table
Message-ID: <172146736349.21847.12937616463646199218@lain.khirnov.net>
In-Reply-To: <20240718174004.GE4991@pb2>

Quoting Michael Niedermayer (2024-07-18 19:40:04)
> On Thu, Jul 18, 2024 at 10:20:09AM +0200, Anton Khirnov wrote:
> > Quoting Michael Niedermayer (2024-07-18 00:32:38)
> > > the data for each decoder task should be together and not scattered around
> > > more than needed, reducing cache efficiency
> > >
> > > putting all this extra code in the inner per pixel loop is not ok
> > > especially not for the sake of avoiding a memcpy of a few hundred bytes
> > > multiple levels of loops outside
> >
> > A nice theory, but in practice this patchset makes single-threaded
> > decoding about 4% faster overall, on a 1920x1080 10bit sample.
> > That's just the ffv1 parts (up to patch 28), full set also improves frame
> > threading performance as follows:
> >
> > threads   improvement
> > ---------------------------
> >    2      52% (yes really)
> >    4      16%
> >    8      12%
>
> I do want the speed improvements, yes.
>
> But you compare frame threading when slice threading performed
> much better than frame threading prior to the patch

If that were true in general, there'd be no reason for frame threading
support in ffv1, as it has a higher latency and uses more memory; higher
performance is its only advantage.

However, you added frame threading in
a0c0900e470fde0d6db360e555620476c2323895 claiming it is faster, which I
can partially confirm even with current master: slice threading
saturates at thread count = slice count, while frame threading scales
beyond it. Frame threading also improves significantly after this set:

threads |  slice  | frame/before | frame/after
-----------------------------------------------
   2      22.6124     43.738        22.0354
   4      14.3367     15.115        13.1964
   6      14.3850     11.974        10.9745
   8      14.3472      9.7229        8.76617
  10      14.3579      8.4638        8.6499
  12      14.3665      8.4636        8.5735
  16      14.2960      7.6926        7.1696
-----------------------------------------------
(values are total decode time in seconds)

Note that after this set frame threading is ALWAYS faster than slice
threading, for any thread count.

> also I'd like to see the individual changes which look like they should
> make the code slower, to be tested individually. If they make the code slower
> they should be dropped

I don't think it's meaningful to individually benchmark the patches
moving per-slice data into the new per-slice context. I split them to
simplify testing and review, but it only makes sense to apply all of
them or none, otherwise the code gets more complex.
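
For anyone who wants to re-derive relative changes from the timing table above, here is a minimal sketch in Python. The numbers are copied from the table; the helper names (`reduction_pct`, `summarize`) and the choice of "reduction in total decode time" as the metric are assumptions made for illustration, and need not match how the percentages quoted earlier in the thread were computed.

```python
# Total decode time in seconds, copied from the table above.
# Maps thread count -> (slice, frame/before, frame/after).
TIMINGS = {
    2: (22.6124, 43.738, 22.0354),
    4: (14.3367, 15.115, 13.1964),
    6: (14.3850, 11.974, 10.9745),
    8: (14.3472, 9.7229, 8.76617),
    10: (14.3579, 8.4638, 8.6499),
    12: (14.3665, 8.4636, 8.5735),
    16: (14.2960, 7.6926, 7.1696),
}


def reduction_pct(before, after):
    """Return how much lower `after` is than `before`, in percent.

    A negative result means `after` is slower than `before`.
    (Hypothetical helper; the metric itself is an assumption.)
    """
    return (before - after) / before * 100.0


def summarize(timings):
    """Build one summary row per thread count (hypothetical helper)."""
    rows = []
    for threads in sorted(timings):
        slice_t, frame_before, frame_after = timings[threads]
        rows.append({
            "threads": threads,
            # Frame threading after the set vs. before the set.
            "frame_vs_before_pct": reduction_pct(frame_before, frame_after),
            # Frame threading after the set vs. slice threading.
            "frame_vs_slice_pct": reduction_pct(slice_t, frame_after),
            "frame_beats_slice": frame_after < slice_t,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(TIMINGS):
        print(f"{row['threads']:>2} threads: "
              f"{row['frame_vs_before_pct']:+6.1f}% vs frame/before, "
              f"{row['frame_vs_slice_pct']:+6.1f}% vs slice")
```

The `frame_beats_slice` field checks the "frame/after is below slice" comparison row by row, so the same script can be rerun on fresh measurements by editing `TIMINGS`.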

--
Anton Khirnov