From mboxrd@z Thu Jan  1 00:00:00 1970
From: Anton Khirnov <anton@khirnov.net>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
In-Reply-To: <20240718174004.GE4991@pb2>
References: <20240716171155.31838-1-anton@khirnov.net>
 <20240716171155.31838-13-anton@khirnov.net> <20240717223238.GW4991@pb2>
 <172129080994.21847.15080640617406361149@lain.khirnov.net>
 <20240718174004.GE4991@pb2>
Mail-Followup-To: FFmpeg development discussions and patches
 <ffmpeg-devel@ffmpeg.org>
Date: Sat, 20 Jul 2024 11:22:43 +0200
Message-ID: <172146736349.21847.12937616463646199218@lain.khirnov.net>
User-Agent: alot/0.8.1
MIME-Version: 1.0
Subject: Re: [FFmpeg-devel] [PATCH 13/39] lavc/ffv1: drop redundant
 PlaneContext.quant_table
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Archived-At: <https://master.gitmailbox.com/ffmpegdev/172146736349.21847.12937616463646199218@lain.khirnov.net/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>

Quoting Michael Niedermayer (2024-07-18 19:40:04)
> On Thu, Jul 18, 2024 at 10:20:09AM +0200, Anton Khirnov wrote:
> > Quoting Michael Niedermayer (2024-07-18 00:32:38)
> > > the data for each decoder task should be together and not scattered around
> > > more than needed, reducing cache efficiency
> > > 
> > > putting all this extra code in the inner per pixel loop is not ok
> > > especially not for the sake of avoiding a memcpy of a few hundred bytes multiple levels of loops outside
> > 
> > A nice theory, but in practice this patchset makes single-threaded
> > decoding about 4% faster overall, on a 1920x1080 10bit sample. That's
> > just the ffv1 parts (up to patch 28), full set also improves frame
> > threading performance as follows:
> > threads         improvement
> > ---------------------------
> > 2                  52% (yes really)
> > 4                  16%
> > 8                  12%
> 
> I do want the speed improvements, yes.
> 
> But
> you compare frame threading when slice threading performed
> much better than frame threading prior to the patch

If that were true in general, there'd be no reason for frame threading
support in ffv1, as it has a higher latency and uses more memory; higher
performance is its only advantage.

However, you added frame threading in
a0c0900e470fde0d6db360e555620476c2323895 claiming it is faster, which I
can partially confirm even with current master: slice threading
saturates at thread count = slice count, while frame threading scales
beyond it. Frame threading also improves significantly after this set:

threads   slice     frame/before   frame/after
----------------------------------------------
2         22.6124   43.738         22.0354
4         14.3367   15.115         13.1964
6         14.3850   11.974         10.9745
8         14.3472   9.7229         8.76617
10        14.3579   8.4638         8.6499
12        14.3665   8.4636         8.5735
16        14.2960   7.6926         7.1696
----------------------------------------------
(values are total decode time in seconds)
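
For anyone wanting to collect comparable timings, here is a minimal
sketch of one way to do it. The exact command used for the numbers above
is not given in this mail, so this is an assumption: it relies on the
ffmpeg CLI options -threads and -thread_type placed before -i (so they
apply to the decoder) and discards the output with the null muxer. The
sample file name and the helper names are hypothetical.

```python
import subprocess
import time


def decode_cmd(sample, threads, thread_type, ffmpeg="ffmpeg"):
    """Build an ffmpeg command that decodes `sample` and discards the output.

    Assumption: this is not necessarily the command behind the table,
    just a common way to time decoding only.
    """
    if thread_type not in ("slice", "frame"):
        raise ValueError("thread_type must be 'slice' or 'frame'")
    return [
        ffmpeg, "-hide_banner", "-loglevel", "error",
        # Input options: they configure the decoder, not the encoder.
        "-threads", str(threads),
        "-thread_type", thread_type,
        "-i", sample,
        # Null muxer: decode everything, write nothing.
        "-f", "null", "-",
    ]


def time_decode(sample, threads, thread_type):
    """Return the wall-clock time in seconds of one full decode run."""
    start = time.perf_counter()
    subprocess.run(decode_cmd(sample, threads, thread_type), check=True)
    return time.perf_counter() - start
```

Calling time_decode("sample.mkv", 8, "frame") would then give one data
point; repeating each run a few times and taking the best or median
value helps reduce noise.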

Note that after this set frame threading is ALWAYS faster than slice
threading, at every thread count measured above.
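
The comparison can be checked mechanically from the table. The sketch
below only restates the numbers from this mail; the function names are
made up for illustration. It computes the slice / frame-after time ratio
(above 1 means frame threading is faster) and the relative change in
frame threading decode time caused by the set.

```python
# Total decode times in seconds, copied from the table above.
# threads -> (slice, frame before the set, frame after the set)
TIMES = {
    2:  (22.6124, 43.738, 22.0354),
    4:  (14.3367, 15.115, 13.1964),
    6:  (14.3850, 11.974, 10.9745),
    8:  (14.3472, 9.7229, 8.76617),
    10: (14.3579, 8.4638, 8.6499),
    12: (14.3665, 8.4636, 8.5735),
    16: (14.2960, 7.6926, 7.1696),
}


def slice_over_frame(times):
    """Return {threads: slice_time / frame_after_time}; > 1 means frame wins."""
    return {n: s / fa for n, (s, _fb, fa) in times.items()}


def frame_gain_percent(times):
    """Return {threads: percent reduction of frame threading decode time}."""
    return {n: 100.0 * (fb - fa) / fb for n, (_s, fb, fa) in times.items()}


if __name__ == "__main__":
    ratios = slice_over_frame(TIMES)
    gains = frame_gain_percent(TIMES)
    for n in sorted(TIMES):
        print(f"{n:2d} threads: slice/frame-after = {ratios[n]:.2f}, "
              f"frame time change = {gains[n]:+.1f}%")
```

Every ratio comes out above 1, which matches the statement above. The
gain column is positive everywhere except at 10 and 12 threads, where
frame/after is marginally slower than frame/before; the table alone does
not say whether that is measurement noise.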

> also I'd like to see the individual changes which look like they should
> make the code slower, to be tested individually. If they make the code slower
> they should be dropped

I don't think it's meaningful to individually benchmark the patches
moving per-slice data into the new per-slice context. I split them to
simplify testing and review, but it only makes sense to apply all of
them or none, otherwise the code gets more complex.

-- 
Anton Khirnov