From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.mplayerhq.hu (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by master.gitmailbox.com (Postfix) with ESMTP id 768BD4BF5B
	for <ffmpegdev@gitmailbox.com>; Thu, 18 Jul 2024 15:31:28 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTP id 50D9468DA82;
	Thu, 18 Jul 2024 18:31:25 +0300 (EEST)
Received: from mail0.khirnov.net (red.khirnov.net [176.97.15.12])
 by ffbox0-bg.mplayerhq.hu (Postfix) with ESMTPS id 5DD0968D83D
 for <ffmpeg-devel@ffmpeg.org>; Thu, 18 Jul 2024 18:31:18 +0300 (EEST)
Authentication-Results: mail0.khirnov.net; dkim=pass (2048-bit key;
 unprotected) header.d=khirnov.net header.i=@khirnov.net header.a=rsa-sha256
 header.s=mail header.b=MkV7/zF0; dkim-atps=neutral
Received: from localhost (localhost [IPv6:::1])
 by mail0.khirnov.net (Postfix) with ESMTP id AF659240DB7
 for <ffmpeg-devel@ffmpeg.org>; Thu, 18 Jul 2024 17:31:17 +0200 (CEST)
Received: from mail0.khirnov.net ([IPv6:::1])
 by localhost (mail0.khirnov.net [IPv6:::1]) (amavis, port 10024) with ESMTP
 id nhtl600_M-_Q for <ffmpeg-devel@ffmpeg.org>;
 Thu, 18 Jul 2024 17:31:16 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=khirnov.net; s=mail;
 t=1721316676; bh=r1regwOP1+207GMv4edNLe15EsHhOfUypPLsfvCuetk=;
 h=Subject:From:To:In-Reply-To:References:Date:From;
 b=MkV7/zF0+GlgDyknh6OeJmrt3eTTwusOJQ6q+CKSSUNzKAclXXfU4COI2fLfiE8dr
 qBE1NsfcgB+6CoEr2PbHFXyvg3xxAlFiWQ4M3GfT4tGUlXq6EsCuUSZcj98+zQ6AUH
 g2Q/A3auVbNCvBZgdkYW7zEwqF5jgjyWl7zI3VSNj+7+ntPHO/hSgdQfLF01Jx15G9
 OrhJ8mTTDaxBqGAtFmwRdFogfAfoPwGwkhu/IHziJKa6Fu64m0RCHKdVQAHWGkoNIn
 rGZBW0Jly+/HBuTuVt6oyS5ZX0CIVhidhkoW3HusF4P80p6+G9/dR4QALoYVuMIBBL
 /Y66GL2vFlgyw==
Received: from lain.khirnov.net (lain.khirnov.net [IPv6:2001:67c:1138:4306::3])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
 client-signature RSA-PSS (2048 bits) client-digest SHA256)
 (Client CN "lain.khirnov.net", Issuer "smtp.khirnov.net SMTP CA" (verified OK))
 by mail0.khirnov.net (Postfix) with ESMTPS id CC3BF240695
 for <ffmpeg-devel@ffmpeg.org>; Thu, 18 Jul 2024 17:31:16 +0200 (CEST)
Received: by lain.khirnov.net (Postfix, from userid 1000)
 id ACE621601B9; Thu, 18 Jul 2024 17:31:16 +0200 (CEST)
From: Anton Khirnov <anton@khirnov.net>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
In-Reply-To: <20240718144806.GC4991@pb2>
References: <20240716171155.31838-1-anton@khirnov.net>
 <20240716171155.31838-10-anton@khirnov.net> <20240717224205.GY4991@pb2>
 <172129373901.21847.12392144255635795802@lain.khirnov.net>
 <20240718144806.GC4991@pb2>
Mail-Followup-To: FFmpeg development discussions and patches
 <ffmpeg-devel@ffmpeg.org>
Date: Thu, 18 Jul 2024 17:31:16 +0200
Message-ID: <172131667667.21847.10057209425663694866@lain.khirnov.net>
User-Agent: alot/0.8.1
MIME-Version: 1.0
Subject: Re: [FFmpeg-devel] [PATCH 10/39] lavc/ffv1dec: move the bitreader
 to stack
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Archived-At: <https://master.gitmailbox.com/ffmpegdev/172131667667.21847.10057209425663694866@lain.khirnov.net/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>

Quoting Michael Niedermayer (2024-07-18 16:48:06)
> On Thu, Jul 18, 2024 at 11:08:59AM +0200, Anton Khirnov wrote:
> > Quoting Michael Niedermayer (2024-07-18 00:42:05)
> > > all the stuff should be kept close together so that it uses the
> > > CPU caches efficiently
> > 
> > Which is why it shares its cacheline with PutBitContext, because the
> > code benefits from having both in the cache, right? And the 4-byte
> > hole in PutBitContext is there presumably to aerate the cache for
> > smoother data streaming.
> 
> Thanks for spotting these; can you fix them?
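
The "hole" mentioned above is alignment padding that the compiler inserts so the next member is suitably aligned. A minimal sketch of what such a hole looks like, assuming a 64-bit ABI where pointers are 8-byte aligned; HoleyWriter, FilledWriter and writer_hole_bytes are hypothetical names for illustration, not FFmpeg's actual PutBitContext definition:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit-writer layout, for illustration only. */
typedef struct HoleyWriter {
    uint64_t bit_buf;
    int32_t  bit_left;
    /* on 64-bit ABIs, 4 bytes of padding are inserted here so that
     * buf starts on an 8-byte boundary */
    uint8_t *buf, *buf_ptr, *buf_end;
} HoleyWriter;

/* The same layout with a hypothetical 4-byte field placed in the hole. */
typedef struct FilledWriter {
    uint64_t bit_buf;
    int32_t  bit_left;
    int32_t  extra;     /* occupies what used to be padding */
    uint8_t *buf, *buf_ptr, *buf_end;
} FilledWriter;

/* Adding a member can never make the struct smaller. */
static_assert(sizeof(FilledWriter) >= sizeof(HoleyWriter),
              "an extra member cannot shrink the struct");

/* Number of padding bytes between bit_left and buf. */
static size_t writer_hole_bytes(void)
{
    return offsetof(HoleyWriter, buf) -
           (offsetof(HoleyWriter, bit_left) + sizeof(int32_t));
}
```

On x86-64, for instance, writer_hole_bytes() returns 4 and both structs are 40 bytes: the extra field only uses bytes that were already wasted as padding.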

I have no interest in optimizing the performance of this code. My
primary goal here is to remove FFV1-specific hacks from the frame
threading code for patch 33/39, which is in turn needed for 38/39.

As a public service, I also spent some effort on making the ffv1 code
easier to understand, but if you insist on keeping the code as it is I
can also just drop its non-compliant frame threading implementation.

> > 
> > More seriously, this is not how caches work. Being close together
> > matters mainly as long as your data fits in a cacheline; beyond that,
> > physical proximity matters little. On the stack, the bitreader is likely
> > to share its cacheline with other data that is currently needed, thus
> > improving cache utilization.
> 
> Caches are complex, and being close does matter.
> Having things in separate allocations risks hitting aliasing cases
> (that is, things that cannot be in the cache at the same time),
> so when you already have the bitstream, the frame buffer and the context
> in 3 independent locations, adding a few more increases the risk of
> hitting these.
> Also, sequential memory access is faster than non-sequential access, so it
> makes more sense to put things together in a few places than to scatter them.
> 
> It's been years since I've done hardcore optimization work, but I don't
> think the principles have changed so much that random access is now faster
> than sequential access, or that caches work fundamentally differently.

I don't see how any of these arguments are relevant: I am not moving
the bitreader to a new allocation, but to the stack, which is already
highly likely to be in cache.
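
A minimal sketch of the pattern described here, with the reader as a local variable of the per-slice function rather than a member of a long-lived context. BitReader, br_init, br_get_bit, SliceState and decode_slice are hypothetical stand-ins, not FFmpeg's internal GetBitContext API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit reader (hypothetical). */
typedef struct BitReader {
    const uint8_t *buf;
    size_t size_in_bits;
    size_t index;
} BitReader;

static void br_init(BitReader *br, const uint8_t *buf, size_t size)
{
    assert(size <= SIZE_MAX / 8); /* guard against overflow in size * 8 */
    br->buf          = buf;
    br->size_in_bits = size * 8;
    br->index        = 0;
}

/* Reads one bit; returns 0 once the buffer is exhausted. */
static unsigned br_get_bit(BitReader *br)
{
    if (br->index >= br->size_in_bits)
        return 0;
    unsigned bit = (br->buf[br->index >> 3] >> (7 - (br->index & 7))) & 1;
    br->index++;
    return bit;
}

/* Hypothetical per-slice state that no longer embeds a reader. */
typedef struct SliceState {
    unsigned ones; /* example output: number of set bits decoded */
} SliceState;

static void decode_slice(SliceState *sc, const uint8_t *buf, size_t size)
{
    BitReader gb; /* lives in this stack frame for the duration of the slice */
    br_init(&gb, buf, size);
    sc->ones = 0;
    for (size_t i = 0; i < size * 8; i++)
        sc->ones += br_get_bit(&gb);
}
```

Since gb exists only inside decode_slice(), its state is private to one call and nothing outside the function can observe or depend on it.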

> > 
> > Another factor that matters for efficient cache use is, e.g., not having
> > multiple copies of the same constant data scattered around, which you are
> > objecting to in my other patches.
> 
> Copying the small amount of data that is actually used into one place per
> slice, where it is accessed per pixel, should improve the speed per pixel
> while making the per-slice code a little slower. We have maybe 4 slices
> and millions of pixels; that's why this can give an overall gain.
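
The trade-off in the quoted argument can be written as a rough model. The symbols are illustrative notation, not measurements: S slices per frame, N pixels per frame, c the extra per-slice cost of copying the data, and \delta the time saved per pixel by having it local. The 1080p frame is only an example:

```latex
\Delta T = N\,\delta - S\,c > 0
\quad\Longleftrightarrow\quad
c < \frac{N}{S}\,\delta

% Example: S = 4, N = 1920 \times 1080 = 2\,073\,600
\frac{N}{S} = \frac{2\,073\,600}{4} = 518\,400
```

Under these assumptions the per-slice copy pays off as long as it costs less than about 518,400 times the per-pixel saving; whether \delta is measurably nonzero is exactly what the two sides disagree on.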

This all sounds like premature optimization, AKA the root of all evil.
As I said above, I intended to make this code more readable, not faster.
Yet somehow it became faster anyway, which suggests this code is not
very optimized. So arguing whether this or that specific change adds
or removes a few cycles per frame seems like a waste of time to me.

-- 
Anton Khirnov
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".