From: Michael Niedermayer
Date: Thu, 18 Jul 2024 16:48:06 +0200
To: FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [PATCH 10/39] lavc/ffv1dec: move the bitreader to stack
Message-ID: <20240718144806.GC4991@pb2>
In-Reply-To: <172129373901.21847.12392144255635795802@lain.khirnov.net>
References: <20240716171155.31838-1-anton@khirnov.net> <20240716171155.31838-10-anton@khirnov.net> <20240717224205.GY4991@pb2> <172129373901.21847.12392144255635795802@lain.khirnov.net>

On Thu, Jul 18, 2024 at 11:08:59AM +0200, Anton Khirnov wrote:
> Quoting Michael Niedermayer (2024-07-18 00:42:05)
> > all the stuff should be put together close so its efficiently
> > using CPU caches
>
> Which is why it shares its cacheline with PutBitContext, because the
> code benefits from having both in the cache, right? And the 4-byte
> hole in PutBitContext is there presumably to aerate the cache for
> smoother data streaming.

Thanks for spotting these, can you fix them?

>
> More seriously, this is not how caches work. Being close together
> matters mainly so long as your data fits in a cacheline, beyond that
> physical proximity matters little. On stack, the bitreader is likely to
> share the cacheline with other data that is currently needed, thus
> improving cache utilization.

Caches are complex, and being close does matter. Having things in
separate allocations risks hitting aliasing cases (that is, things that
cannot be in the cache at the same time). So when you already have the
bitstream, the frame buffer and the context in 3 independent locations,
adding a few more increases the risk of hitting these.
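To make the aliasing case concrete, here is a minimal sketch. It assumes
a hypothetical 32 KiB, 8-way set-associative L1 data cache with 64-byte
lines; the geometry and the helper names cache_set_index() and
count_same_set() are illustrative assumptions, not anything from FFmpeg
or from a specific CPU. It only shows which addresses compete for the
same cache set.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed (hypothetical) L1 data cache geometry; real CPUs differ. */
#define CACHE_SIZE (32 * 1024)
#define LINE_SIZE  64
#define WAYS       8
#define NUM_SETS   (CACHE_SIZE / (LINE_SIZE * WAYS)) /* 64 sets */

/* Index of the cache set an address maps to in the assumed cache. */
unsigned cache_set_index(uintptr_t addr)
{
    return (unsigned)((addr / LINE_SIZE) % NUM_SETS);
}

/* Count how many of the n addresses map to the same set as addrs[0]
 * (addrs[0] itself is included in the count). */
size_t count_same_set(const uintptr_t *addrs, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (cache_set_index(addrs[i]) == cache_set_index(addrs[0]))
            hits++;
    return hits;
}
```

Under these assumptions, addresses that differ by a multiple of
LINE_SIZE * NUM_SETS (4096 bytes) land in the same set. Once more than
WAYS such lines are in active use, they start evicting each other even
though the rest of the cache may be mostly idle.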
Also, sequential memory access is faster than non-sequential, so it
makes more sense to put things together in a few places than to scatter
them.

It's years since I've done hardcore optimization stuff, but I don't
think the principles have changed so much that random access is now
faster than sequential or that caches work fundamentally differently.

>
> Another factor that matters in efficient cache use is e.g. not having
> multiple copies of the same constant data scattered around, which you're
> objecting to in my other patches.

Copying the small data that is actually used together per slice, where
it's accessed per pixel, should improve the speed per pixel while making
the per-slice code a little slower. Now, we have maybe 4 slices and
millions of pixels. That's why this can give an overall gain.

thx

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

No snowflake in an avalanche ever feels responsible. -- Voltaire