C: Understanding CPU cache and cache line

Notice: this page is a side-by-side Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/5007377/

Time: 2020-09-02 07:48:22  Source: igfitidea

Understanding CPU cache and cache line

Tags: c, cpu-cache

Asked by kirbo

I am trying to understand how a CPU cache operates. Let's say we have this configuration (as an example).

  • Cache size 1024 bytes
  • Cache line 32 bytes
  • 1024/32 = 32 cache lines in total.
  • A single cache line can store 32/4 = 8 ints.

1) According to this configuration, the length of the tag should be 32-5=27 bits, and the size of the index 5 bits (2^5 = 32 addresses, one for each byte in the cache line).

If the total cache size is 1024 bytes and there are 32 cache lines, where are the tags+indexes stored? (That is another 4*32 = 128 bytes.) Does it mean that the actual size of the cache is 1024+128 = 1152 bytes?

2) If the cache line is 32 bytes in this example, this means that 32 bytes get copied into the cache whenever the CPU needs to get a new byte from RAM. Am I right to assume that the position of the requested byte within the cache line will be determined by its address?

This is what I mean: if the CPU requested the byte at [FF FF 00 08], then an available cache line will be filled with the bytes from [FF FF 00 00] to [FF FF 00 1F], and our requested single byte will be at position [08].

3) If the previous statement is correct, does it mean that the 5 bits used for the index are technically not needed, since all 32 bytes are in the cache line anyway?

Please let me know if I got something wrong. Thanks

Accepted answer by John Ripley

A cache consists of data and tag RAM, arranged as a compromise of access time vs efficiency and physical layout. You're missing an important stat: number of ways (sets). You rarely have 1-way caches, because they perform pathologically badly with simple patterns. Anyway:

1) Yes, tags take extra space. This is part of the design compromise: you don't want the tags to be a large fraction of the total area, which is why line size isn't just 1 byte or 1 word. Also, all tags for an index are accessed simultaneously, and that can affect efficiency and layout if there's a large number of ways. The size is slightly bigger than your estimate. There are usually also a few extra bits to mark validity and sometimes hints. More ways and smaller lines need a larger fraction taken up by tags, so generally lines are large (32+ bytes) and the number of ways is small (4-16).
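
To put rough numbers on that trade-off, here's a minimal sketch that estimates the tag RAM size for a set-associative cache. It assumes 32-bit addresses, power-of-two sizes, and a caller-supplied count of per-line state bits (valid, hints, and so on); the function names `log2_u32` and `tag_ram_bits` are made up for illustration and don't describe any particular CPU.

```c
#include <stdint.h>

/* Integer log2; only meaningful for exact powers of two. */
static inline unsigned log2_u32(uint32_t x)
{
    unsigned n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/*
 * Total tag-RAM bits, assuming 32-bit addresses.
 * state_bits is the number of extra bits kept per line (valid, hints, ...).
 */
static inline uint32_t tag_ram_bits(uint32_t cache_bytes, uint32_t line_bytes,
                                    uint32_t ways, unsigned state_bits)
{
    uint32_t lines = cache_bytes / line_bytes;   /* total cache lines      */
    uint32_t sets  = lines / ways;               /* lines sharing an index */
    unsigned tag_bits = 32u - log2_u32(line_bytes) - log2_u32(sets);

    return lines * (tag_bits + state_bits);
}
```

With the question's 1024-byte cache and 32-byte lines, `tag_ram_bits(1024, 32, 1, 1)` gives 736 bits (92 bytes) for a direct-mapped layout with one valid bit, and `tag_ram_bits(1024, 32, 32, 1)` gives 896 bits (112 bytes) when fully associative. Shrinking the line to 4 bytes, `tag_ram_bits(1024, 4, 1, 1)`, gives 5888 bits (736 bytes), which is most of the size of the data itself.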

2) Yes. Some caches also do a "critical word first" fetch, where they start with the word that caused the line fill, then fetch the rest. This reduces the number of cycles the CPU is waiting for the data it actually asked for. Some caches will "write thru" and not allocate a line if you miss on a write, which avoids having to read the entire cache line first, before writing to it (this isn't always a win).
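
As a rough illustration of "critical word first", the sketch below computes the order in which the 8 words of a 32-byte line might be fetched. It assumes a simple wrap-around order starting at the requested word, which is only one possible scheme; `WORDS_PER_LINE` and `critical_word_first_order` are hypothetical names.

```c
#include <stddef.h>

enum { WORDS_PER_LINE = 8 };   /* 32-byte line / 4-byte words */

/*
 * Fill order[] with the word numbers in fetch order, assuming the
 * fill starts at the requested word and wraps around the line.
 */
static inline void critical_word_first_order(size_t requested,
                                             size_t order[WORDS_PER_LINE])
{
    for (size_t i = 0; i < WORDS_PER_LINE; i++)
        order[i] = (requested + i) % WORDS_PER_LINE;
}
```

For the byte at offset 0x08 in the question (word 2 of the line), this yields the order 2, 3, 4, 5, 6, 7, 0, 1, so the CPU gets the word it asked for on the first transfer instead of the third.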

3) The tags won't store the lower 5 bits, as they're not needed to match a cache line. They just index into the bytes within an individual line.
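
To make the split concrete, here's a minimal sketch for the question's example (1024-byte cache, 32-byte lines, 32-bit addresses). It assumes a direct-mapped layout, which the question doesn't specify, so the 5 index bits and the resulting 22-bit tag are an assumption; the function names are made up for illustration.

```c
#include <stdint.h>

/* Example configuration from the question. */
enum {
    LINE_SIZE   = 32,                /* bytes per cache line                 */
    NUM_LINES   = 1024 / LINE_SIZE,  /* 32 lines                             */
    OFFSET_BITS = 5,                 /* log2(LINE_SIZE)                      */
    INDEX_BITS  = 5                  /* log2(NUM_LINES), assumes direct-mapped */
};

/* Byte position inside the line: the low 5 bits, never stored in the tag. */
static inline uint32_t addr_offset(uint32_t addr)
{
    return addr & (LINE_SIZE - 1);
}

/* First address of the 32-byte line that contains addr. */
static inline uint32_t addr_line_base(uint32_t addr)
{
    return addr & ~(uint32_t)(LINE_SIZE - 1);
}

/* Which line slot the address maps to (direct-mapped assumption). */
static inline uint32_t addr_index(uint32_t addr)
{
    return (addr >> OFFSET_BITS) & (NUM_LINES - 1);
}

/* Remaining high bits, compared against the stored tag on lookup. */
static inline uint32_t addr_tag(uint32_t addr)
{
    return addr >> (OFFSET_BITS + INDEX_BITS);
}
```

For the address 0xFFFF0008 from the question, this gives offset 8, line base 0xFFFF0000 (so the line covers 0xFFFF0000 to 0xFFFF001F), index 0, and tag 0x3FFFC0.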

Wikipedia has a pretty good, if a bit intense, write-up on caches: http://en.wikipedia.org/wiki/CPU_cache (see "Implementation"). There's a diagram of how data and tags are split. Me, I think everyone should learn this stuff because you really can improve performance of code when you know what the underlying machine is actually capable of.

Answer by bta

  1. The cache metadata is typically not counted as a part of the cache itself. It might not even be stored in the same part of the CPU (it could be in another cache, implemented using special CPU registers, etc).
  2. This depends on whether your CPU will fetch unaligned addresses. If it will only fetch aligned addresses, then the example you gave would be correct. If the CPU fetches unaligned addresses, then it might fetch the range 0xFFFF0008 to 0xFFFF0027.
  3. The index bits are still useful, even when cache access is aligned. They give the CPU a shorthand for referencing a byte within a cache line that it can use in its internal bookkeeping. You could get the same information by knowing the address associated with the cache line and the address associated with the byte, but that's a whole lot more information to carry around.

Different CPUs implement caching very differently. For the best answer to your question, please give some additional details about the particular CPU (type, model, etc) that you are talking about.

Answer by typo.pl

This is based on my vague memory; you should read books like "Computer Architecture: A Quantitative Approach" by Hennessy and Patterson. Great book.

Assuming a 32-bit CPU... (otherwise your figures would need more than 4 bytes for the address, though maybe fewer than 8, since some/most 64-bit CPUs don't use all 64 bits of address lines).

1) I believe it's at least 4*32 bytes. Depending on the CPU, the chip architects may have decided to keep track of other info besides the full address. But it's usually not considered part of the cache.

2) Yes, but how that mapping is done is different. See Wikipedia - CPU cache - associativity. There's the simple direct-mapped cache and the more complex associative-mapped cache. You want to avoid the case where some code needs two pieces of information but the two addresses map to the exact same cache line.
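
Here's a small sketch of that conflict case for the question's 1024-byte cache with 32-byte lines. It assumes the set is picked by taking the line number modulo the number of sets, which is the usual textbook mapping but not necessarily what a given CPU does; `set_index` and `same_set` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

enum { LINE_BYTES = 32, CACHE_BYTES = 1024 };

/* Set that an address maps to; ways == 1 means direct-mapped. */
static inline uint32_t set_index(uint32_t addr, uint32_t ways)
{
    uint32_t sets = CACHE_BYTES / LINE_BYTES / ways;
    return (addr / LINE_BYTES) % sets;
}

/* True when two addresses compete for the same set. */
static inline bool same_set(uint32_t a, uint32_t b, uint32_t ways)
{
    return set_index(a, ways) == set_index(b, ways);
}
```

In the direct-mapped case, 0x1000 and 0x1400 are exactly 1024 bytes apart and both land on line 0, so code that alternates between them evicts one to load the other every time. With 2 ways they still map to the same set, but that set has room for two lines, so both can stay cached.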
