Note: this page is derived from a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/1787450/

Time: 2020-09-10 01:05:48 Source: igfitidea

How do I Understand Read Memory Barriers and Volatile

Tags: multithreading, volatile, memory-barriers

Asked by Jason Kresowaty

Some languages provide a volatile modifier that is described as performing a "read memory barrier" prior to reading the memory that backs a variable.

A read memory barrier is commonly described as a way to ensure that the CPU has performed the reads requested before the barrier before it performs a read requested after the barrier. However, using this definition, it would seem that a stale value could still be read. In other words, performing reads in a certain order does not seem to mean that the main memory or other CPUs must be consulted to ensure that subsequent values read actually reflect the latest in the system at the time of the read barrier or written subsequently after the read barrier.

So, does volatile really guarantee that an up-to-date value is read or just (gasp!) that the values that are read are at least as up-to-date as the reads before the barrier? Or some other interpretation? What are the practical implications of this answer?

Answered by tony

There are read barriers and write barriers; acquire barriers and release barriers. And more (io vs memory, etc).

The barriers are not there to control "latest" value or "freshness" of the values. They are there to control the relative ordering of memory accesses.

Write barriers control the order of writes. Because writes to memory are slow (compared to the speed of the CPU), there is usually a write-request queue where writes are posted before they 'really happen'. Although they are queued in order, while inside the queue the writes may be reordered. (So maybe 'queue' isn't the best name...) Unless you use write barriers to prevent the reordering.

Read barriers control the order of reads. Reads may happen out of order because of speculative execution (the CPU looks ahead and loads from memory early) and because of the write buffer: the CPU will read a value from the write buffer instead of memory if it is there. That is, if the CPU thinks it just wrote X = 5, then why read it back? It just sees that the value is still waiting to become 5 in the write buffer.

This is true regardless of what the compiler tries to do with respect to the order of the generated code. ie 'volatile' in C++ won't help here, because it only tells the compiler to output code to re-read the value from "memory", it does NOT tell the CPU how/where to read it from (ie "memory" is many things at the CPU level).

So read/write barriers put up blocks to prevent reordering in the read/write queues (the read isn't usually so much of a queue, but the reordering effects are the same).

What kinds of blocks? - acquire and/or release blocks.

Acquire - eg read-acquire(x) will add the read of x into the read-queue and flush the queue (not really flush the queue, but add a marker saying don't reorder anything before this read, which is as if the queue was flushed). So later (in code order) reads can be reordered, but not before the read of x.

Release - eg write-release(x, 5) will flush (or marker) the queue first, then add the write-request to the write-queue. So earlier writes won't become reordered to happen after x = 5, but note that later writes can be reordered before x = 5.

Note that I paired the read with acquire and write with release because this is typical, but different combinations are possible.

Acquire and Release are considered 'half-barriers' or 'half-fences' because they only stop the reordering from going one way.

A full barrier (or full fence) applies both an acquire and a release - ie no reordering.
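
As a rough illustration of half-fences in a concrete language, here is a minimal C++ sketch using standalone fences from `<atomic>`. The names `Shared`, `fence_producer`, `fence_consumer` and `run_fence_demo` are made up for this example and are not from the answer. The release fence keeps the earlier write to `payload` from moving after the flag store, and the acquire fence keeps the later read of `payload` from moving before the flag load. As far as I know, `std::atomic_thread_fence(std::memory_order_seq_cst)` is the closest standard C++ counterpart to a full fence, though the exact hardware instruction it maps to depends on the platform.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared state: plain data plus a flag accessed with relaxed atomics.
struct Shared {
    int payload = 0;
    std::atomic<bool> flag{false};
};

void fence_producer(Shared& s) {
    s.payload = 42;
    // Release fence: the write above may not be reordered after the store below.
    std::atomic_thread_fence(std::memory_order_release);
    s.flag.store(true, std::memory_order_relaxed);
}

int fence_consumer(Shared& s) {
    // Spin until the producer's flag store becomes visible.
    while (!s.flag.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Acquire fence: the read below may not be reordered before the load above.
    std::atomic_thread_fence(std::memory_order_acquire);
    return s.payload;
}

// Runs one producer and one consumer and returns the value the consumer saw.
int run_fence_demo() {
    Shared s;
    int seen = 0;
    std::thread consumer([&] { seen = fence_consumer(s); });
    std::thread producer([&] { fence_producer(s); });
    producer.join();
    consumer.join();
    assert(s.flag.load());  // the flag was published before both threads finished
    return seen;
}
```

Calling `run_fence_demo()` from `main` should return 42, because the two half-fences pair up across the threads.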

Typically for lock-free programming, or C# or Java 'volatile', what you want/need is read-acquire and write-release.

For example:

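// Writer: fill in foo, then publish it.
void threadA()
{
   foo->x = 10;
   foo->y = 11;
   foo->z = 12;
   write_release(foo->ready, true);  // x, y, z are written before ready
   bar = 13;                         // may be reordered before the write to ready
}

// Reader: only touch foo after seeing ready == true.
void threadB()
{
   w = some_global;                  // may be reordered after the read of ready
   ready = read_acquire(foo->ready); // x, y, z are read after ready
   if (ready)
   {
      q = w * foo->x * foo->y * foo->z;
   }
   else
   {
      calculate_pi();                // foo is not ready yet; do something else
   }
}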

So, first of all, this is a bad way to program threads. Locks would be safer. But just to illustrate barriers...

After threadA() is done writing foo, it needs to write foo->ready LAST, really last, else other threads might see foo->ready early and get the wrong values of x/y/z. So we use a write_release on foo->ready, which, as mentioned above, effectively 'flushes' the write queue (ensuring x, y, z are committed), then adds the ready=true request to the queue, and then adds the bar=13 request. Note that since we just used a release barrier (not a full barrier), bar=13 may get written before ready. But we don't care! ie we are assuming bar is not changing shared data.

Now threadB() needs to know that when we say 'ready' we really mean ready. So we do a read_acquire(foo->ready). This read is added to the read queue, THEN the queue is flushed. Note that w = some_global may also still be in the queue. So foo->ready may be read before some_global. But again, we don't care, as it is not part of the important data that we are being so careful about. What we do care about is foo->x/y/z. So they are added to the read queue after the acquire flush/marker, guaranteeing that they are read only after reading foo->ready.
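
The write_release/read_acquire pseudocode above can be approximated in standard C++ with `std::atomic` and explicit memory orders. This is only a sketch under some assumptions: the `Foo` struct and the function names are invented for illustration, `bar` and `some_global` are left out, and the reader spins until the flag is set instead of falling back to calculate_pi().

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the answer's `foo` object.
struct Foo {
    int x = 0, y = 0, z = 0;         // plain (non-atomic) payload
    std::atomic<bool> ready{false};  // written with release, read with acquire
};

// Writer: fill in the payload, then publish it.
void thread_a(Foo& foo) {
    foo.x = 10;
    foo.y = 11;
    foo.z = 12;
    // Plays the role of write_release(foo->ready, true).
    foo.ready.store(true, std::memory_order_release);
}

// Reader: wait for the flag, then read the payload.
int thread_b(Foo& foo) {
    // Plays the role of read_acquire(foo->ready).
    while (!foo.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return foo.x * foo.y * foo.z;
}

// Runs both threads once and returns what the reader computed.
int run_publish_demo() {
    Foo foo;
    int q = 0;
    std::thread b([&] { q = thread_b(foo); });
    std::thread a([&] { thread_a(foo); });
    a.join();
    b.join();
    assert(foo.ready.load());  // the writer has published by now
    return q;
}
```

Because the release store and the acquire load pair up, the reader sees x, y and z fully written, so `run_publish_demo()` returns 10 * 11 * 12 = 1320. Replacing both memory orders with `std::memory_order_relaxed` would remove that guarantee.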

Note also that these are typically the exact same barriers used for locking and unlocking a mutex/CriticalSection/etc. (ie acquire on lock(), release on unlock() ).
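
To make the lock()/unlock() pairing concrete, here is a toy spinlock in C++ that uses acquire when taking the lock and release when dropping it. `SpinLock` and `count_with_spinlock` are hypothetical names for this sketch. Real mutex implementations are more elaborate (they block instead of spinning, for one), so treat this as an illustration of the barrier placement only.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Toy spinlock: acquire on lock(), release on unlock().
class SpinLock {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        // Acquire: accesses inside the critical section cannot move above this.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock() {
        // Release: accesses inside the critical section cannot move below this.
        locked_.store(false, std::memory_order_release);
    }
};

// Increments a plain counter from several threads under the spinlock.
long count_with_spinlock(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    SpinLock lock;
    long counter = 0;  // deliberately non-atomic; the lock protects it
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

With 4 threads and 10000 iterations each, `count_with_spinlock(4, 10000)` returns 40000, since every increment happens inside the acquire/release bracket.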

So,

  • I'm pretty sure this (ie acquire/release) is exactly what MS docs say happens for reads/writes of 'volatile' variables in C# (and optionally for MS C++, but this is non-standard). See http://msdn.microsoft.com/en-us/library/aa645755(VS.71).aspx, including: "A volatile read has 'acquire semantics'; that is, it is guaranteed to occur prior to any references to memory that occur after it..."

  • I think Java is the same, although I'm not as familiar. I suspect it is exactly the same, because you just don't typically need more guarantees than read-acquire/write-release.

  • In your question you were on the right track when thinking that it is really all about relative order - you just had the orderings backwards (ie "the values that are read are at least as up-to-date as the reads before the barrier?" - no, reads before the barrier are unimportant, it's reads AFTER the barrier that are guaranteed to be AFTER, vice versa for writes).

  • And please note, as mentioned, reordering happens on both reads and writes, so only using a barrier on one thread and not the other WILL NOT WORK. ie a write-release isn't enough without the read-acquire. ie even if you write it in the right order, it could be read in the wrong order if you didn't use the read barriers to go with the write barriers.

  • And lastly, note that lock-free programming and CPU memory architectures can be actually much more complicated than that, but sticking with acquire/release will get you pretty far.

Answered by Nikolai Fetissov

volatile in most programming languages does not imply a real CPU read memory barrier, but rather an order to the compiler not to optimize the reads via caching in a register. This means that the reading process/thread will get the value "eventually". A common technique is to declare a boolean volatile flag to be set in a signal handler and checked in the main program loop.
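
The signal-handler technique might look roughly like this in C++. It is a sketch, not taken from the answer: the names `stop_requested`, `on_sigint` and `run_until_signalled` are invented, and `std::raise` is used only to simulate a signal arriving so that the example terminates by itself.

```cpp
#include <cassert>
#include <csignal>

// volatile std::sig_atomic_t is the flag type a signal handler may safely write.
volatile std::sig_atomic_t stop_requested = 0;

// Handler: only sets the flag; all real work stays in the main loop.
extern "C" void on_sigint(int) {
    stop_requested = 1;
}

// Loops until the handler sets the flag; returns how many iterations ran.
int run_until_signalled() {
    stop_requested = 0;
    std::signal(SIGINT, on_sigint);
    int iterations = 0;
    // volatile forces a fresh read of the flag on every pass through the loop.
    while (!stop_requested) {
        ++iterations;
        if (iterations == 3) {
            std::raise(SIGINT);  // simulate Ctrl-C; handler runs before raise returns
        }
    }
    assert(stop_requested == 1);
    std::signal(SIGINT, SIG_DFL);  // restore the default handler
    return iterations;
}
```

Here `run_until_signalled()` returns 3. Note that this only covers communication between a handler and the code it interrupts in the same thread; for flags shared between threads, `std::atomic` is the usual tool in C++.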

In contrast, CPU memory barriers are provided either directly via CPU instructions or implied by certain assembler mnemonics (such as the lock prefix in x86). They are used, for example, when talking to hardware devices where the order of reads and writes to memory-mapped IO registers is important, or when synchronizing memory access in a multi-processing environment.

To answer your question - no, a memory barrier does not guarantee the "latest" value, but it guarantees the order of memory access operations. This is crucial, for example, in lock-free programming.

Here is one of the primers on CPU memory barriers.
