C++: Reading interlocked variables
Note: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/779996/
Asked by MattJ
Assume:
A. C++ under WIN32.
B. A properly aligned volatile integer incremented and decremented using InterlockedIncrement() and InterlockedDecrement().
__declspec (align(8)) volatile LONG _ServerState = 0;
If I want to simply read _ServerState, do I need to read the variable via an InterlockedXXX function?
For instance, I have seen code such as:
LONG x = InterlockedExchange(&_ServerState, _ServerState);
and
LONG x = InterlockedCompareExchange(&_ServerState, _ServerState, _ServerState);
The goal is to simply read the current value of _ServerState.
Can't I simply say:
if (_ServerState == some value)
{
// blah blah blah
}
There seems to be some confusion regarding this subject. I understand register-sized reads are atomic in Windows, so I would assume the InterlockedXXX function is unnecessary.
Matt J.
Okay, thanks for the responses. BTW, this is Visual C++ 2005 and 2008.
If it's true that I should use an InterlockedXXX function to read the value of _ServerState, even if just for the sake of clarity, what's the best way to go about that?
LONG x = InterlockedExchange(&_ServerState, _ServerState);
This has the side effect of modifying the value, when all I really want to do is read it. Not only that, but there is a possibility that I could reset the flag to the wrong value if there is a context switch as the value of _ServerState is pushed on the stack in preparation for calling InterlockedExchange().
LONG x = InterlockedCompareExchange(&_ServerState, _ServerState, _ServerState);
I took this from an example I saw on MSDN.
See http://msdn.microsoft.com/en-us/library/ms686355(VS.85).aspx
All I need is something along the lines of:
lock mov eax, [_ServerState]
In any case, the point, which I thought was clear, is to provide thread-safe access to a flag without incurring the overhead of a critical section. I have seen LONGs used this way via the InterlockedXXX() family of functions, hence my question.
Okay, we think a good solution to the problem of reading the current value is:
LONG Cur = InterlockedCompareExchange(&_ServerState, 0, 0);
Answered by Michael Burr
It depends on what you mean by "goal is to simply read the current value of _ServerState" and it depends on what set of tools and the platform you use (you specify Win32 and C++, but not which C++ compiler, and that may matter).
If you simply want to read the value such that the value is uncorrupted (i.e., if some other processor is changing the value from 0x12345678 to 0x87654321, your read will get one of those two values and not 0x12344321), then simply reading will be OK as long as the variable is:
- marked volatile,
- properly aligned, and
- read using a single instruction with a word size that the processor handles atomically
None of this is promised by the C/C++ standard, but Windows and MSVC do make these guarantees, and I think that most compilers that target Win32 do as well.
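As a rough, portable illustration of the "uncorrupted read" property described above, here is a minimal sketch in standard C++11 rather than the Win32 API. It assumes std::atomic<std::uint32_t> as a stand-in for an aligned volatile LONG, and the names g_value and count_torn_reads are hypothetical. A writer flips the variable between the two values from the example while a reader checks that it never observes a mixture.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical shared variable; std::atomic stands in for an aligned volatile LONG.
std::atomic<std::uint32_t> g_value{0x12345678u};

// Returns how many reads saw something other than the two written values.
int count_torn_reads(int iterations) {
    std::atomic<bool> stop{false};
    std::thread writer([&stop] {
        bool toggle = false;
        while (!stop.load(std::memory_order_relaxed)) {
            g_value.store(toggle ? 0x12345678u : 0x87654321u,
                          std::memory_order_relaxed);
            toggle = !toggle;
        }
    });
    int torn = 0;
    for (int i = 0; i < iterations; ++i) {
        // A relaxed load is atomic (never torn) but implies no ordering.
        std::uint32_t v = g_value.load(std::memory_order_relaxed);
        if (v != 0x12345678u && v != 0x87654321u) {
            ++torn;
        }
    }
    stop.store(true, std::memory_order_relaxed);
    writer.join();
    return torn;
}
```

Calling count_torn_reads(100000) should report zero torn reads. Note that this only demonstrates atomicity of the read itself; it says nothing about ordering relative to other variables.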
However, if you want your read to be synchronized with behavior of the other thread, there's some additional complexity. Say that you have a simple 'mailbox' protocol:
#include <stdint.h>
struct mailbox_struct {
    uint32_t flag;
    uint32_t data;
};
typedef struct mailbox_struct volatile mailbox;
// the global - initialized before either thread starts
mailbox mbox = { 0, 0 };
//***************************
// Thread A
while (mbox.flag == 0) {
/* spin... */
}
uint32_t data = mbox.data;
//***************************
//***************************
// Thread B
mbox.data = some_very_important_value;
mbox.flag = 1;
//***************************
The idea is that Thread A will spin waiting for mbox.flag to indicate that mbox.data holds a valid piece of information. Thread B will write some data into mbox.data and then set mbox.flag to 1 as a signal that mbox.data is valid.
In this case a simple read in Thread A of mbox.flag might get the value 1 even though a subsequent read of mbox.data in Thread A does not get the value written by Thread B.
This is because even though the compiler will not reorder the Thread B writes to mbox.data and mbox.flag, the processor and/or cache might. C/C++ guarantees that the compiler will generate code such that Thread B will write to mbox.data before it writes to mbox.flag, but the processor and cache might have a different idea - special handling called 'memory barriers' or 'acquire and release semantics' must be used to ensure ordering below the level of the thread's stream of instructions.
I'm not sure if compilers other than MSVC make any claims about ordering below the instruction level. However MS does guarantee that for MSVC volatile is enough - MS specifies that volatile writes have release semantics and volatile reads have acquire semantics - though I'm not sure at which version of MSVC this applies - see http://msdn.microsoft.com/en-us/library/12a04hfd.aspx?ppud=4.
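For readers without the MSVC-specific volatile guarantee, the mailbox protocol above can be sketched with standard C++11 atomics. This is only an illustration under assumptions: the Mailbox struct and run_mailbox_exchange are hypothetical names, and explicit release/acquire orderings stand in for the semantics that the answer attributes to MSVC volatile.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

struct Mailbox {
    std::atomic<std::uint32_t> flag{0};
    std::uint32_t data{0};  // plain data, published through the flag
};

// Runs one producer/consumer exchange and returns what the consumer saw.
std::uint32_t run_mailbox_exchange(std::uint32_t value) {
    Mailbox mbox;
    std::uint32_t received = 0;

    std::thread consumer([&] {  // plays the role of Thread A
        // The acquire load pairs with the release store in the producer.
        while (mbox.flag.load(std::memory_order_acquire) == 0) {
            std::this_thread::yield();
        }
        received = mbox.data;  // guaranteed to see the producer's write
    });
    std::thread producer([&] {  // plays the role of Thread B
        mbox.data = value;
        // The release store keeps the data write from moving after it.
        mbox.flag.store(1, std::memory_order_release);
    });

    producer.join();
    consumer.join();
    return received;
}
```

With this pairing, run_mailbox_exchange(42) is expected to return 42 on any conforming implementation, regardless of how the processor or cache would otherwise reorder the two writes.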
I have also seen code like you describe that uses Interlocked APIs to perform simple reads and writes to shared locations. My take on the matter is to use the Interlocked APIs. Lock free inter-thread communication is full of very difficult to understand and subtle pitfalls, and trying to take a shortcut on a critical bit of code that may end up with a very difficult to diagnose bug doesn't seem like a good idea to me. Also, using an Interlocked API screams to anyone maintaining the code, "this is data access that needs to be shared or synchronized with something else - tread carefully!".
Also when using the Interlocked API you're taking the specifics of the hardware and the compiler out of the picture - the platform makes sure all of that stuff is dealt with properly - no more wondering...
Read Herb Sutter's Effective Concurrency articles on DDJ (which happen to be down at the moment, for me at least) for good information on this topic.
Answered by Sergey
Your way is good:
LONG Cur = InterlockedCompareExchange(&_ServerState, 0, 0);
I'm using a similar solution:
LONG Cur = InterlockedExchangeAdd(&_ServerState, 0);
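To show why these "no-op" read-modify-write calls work as reads, here is a small sketch using standard C++11 atomics as an analog of the Win32 calls. The names g_server_state and the read_via_* helpers are hypothetical, and std::atomic<long> is assumed to play the role of the volatile LONG.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical flag; std::atomic<long> plays the role of the volatile LONG.
std::atomic<long> g_server_state{0};

// Analog of InterlockedCompareExchange(&x, 0, 0): if the value is 0 it is
// replaced by 0 (no visible change); otherwise nothing is written. Either
// way the current value ends up in `expected`.
long read_via_compare_exchange() {
    long expected = 0;
    g_server_state.compare_exchange_strong(expected, 0);
    return expected;
}

// Analog of InterlockedExchangeAdd(&x, 0): adding 0 returns the old value.
long read_via_fetch_add() {
    return g_server_state.fetch_add(0);
}

// The plain atomic load that the standard library offers directly.
long read_via_load() {
    return g_server_state.load();
}
```

All three helpers return the current value and leave it unchanged. Where std::atomic is available, the plain load is usually the clearest way to express the intent.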
Answered by Bartosz Milewski
Interlocked instructions provide atomicity and inter-processor synchronization. Both writes and reads must be synchronized, so yes, you should be using interlocked instructions to read a value that is shared between threads and not protected by a lock. Lock-free programming (and that's what you're doing) is a very tricky area, so you might consider using locks instead. Unless this is really one of your program's bottlenecks that must be optimized?
Answered by Sergey D
To anyone who has to revisit this thread, I want to add to what was well explained by Bartosz that _InterlockedCompareExchange() is a good alternative to the standard atomic_load() if standard atomics are not available. Here is the code for atomically reading my_uint32_t_var in C on i86 Win64. atomic_load() is included as a benchmark:
long debug_x64_i = std::atomic_load((const std::_Atomic_long *)&my_uint32_t_var);
00000001401A6955 mov eax,dword ptr [rbp+30h]
00000001401A6958 xor edi,edi
00000001401A695A mov dword ptr [rbp-0Ch],eax
debug_x64_i = _InterlockedCompareExchange((long*)&my_uint32_t_var, 0, 0);
00000001401A695D xor eax,eax
00000001401A695F lock cmpxchg dword ptr [rbp+30h],edi
00000001401A6964 mov dword ptr [rbp-0Ch],eax
debug_x64_i = _InterlockedOr((long*)&my_uint32_t_var, 0);
00000001401A6967 prefetchw [rbp+30h]
00000001401A696B mov eax,dword ptr [rbp+30h]
00000001401A696E xchg ax,ax
00000001401A6970 mov ecx,eax
00000001401A6972 lock cmpxchg dword ptr [rbp+30h],ecx
00000001401A6977 jne foo+30h (01401A6970h)
00000001401A6979 mov dword ptr [rbp-0Ch],eax
long release_x64_i = std::atomic_load((const std::_Atomic_long *)&my_uint32_t_var);
00000001401A6955 mov eax,dword ptr [rbp+30h]
release_x64_i = _InterlockedCompareExchange((long*)&my_uint32_t_var, 0, 0);
00000001401A6958 mov dword ptr [rbp-0Ch],eax
00000001401A695B xor edi,edi
00000001401A695D mov eax,dword ptr [rbp-0Ch]
00000001401A6960 xor eax,eax
00000001401A6962 lock cmpxchg dword ptr [rbp+30h],edi
00000001401A6967 mov dword ptr [rbp-0Ch],eax
release_x64_i = _InterlockedOr((long*)&my_uint32_t_var, 0);
00000001401A696A prefetchw [rbp+30h]
00000001401A696E mov eax,dword ptr [rbp+30h]
00000001401A6971 mov ecx,eax
00000001401A6973 lock cmpxchg dword ptr [rbp+30h],ecx
00000001401A6978 jne foo+31h (01401A6971h)
00000001401A697A mov dword ptr [rbp-0Ch],eax
Answered by Kirill V. Lyadvinsky
32-bit read operations are already atomic on some 32-bit systems (Intel spec says these operations are atomic, but there's no guarantee that this will be true on other x86-compatible platforms). So you shouldn't use this for thread synchronization.
If you need a flag of some sort, you should consider using an Event object and the WaitForSingleObject function for that purpose.
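The Win32 Event object itself is not shown here. As a portable sketch of the same idea (block until another thread signals, instead of polling a flag), the following uses std::condition_variable. The ManualResetEvent class and wait_for_worker_result function are hypothetical and only loosely modeled on a manual-reset Event; they are not the Win32 API.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical manual-reset event, loosely modeled on a Win32 Event object.
class ManualResetEvent {
public:
    void set() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cv_.notify_all();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate guards against spurious wakeups.
        cv_.wait(lock, [this] { return signaled_; });
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// A worker publishes a value and signals; the caller waits and then reads it.
int wait_for_worker_result() {
    ManualResetEvent ready;
    int result = 0;
    std::thread worker([&] {
        result = 123;
        ready.set();
    });
    ready.wait();
    int seen = result;  // safe: the mutex inside the event orders the accesses
    worker.join();
    return seen;
}
```

The waiting thread sleeps rather than spins, which is the main practical difference from reading a flag in a loop.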
Answered by Zach Saw
Read is fine. A 32-bit value is always read as a whole as long as it's not split on a cache line. Your align 8 guarantees that it's always within a cache line so you'll be fine.
Forget about instruction reordering and all that nonsense. Results are always retired in-order. It would be a processor recall otherwise!!!
Even for a dual CPU machine (i.e. shared via the slowest FSBs), you'll still be fine as the CPUs guarantee cache coherency via MESI Protocol. The only thing you're not guaranteed is that the value you read is the absolute latest. BUT, what is the latest anyway? That's something you likely won't need to know in most situations if you're not writing back to the location based on the value of that read. Otherwise, you'd have used interlocked ops to handle it in the first place.
In short, you gain nothing by using Interlocked ops on a read (except perhaps reminding the next person maintaining your code to tread carefully - then again, that person may not be qualified to maintain your code to begin with).
EDIT: In response to a comment left by Adrian McCarthy.
You're overlooking the effect of compiler optimizations. If the compiler thinks it has the value already in a register, then it's going to re-use that value instead of re-reading it from memory. Also, the compiler may do instruction re-ordering for optimization if it believes there are no observable side effects.
I did not say reading from a non-volatile variable is fine. All the question was asking was if interlocked was required. In fact, the variable in question was clearly declared with volatile. Or were you overlooking the effect of the keyword volatile?
Answered by Neeraj Singh
Your initial understanding is basically correct. According to the memory model which Windows requires on all MP platforms it supports (or ever will support), reads from a naturally-aligned variable marked volatile are atomic as long as they are no larger than the size of a machine word. The same goes for writes. You don't need a 'lock' prefix.
If you do the reads without using an interlock, you are subject to processor reordering. This can even occur on x86, in a limited circumstance: reads from a variable may be moved above writes of a different variable. On pretty much every non-x86 architecture that Windows supports, you are subject to even more complicated reordering if you don't use explicit interlocks.
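The x86 case mentioned above (a read moving ahead of an earlier write to a different variable) is often demonstrated with the "store buffering" test. The sketch below is an assumption-laden illustration in standard C++11, not Win32 code: both_reads_saw_zero and count_forbidden_outcomes are hypothetical names. With sequentially consistent atomics, the outcome where both threads read 0 is forbidden; with plain or relaxed accesses it would be allowed, even on x86.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the store-buffering test once with sequentially consistent atomics
// and reports whether both threads read 0 (which must not happen).
bool both_reads_saw_zero() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts how often the forbidden outcome appeared.
int count_forbidden_outcomes(int runs) {
    int forbidden = 0;
    for (int i = 0; i < runs; ++i) {
        if (both_reads_saw_zero()) {
            ++forbidden;
        }
    }
    return forbidden;
}
```

With seq_cst ordering the count is expected to stay at zero. Weakening both operations to memory_order_relaxed would permit the (0, 0) outcome, which is the reordering the answer describes.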
There's also a requirement that if you're using a compare exchange loop, you must mark the variable you're compare exchanging on as volatile. Here's a code example to demonstrate why:
long g_var = 0; // not marked 'volatile' -- this is an error
bool foo () {
    long oldValue;
    long newValue;
    long retValue;
    // (1) Capture the original global value
    oldValue = g_var;
    // (2) Compute a new value based on the old value
    newValue = SomeTransformation(oldValue);
    // (3) Store the new value if the global value is equal to old?
    retValue = InterlockedCompareExchange(&g_var,
                                          newValue,
                                          oldValue);
    if (retValue == oldValue) {
        return true;
    }
    return false;
}
What can go wrong is that the compiler is well within its rights to re-fetch oldValue from g_var at any time if it's not volatile. This 'rematerialization' optimization is great in many cases because it can avoid spilling registers to the stack when register pressure is high.
Thus, step (3) of the function would become:
// (3) Incorrectly store new value regardless of whether the global
// is equal to old.
retValue = InterlockedCompareExchange(&g_var,
newValue,
g_var);
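For comparison, here is a sketch of the same compare-exchange pattern written with std::atomic, where the old value is captured once per attempt and cannot be silently re-fetched by the compiler. Unlike foo() above, which tries once and reports success or failure, this version retries until it succeeds. The function some_transformation is a hypothetical stand-in for SomeTransformation, and run_two_workers exists only to exercise the loop.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<long> g_var{0};

// Hypothetical stand-in for SomeTransformation from the answer.
long some_transformation(long old_value) {
    return old_value + 1;
}

// Retries until the value it computed from is still the current value.
void apply_transformation() {
    long old_value = g_var.load();  // captured once; only updated on failure
    long new_value;
    do {
        new_value = some_transformation(old_value);
        // On failure, compare_exchange_weak writes the current value of
        // g_var into old_value, so the next attempt uses fresh input.
    } while (!g_var.compare_exchange_weak(old_value, new_value));
}

// Two threads apply the transformation concurrently; no update is lost.
long run_two_workers(int per_thread) {
    g_var.store(0);
    auto work = [per_thread] {
        for (int i = 0; i < per_thread; ++i) {
            apply_transformation();
        }
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return g_var.load();
}
```

Because every successful exchange is based on the value actually observed, run_two_workers(10000) is expected to return 20000.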
Answered by Charlie Martin
You should be okay. It's volatile, so the optimizer shouldn't savage you, and it's a 32-bit value so it should be at least approximately atomic. The one possible surprise is if the instruction pipeline can get around that.
On the other hand, what's the additional cost of using the guarded routines?
Answered by Alphaneo
Reading the current value may not need any lock.
Answered by Doug T.
The Interlocked* functions prevent two different processors from accessing the same piece of memory at the same time. In a single-processor system you are going to be OK. If you have a dual-core system where you have threads on different cores both accessing this value, you might have problems doing what you think is atomic without the Interlocked* functions.