Overhead of pthread mutexes?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must do so under the same CC BY-SA terms and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/1277627/
Asked by cmeerw
I'm trying to make a C++ API (for Linux and Solaris) thread-safe, so that its functions can be called from different threads without breaking internal data structures. In my current approach I'm using pthread mutexes to protect all accesses to member variables. This means that a simple getter function now locks and unlocks a mutex, and I'm worried about the overhead of this, especially as the API will mostly be used in single-threaded apps where any mutex locking seems like pure overhead.
So, I'd like to ask:
- do you have any experience with performance of single-threaded apps that use locking versus those that don't?
- how expensive are these lock/unlock calls, compared to, e.g., a simple "return this->isActive" access for a bool member variable?
- do you know better ways to protect such variable accesses?
Answered by cmeerw
All modern thread implementations can handle an uncontended mutex lock entirely in user space (with just a couple of machine instructions) - only when there is contention does the library have to call into the kernel.
Another point to consider is that if an application doesn't explicitly link against the pthread library (because it's a single-threaded application), it will only get dummy pthread functions (which don't do any locking at all) - the full pthread functions are used only if the application is multi-threaded and links against the pthread library.
And finally, as others have already pointed out, there is no point in protecting a getter method for something like isActive with a mutex - once the caller gets a chance to look at the return value, the value might already have been changed (as the mutex is only locked inside the getter method).
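To make that concrete, here is a minimal sketch (the class and member names are made up for illustration) of why a locked getter buys little by itself, and what a check-then-act operation inside the API looks like instead:

#include <pthread.h>

class Worker {
public:
    Worker() : active_(true), pendingJob_(0) { pthread_mutex_init(&mutex_, NULL); }
    ~Worker() { pthread_mutex_destroy(&mutex_); }

    // The lock protects only the read itself; the value can change the
    // instant the mutex is released, so the caller may act on stale data.
    bool isActive() {
        pthread_mutex_lock(&mutex_);
        bool active = active_;
        pthread_mutex_unlock(&mutex_);
        return active;
    }

    // If the caller needs "check, then act", both steps have to happen
    // under the same lock, inside the API.
    bool submitIfActive(int job) {
        pthread_mutex_lock(&mutex_);
        bool accepted = active_;
        if (accepted)
            pendingJob_ = job;   // stand-in for the real work
        pthread_mutex_unlock(&mutex_);
        return accepted;
    }

private:
    pthread_mutex_t mutex_;
    bool active_;
    int pendingJob_;
};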
Answered by BillT
"A mutex requires an OS context switch. That is fairly expensive. "
- This is not true on Linux, where mutexes are implemented using futexes. Acquiring an uncontested (i.e., not already locked) mutex is, as cmeerw points out, a matter of a few simple instructions, and typically takes on the order of 25 nanoseconds with current hardware.
For more info: Futex
Answered by JDonner
This is a bit off-topic, but you seem to be new to threading - for one thing, only lock where threads can overlap. Then, try to minimize those places. Also, instead of trying to lock every method, think of what the thread is doing (overall) with an object, make that a single call, and lock that. Try to take your locks as high up as possible (this again increases efficiency and may help to avoid deadlocking). But locks don't 'compose'; you have to at least mentally organize your code around where the threads are and where they overlap.
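A rough illustration of locking the task rather than each individual access (the class here is invented for the example):

#include <pthread.h>

// A caller composing separate getBalance()/setBalance() calls would still
// race between the two; exposing the whole operation and locking once
// around it keeps the task atomic.
class Account {
public:
    Account() : balance_(0) { pthread_mutex_init(&mutex_, NULL); }
    ~Account() { pthread_mutex_destroy(&mutex_); }

    bool withdraw(long amount) {
        pthread_mutex_lock(&mutex_);      // one lock for the whole task
        bool ok = (balance_ >= amount);
        if (ok)
            balance_ -= amount;
        pthread_mutex_unlock(&mutex_);
        return ok;
    }

private:
    pthread_mutex_t mutex_;
    long balance_;
};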
Answered by Peter Cardona
I did a similar library and didn't have any trouble with lock performance. (I can't tell you exactly how they're implemented, so I can't say conclusively that it's not a big deal.)
I'd go for getting it right first (i.e. use locks) then worry about performance. I don't know of a better way; that's what mutexes were built for.
An alternative for single-threaded clients would be to use the preprocessor to build a non-locking vs. locking version of your library. E.g.:
#ifdef BUILD_SINGLE_THREAD
// Single-threaded build: locking compiles away to nothing.
inline void lock () {}
inline void unlock () {}
#else
// Multi-threaded build: forward to the real mutex operations
// (mutex_ is assumed to be a pthread_mutex_t member of the class).
inline void lock () { pthread_mutex_lock(&mutex_); }
inline void unlock () { pthread_mutex_unlock(&mutex_); }
#endif
Of course, that adds an additional build to maintain, as you'd distribute both single-threaded and multithreaded versions.
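A getter in the library would then call the wrappers, so the single-threaded build pays nothing (isActive_ is a stand-in member name):

bool isActive() {
    lock();                 // expands to a no-op in the single-threaded build
    bool active = isActive_;
    unlock();
    return active;
}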
Answered by gbjbaanb
I can tell you from Windows that a mutex is a kernel object and as such incurs a (relatively) significant locking overhead. To get a better-performing lock, when all you need is one that works between threads, use a critical section. This does not work across processes, just between the threads in a single process.
However, Linux is quite a different beast for multi-process locking. I know that a mutex there is implemented using atomic CPU instructions and only applies within a process - so it has the same performance as a Win32 critical section, i.e. it is very fast.
Of course, the fastest locking is not to have any at all, or to use locks as little as possible (but if your lib is to be used in a heavily threaded environment, you will want to hold them for as short a time as possible: lock, do something, unlock, do something else, then lock again is better than holding the lock across the whole task - the cost of locking isn't the time taken to lock, but the time a thread sits around twiddling its thumbs waiting for another thread to release a lock it wants!).
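A sketch of that "lock briefly, then work outside the lock" pattern (the shared queue here is invented for illustration):

#include <pthread.h>
#include <vector>

std::vector<int> sharedQueue;                      // shared state
pthread_mutex_t queueMutex = PTHREAD_MUTEX_INITIALIZER;

void drainAndProcess() {
    std::vector<int> local;

    pthread_mutex_lock(&queueMutex);               // hold the lock just long enough to swap
    local.swap(sharedQueue);
    pthread_mutex_unlock(&queueMutex);

    for (size_t i = 0; i < local.size(); ++i) {
        // expensive per-item processing happens outside the lock,
        // so other threads are not kept waiting
    }
}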
Answered by jalf
A mutex requires an OS context switch. That is fairly expensive. The CPU can still do it hundreds of thousands of times per second without too much trouble, but it is a lot more expensive than not having the mutex there. Putting it on every variable access is probably overkill.
It also probably is not what you want. This kind of brute-force locking tends to lead to deadlocks.
do you know better ways to protect such variable accesses?
Design your application so that as little data as possible is shared. Some sections of code should be synchronized, probably with a mutex, but only those that are actually necessary. And typically not individual variable accesses, but tasks containing groups of variable accesses that must be performed atomically. (Perhaps you need to set your is_active flag along with some other modifications. Does it make sense to set that flag and make no further changes to the object?)
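In other words, something along these lines - a sketch where the flag and the state that belongs with it change in one atomic step (the extra members are hypothetical):

#include <pthread.h>
#include <list>

class Session {
public:
    Session() : is_active_(true), last_error_(0) { pthread_mutex_init(&mutex_, NULL); }
    ~Session() { pthread_mutex_destroy(&mutex_); }

    // The flag and the related state are updated together, under one lock.
    void deactivate(int error) {
        pthread_mutex_lock(&mutex_);
        is_active_ = false;
        last_error_ = error;
        pending_.clear();
        pthread_mutex_unlock(&mutex_);
    }

private:
    pthread_mutex_t mutex_;
    bool is_active_;
    int last_error_;
    std::list<int> pending_;
};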
Answered by user413894
I was curious about the expense of using pthread_mutex_lock/unlock. I had a scenario where I needed to either copy anywhere from 1500-65K bytes without using a mutex, or to use a mutex and do a single write of a pointer to the data needed.
I wrote a short loop to test each:
gettimeofday(&starttime, NULL);
// COPY DATA
gettimeofday(&endtime, NULL);
timersub(&endtime, &starttime, &timediff);
// print out timediff data
or
gettimeofday(&starttime, NULL);
pthread_mutex_lock(&mutex);
gettimeofday(&endtime, NULL);
pthread_mutex_unlock(&mutex);
timersub(&endtime, &starttime, &timediff);
// print out timediff data
If I was copying less than 4000 or so bytes, then the straight copy operation took less time. If however I was copying more than 4000 bytes, then it was less costly to do the mutex lock/unlock.
The timing on the mutex lock/unlock ran between 3 and 5 usec, including the time for the gettimeofday call for the current time, which took about 2 usec.
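For reference, a self-contained version of that kind of micro-benchmark might look like the sketch below. The buffer size and names are arbitrary, and a single gettimeofday pair only has microsecond resolution, so in practice you would loop many iterations and divide; this simply mirrors the single-shot measurement described above.

#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

int main() {
    static char src[65536], dst[65536];
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    struct timeval start, end, diff;
    const size_t n = 4096;                 // bytes to copy in the first test

    // Time a plain copy of n bytes.
    gettimeofday(&start, NULL);
    memcpy(dst, src, n);
    gettimeofday(&end, NULL);
    timersub(&end, &start, &diff);
    printf("memcpy of %lu bytes: %ld.%06ld s\n",
           (unsigned long)n, (long)diff.tv_sec, (long)diff.tv_usec);

    // Time an uncontended lock/unlock pair.
    gettimeofday(&start, NULL);
    pthread_mutex_lock(&mutex);
    pthread_mutex_unlock(&mutex);
    gettimeofday(&end, NULL);
    timersub(&end, &start, &diff);
    printf("lock/unlock: %ld.%06ld s\n", (long)diff.tv_sec, (long)diff.tv_usec);

    return 0;
}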
Answered by Gunther Piez
For member variable access, you should use read/write locks, which have slightly less overhead and allow multiple concurrent reads without blocking.
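A minimal sketch of what that looks like with pthread read/write locks (the class and member names are placeholders):

#include <pthread.h>

class Config {
public:
    Config() : timeout_(30) { pthread_rwlock_init(&rwlock_, NULL); }
    ~Config() { pthread_rwlock_destroy(&rwlock_); }

    int timeout() const {
        pthread_rwlock_rdlock(&rwlock_);   // many readers may hold this concurrently
        int t = timeout_;
        pthread_rwlock_unlock(&rwlock_);
        return t;
    }

    void setTimeout(int t) {
        pthread_rwlock_wrlock(&rwlock_);   // writers get exclusive access
        timeout_ = t;
        pthread_rwlock_unlock(&rwlock_);
    }

private:
    mutable pthread_rwlock_t rwlock_;      // mutable so const getters can lock it
    int timeout_;
};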
In many cases you can use atomic builtins, if your compiler provides them (if you are using gcc or icc: __sync_fetch*() and the like), but they are notoriously hard to handle correctly.
If you can guarantee the access being atomic (for example on x86, an aligned dword read or write is always atomic, but a read-modify-write is not), you can often avoid locks altogether and use volatile instead, but this is not portable and requires knowledge of the hardware.
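For example, with the GCC/ICC __sync builtins mentioned above, a reference count or flag can be maintained without a mutex at all - a sketch, with made-up names:

// These builtins emit the appropriate atomic instructions
// (e.g. lock xadd / lock cmpxchg on x86) and act as full memory barriers.
static int refCount = 0;
static int active = 1;

void addRef()  { __sync_fetch_and_add(&refCount, 1); }

void release() {
    if (__sync_sub_and_fetch(&refCount, 1) == 0) {
        // last reference dropped; safe to clean up here
    }
}

// Atomically clear the flag and report whether this caller cleared it.
bool deactivate() { return __sync_bool_compare_and_swap(&active, 1, 0); }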
Answered by mox1
Well, a suboptimal but simple approach is to place macros around your mutex locks and unlocks, and then have a compiler/makefile option to enable or disable threading.
Ex.
#ifdef THREAD_ENABLED
#define pthread_mutex_lock(x) ... // actual mutex call
#else
#define pthread_mutex_lock(x)     // expands to nothing: no locking
#endif
Then when compiling, pass gcc -DTHREAD_ENABLED to enable threading.
Again, I would NOT use this method in any large project - only if you want something fairly simple.