Linux: Why is a pthread mutex considered "slower" than a futex?
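Disclaimer: This page is a translation of a popular StackOverflow question, published under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow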
Original URL: http://stackoverflow.com/questions/6364314/
Question by Jason
Why are POSIX mutexes considered heavier or slower than futexes? Where is the overhead coming from in the pthread mutex type? I've heard that pthread mutexes are based on futexes and, when uncontended, do not make any calls into the kernel. It seems, then, that a pthread mutex is merely a "wrapper" around a futex.
Is the overhead simply in the function-wrapper call and the need for the mutex function to "setup" the futex (i.e., basically the setup of the stack for the pthread mutex function call)? Or are there some extra memory barrier steps taking place with the pthread mutex?
Accepted answer by Nektarios
Because futexes stay in userspace as much as possible, they require fewer system calls, and that is inherently faster because switching between user mode and kernel mode is expensive.
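To make the "stay in userspace" point concrete, here is a minimal sketch of a futex-based lock, modeled on the three-state mutex (0 = free, 1 = locked, 2 = locked with possible waiters) described in Ulrich Drepper's "Futexes Are Tricky". It is an illustration under stated assumptions (Linux, C11 atomics), not glibc's actual code; the names `futex_mutex`, `futex_lock`, `futex_unlock`, `run_demo` and the `futex_syscalls` counter are all hypothetical. The uncontended path is a single compare-and-swap, and the `futex` system call is made only when threads actually collide.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock word: 0 = unlocked, 1 = locked (no waiters), 2 = locked (maybe waiters). */
typedef struct { _Atomic uint32_t word; } futex_mutex;

/* Hypothetical instrumentation: counts how often we actually enter the kernel. */
static atomic_long futex_syscalls;

static void futex_wait(_Atomic uint32_t *addr, uint32_t expected) {
    atomic_fetch_add(&futex_syscalls, 1);
    /* The kernel sleeps only if *addr still equals `expected`; callers re-check anyway. */
    syscall(SYS_futex, (void *)addr, FUTEX_WAIT_PRIVATE, expected, NULL, NULL, 0);
}

static void futex_wake_one(_Atomic uint32_t *addr) {
    atomic_fetch_add(&futex_syscalls, 1);
    syscall(SYS_futex, (void *)addr, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
}

void futex_lock(futex_mutex *m) {
    uint32_t c = 0;
    /* Fast path: one CAS 0 -> 1, entirely in userspace. */
    if (atomic_compare_exchange_strong(&m->word, &c, 1))
        return;
    /* Slow path: mark the lock contended (2), then sleep until we take it. */
    if (c != 2)
        c = atomic_exchange(&m->word, 2);
    while (c != 0) {
        futex_wait(&m->word, 2);
        c = atomic_exchange(&m->word, 2);
    }
}

void futex_unlock(futex_mutex *m) {
    /* Old value 1 means nobody was waiting, so no system call is needed. */
    if (atomic_fetch_sub(&m->word, 1) != 1) {
        atomic_store(&m->word, 0);
        futex_wake_one(&m->word);
    }
}

/* Demo state: a shared counter protected by the lock. */
static futex_mutex demo_lock;
static long demo_counter;

static void *demo_worker(void *arg) {
    long iters = *(const long *)arg;
    for (long i = 0; i < iters; i++) {
        futex_lock(&demo_lock);
        demo_counter++;
        futex_unlock(&demo_lock);
    }
    return NULL;
}

/* Runs `nthreads` workers (1..8), each incrementing `iters` times; returns the total. */
long run_demo(int nthreads, long iters) {
    pthread_t t[8];
    assert(nthreads >= 1 && nthreads <= 8);
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, demo_worker, &iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

Calling `run_demo(1, n)` leaves `futex_syscalls` unchanged, because a single thread always hits the fast path; with several threads the counter grows only when they actually contend.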
I assume that by POSIX threads you mean kernel threads. It is entirely possible to implement POSIX threads fully in userspace, requiring no system calls, but such implementations have other issues of their own.
My understanding is that a futex is halfway between a kernel POSIX thread and a userspace POSIX thread.
Answer by ninjalj
Futexes were created to improve the performance of pthread mutexes. NPTL uses futexes, but LinuxThreads predated them, which I think is where the "slower" reputation comes from. NPTL mutexes may have some additional overhead, but it shouldn't be much.
Edit: The actual overhead basically consists of:
- selecting the correct algorithm for the mutex type (normal, recursive, adaptive, error-checking; normal, robust, priority-inheritance, priority-protected), where the code heavily hints to the compiler that we are likely using a normal mutex (so it should convey that to the CPU's branch prediction logic),
- and a write of the current owner of the mutex if we manage to take it. That write should normally be fast, since the owner field resides in the same cache line as the lock word we have just taken, unless the lock is heavily contended and some other CPU accessed the lock between the time we took it and the time we wrote the owner (this write is unneeded for normal mutexes, but needed for error-checking and recursive mutexes).
So the overhead ranges from a few cycles (typical case) to a few cycles plus a branch misprediction plus an additional cache miss (worst case).
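The two costs listed above can be illustrated with a toy lock that first dispatches on the mutex kind, using `__builtin_expect` to mark the normal kind as the likely branch, and writes an owner field only for recursive and error-checking kinds. This is a hypothetical sketch (`toy_mutex`, `TOY_NORMAL` and friends are made up), not glibc's implementation. To stay short it spins with `sched_yield()` instead of sleeping on a futex, and it assumes Linux/glibc, where `pthread_t` is an integer type so that `(pthread_t)0` can mean "no owner".

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical mutex kinds, mirroring PTHREAD_MUTEX_NORMAL/RECURSIVE/ERRORCHECK. */
enum toy_kind { TOY_NORMAL, TOY_RECURSIVE, TOY_ERRORCHECK };

typedef struct {
    atomic_int locked;        /* 0 = free, 1 = held */
    enum toy_kind kind;
    _Atomic pthread_t owner;  /* assumption: glibc pthread_t is an integer; 0 = no owner */
    unsigned count;           /* recursion depth, touched only by the owner */
} toy_mutex;

static void raw_lock(toy_mutex *m) {
    int expected = 0;
    while (!atomic_compare_exchange_weak(&m->locked, &expected, 1)) {
        expected = 0;
        sched_yield();        /* simplification: a real mutex would futex-wait here */
    }
}

int toy_lock(toy_mutex *m) {
    /* Cost 1: branch on the kind, hinting that "normal" is the common case. */
    if (__builtin_expect(m->kind == TOY_NORMAL, 1)) {
        raw_lock(m);
        return 0;             /* no owner write for normal mutexes */
    }
    pthread_t self = pthread_self();
    if (pthread_equal(atomic_load(&m->owner), self)) {
        if (m->kind == TOY_ERRORCHECK)
            return EDEADLK;   /* relocking our own error-checking mutex */
        m->count++;           /* recursive: just bump the depth */
        return 0;
    }
    raw_lock(m);
    /* Cost 2: record the owner, which sits next to `locked` in memory. */
    atomic_store(&m->owner, self);
    m->count = 1;
    return 0;
}

int toy_unlock(toy_mutex *m) {
    if (__builtin_expect(m->kind == TOY_NORMAL, 1)) {
        atomic_store(&m->locked, 0);
        return 0;
    }
    if (!pthread_equal(atomic_load(&m->owner), pthread_self()))
        return EPERM;         /* caller does not own the mutex */
    if (--m->count == 0) {
        atomic_store(&m->owner, (pthread_t)0);
        atomic_store(&m->locked, 0);
    }
    return 0;
}
```

A normal mutex pays only for the predicted branch; the recursive and error-checking kinds add the `pthread_self()` comparison and the owner store, which is the extra work the answer describes.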
Answer by David Schwartz
The short answer to your question is that futexes are known to be implemented about as efficiently as possible, while a pthread mutex may or may not be. At minimum, a pthread mutex has overhead associated with determining the mutex type, and a futex does not. So a futex will almost always be at least as efficient as a pthread mutex, until and unless someone thinks up some structure lighter than a futex and then releases a pthreads implementation that uses it for its default mutex.
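One way to check the "at least as efficient" claim on a particular system is a rough microbenchmark of uncontended lock/unlock pairs: a default pthread mutex versus only the userspace fast path of a futex-style lock (one atomic exchange to acquire, one release store to free). This is a hypothetical sketch; `bench_pthread` and `bench_bare` are illustrative names, the results depend heavily on CPU, compiler flags and glibc version, and a single-threaded loop says nothing about contended behavior.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

static double now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e9 + ts.tv_nsec;
}

/* Average nanoseconds per uncontended pthread_mutex_lock/unlock pair. */
double bench_pthread(long iters) {
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    double t0 = now_ns();
    for (long i = 0; i < iters; i++) {
        pthread_mutex_lock(&m);
        pthread_mutex_unlock(&m);
    }
    double t1 = now_ns();
    pthread_mutex_destroy(&m);
    return (t1 - t0) / iters;
}

/* Same loop using only a futex-style fast path: no type dispatch, no owner field. */
double bench_bare(long iters) {
    static atomic_uint word;
    double t0 = now_ns();
    for (long i = 0; i < iters; i++) {
        while (atomic_exchange_explicit(&word, 1, memory_order_acquire) != 0)
            ;                 /* never spins in practice: only one thread */
        atomic_store_explicit(&word, 0, memory_order_release);
    }
    double t1 = now_ns();
    return (t1 - t0) / iters;
}
```

Call both from `main` with a few million iterations and print the results; any gap you observe bundles together the type dispatch, the owner bookkeeping and the function-call cost discussed above.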
Answer by user696732
On AMD64 a futex is 4 bytes, while an NPTL pthread_mutex_t is 40 bytes! Yes, there is a significant overhead.
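The size comparison is easy to repeat on your own machine. The sketch below (hypothetical helper names, assuming Linux with glibc) reports the size of the 32-bit word the kernel's futex operations act on and `sizeof(pthread_mutex_t)`, which varies by architecture and C library; on x86-64 glibc the latter is 40 bytes. The extra space holds bookkeeping such as the mutex kind, the owner's thread ID and the recursion count.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* The kernel's futex operations act on a single aligned 32-bit word. */
size_t futex_word_size(void) { return sizeof(uint32_t); }

/* glibc's pthread_mutex_t wraps that word plus kind, owner, count and other fields. */
size_t pthread_mutex_size(void) { return sizeof(pthread_mutex_t); }
```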
Answer by Mark Veltzer
Technically speaking, pthread mutexes are neither slower nor faster than futexes. pthread is just a standard API, so whether its mutexes are slow or fast depends on the implementation of that API.
Specifically, in Linux pthread mutexes are implemented as futexes and are therefore fast. Actually, you don't want to use the futex API itself: it is very hard to use, has no appropriate wrapper functions in glibc, and requires coding in assembly, which would be non-portable. Fortunately for us, the glibc maintainers have already coded all of this for us under the hood of the pthread mutex API.
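To illustrate the missing-wrapper point: glibc does not export a `futex()` function, so direct use goes through the generic `syscall()` entry point with `SYS_futex`. The sketch below wraps that in a hypothetical `futex_op` helper (assuming Linux and `<linux/futex.h>`) and shows the basic contract: `FUTEX_WAIT` sleeps only if the word still holds the expected value and otherwise fails at once with `EAGAIN`, while `FUTEX_WAKE` returns the number of waiters it woke. Building a correct lock from these two operations is the hard part that the pthread mutex API hides.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: glibc has no futex() wrapper, so call syscall() directly.
   Returns the raw result (-1 with errno set on failure). */
static long futex_op(uint32_t *uaddr, int op, uint32_t val) {
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Waits while claiming to expect 1, but the word holds 0, so the kernel
   refuses to sleep. Returns the resulting errno, or 0 if the call succeeded. */
int wait_on_stale_value(void) {
    uint32_t word = 0;
    if (futex_op(&word, FUTEX_WAIT_PRIVATE, 1) == -1)
        return errno;
    return 0;
}

/* Wakes up to `n` waiters on a word nobody sleeps on; returns how many woke. */
long wake_nobody(int n) {
    uint32_t word = 0;
    return futex_op(&word, FUTEX_WAKE_PRIVATE, (uint32_t)n);
}
```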
Now, because most operating systems did not implement futexes, what programmers usually mean by "pthread mutex" is the performance you get from the usual implementation of pthread mutexes, which is slower.
So it's a statistical fact that in most POSIX-compliant operating systems the pthread mutex is implemented in kernel space and is slower than a futex. In Linux they have the same performance. There may be other operating systems where pthread mutexes are implemented in user space (in the uncontended case) and therefore perform better, but I am only aware of Linux at this point.