Threads vs Processes in Linux
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/807506/
Question by user17918
I've recently heard a few people say that in Linux, it is almost always better to use processes instead of threads, since Linux is very efficient in handling processes, and because there are so many problems (such as locking) associated with threads. However, I am suspicious, because it seems like threads could give a pretty big performance gain in some situations.
So my question is, when faced with a situation that threads and processes could both handle pretty well, should I use processes or threads? For example, if I were writing a web server, should I use processes or threads (or a combination)?
Answer by eduffy
I'd have to agree with what you've been hearing. When we benchmark our cluster (xhpl and such), we always get significantly better performance with processes over threads. </anecdote>
Answer by Yuval Adam
If you need to share resources, you really should use threads.
Also consider the fact that context switches between threads are much less expensive than context switches between processes.
I see no reason to go with separate processes unless you have a specific justification (security, proven performance tests, etc.).
Answer by Adam Rosenfield
That depends on a lot of factors. Processes are more heavy-weight than threads, and have a higher startup and shutdown cost. Interprocess communication (IPC) is also harder and slower than interthread communication.
Conversely, processes are safer and more secure than threads, because each process runs in its own virtual address space. If one process crashes or has a buffer overrun, it does not affect any other process at all, whereas if a thread crashes, it takes down all of the other threads in the process, and if a thread has a buffer overrun, it opens up a security hole in all of the threads.
So, if your application's modules can run mostly independently with little communication, you should probably use processes if you can afford the startup and shutdown costs. The performance hit of IPC will be minimal, and you'll be slightly safer against bugs and security holes. If you need every bit of performance you can get or have a lot of shared data (such as complex data structures), go with threads.
Answer by hlovdal
The decision between thread and process depends a little bit on what you will be using it for. One of the benefits of a process is that it has a PID and can be killed without also terminating the parent.
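To make the PID point concrete, here is a minimal sketch, assuming a Linux/Unix system and Python's `os.fork`. The function name `kill_child_by_pid` and the 60-second sleep are made up for illustration. The parent terminates one child by PID and keeps running.

```python
import os
import signal
import time


def kill_child_by_pid():
    """Fork a child that would sleep for a minute, then terminate it by PID."""
    pid = os.fork()
    if pid == 0:
        # Child: stand-in for a long-running worker.
        time.sleep(60)
        os._exit(0)

    # Parent: the child has its own PID, so it can be signalled on its own.
    os.kill(pid, signal.SIGTERM)
    _, status = os.waitpid(pid, 0)
    # Report whether the child died from a signal, and which one.
    return os.WIFSIGNALED(status), os.WTERMSIG(status)


killed, signum = kill_child_by_pid()
print("child killed by signal:", killed, "SIGTERM:", signum == signal.SIGTERM)
```

A thread has no comparable handle in this sketch: there is no supported way to terminate a single Python thread from outside without affecting the rest of the process.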
For a real-world example of a web server, Apache 1.3 used to support only multiple processes, but in 2.0 they added an abstraction so that you can switch between either. Comments seem to agree that processes are more robust but threads can give slightly better performance (except on Windows, where performance for processes sucks and you only want to use threads).
Answer by Robert
How tightly coupled are your tasks?
If they can live independently of each other, then use processes. If they rely on each other, then use threads. That way you can kill and restart a bad process without interfering with the operation of the other tasks.
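The "restart a bad process" idea can be sketched as a tiny supervisor loop. This is only an illustration under stated assumptions: the `WORKER` one-liner is a hypothetical task that fails on demand, and `run_with_restart` is a made-up helper, not part of any library.

```python
import subprocess
import sys

# Hypothetical worker: exits with status 1 ("bad") when told to fail.
WORKER = "import sys; sys.exit(1 if sys.argv[1] == 'fail' else 0)"


def run_with_restart(modes):
    """Start the worker once per mode until one run exits cleanly.

    Returns the exit codes seen, in order.
    """
    codes = []
    for mode in modes:
        result = subprocess.run([sys.executable, "-c", WORKER, mode])
        codes.append(result.returncode)
        if result.returncode == 0:
            break  # healthy run, no further restart needed
    return codes


print(run_with_restart(["fail", "fail", "ok"]))
```

Each failed run is a separate process, so the supervisor itself is never affected by the failure and can simply start a fresh copy.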
Answer by dmckee --- ex-moderator kitten
Others have discussed the considerations.
Perhaps the important difference is that on Windows, processes are heavy and expensive compared to threads, while on Linux the difference is much smaller, so the equation balances at a different point.
Answer by MarkR
Linux (and indeed Unix) gives you a third option.
Option 1 - processes
Create a standalone executable which handles some part (or all parts) of your application, and invoke it separately for each process, e.g. the program runs copies of itself to delegate tasks to.
Option 2 - threads
Create a standalone executable which starts up with a single thread and creates additional threads to do some tasks.
Option 3 - fork
Only available under Linux/Unix, this is a bit different. A forked process really is its own process with its own address space - there is nothing that the child can do (normally) to affect its parent's or siblings' address space (unlike a thread) - so you get added robustness.
However, the memory pages are not copied up front; they are copy-on-write, so less memory is usually used than you might imagine.
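The difference between a shared and a forked address space can be seen in a short sketch, assuming Linux/Unix and Python's `os.fork`. The helper names `mutate_in_thread` and `mutate_in_fork` are made up for illustration. A thread's write is visible to its creator, while a forked child's write lands only in the child's copy-on-write copy.

```python
import os
import threading


def mutate_in_thread(data):
    """Append from a second thread; the caller sees the change."""
    worker = threading.Thread(target=data.append, args=("from thread",))
    worker.start()
    worker.join()
    return list(data)


def mutate_in_fork(data):
    """Append from a forked child; the parent's list is untouched."""
    pid = os.fork()
    if pid == 0:
        data.append("from child")  # modifies only the child's private copy
        os._exit(0)
    os.waitpid(pid, 0)
    return list(data)


print(mutate_in_thread(["original"]))
print(mutate_in_fork(["original"]))
```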
Consider a web server program which consists of two steps:
1. Read configuration and runtime data
2. Serve page requests
If you used threads, step 1 would be done once, and step 2 done in multiple threads. If you used "traditional" processes, steps 1 and 2 would need to be repeated for each process, and the memory to store the configuration and runtime data duplicated. If you used fork(), then you can do step 1 once, and then fork(), leaving the runtime data and configuration in memory, untouched, not copied.
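The fork() variant of the two-step server can be sketched as follows. This is a toy under stated assumptions, not a real server: `load_config`, `serve`, and `prefork` are hypothetical names, the "configuration" is a small dictionary, and "serving" is a single line written to a pipe.

```python
import os


def load_config():
    # Step 1 stand-in: pretend this is expensive to build.
    return {"greeting": "hello", "workers": 3}


def serve(config, worker_id):
    # Step 2 stand-in: answer one "request" using the inherited config.
    return f"{config['greeting']} from worker {worker_id}"


def prefork(config):
    """Fork one worker per slot; each reuses the already-loaded config."""
    read_fd, write_fd = os.pipe()
    pids = []
    for worker_id in range(config["workers"]):
        pid = os.fork()
        if pid == 0:
            os.close(read_fd)
            line = serve(config, worker_id) + "\n"
            os.write(write_fd, line.encode())  # short write, stays atomic
            os._exit(0)
        pids.append(pid)
    os.close(write_fd)
    for pid in pids:
        os.waitpid(pid, 0)
    with os.fdopen(read_fd) as reader:
        return sorted(reader.read().splitlines())


config = load_config()  # step 1 runs exactly once, before any fork()
print(prefork(config))
```

Step 1 runs once in the parent, and every worker starts with the result already in memory, with no reload in the children.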
So there are really three choices.
Answer by ephemient
Linux uses a 1-1 threading model, with (to the kernel) no distinction between processes and threads -- everything is simply a runnable task. *
On Linux, the system call clone clones a task, with a configurable level of sharing. Among the sharing options are:
- CLONE_FILES: share the same file descriptor table (instead of creating a copy)
- CLONE_PARENT: don't set up a parent-child relationship between the new task and the old (otherwise, child's getppid() = parent's getpid())
- CLONE_VM: share the same memory space (instead of creating a COW copy)
fork() calls clone(least sharing) and pthread_create() calls clone(most sharing). **
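The two ends of the sharing spectrum can be observed from user space. The sketch below assumes Linux and Python 3.8+ (for `threading.get_native_id`); `ids_in_thread` and `ids_in_fork` are made-up helper names. A thread reports the same PID as its creator but its own kernel task ID, while a forked child gets a new PID whose getppid() is the parent's getpid().

```python
import os
import threading


def ids_in_thread():
    """Return the PID and kernel task ID seen from inside a new thread."""
    seen = {}

    def record():
        seen["pid"] = os.getpid()
        seen["tid"] = threading.get_native_id()

    worker = threading.Thread(target=record)
    worker.start()
    worker.join()
    return seen


def ids_in_fork():
    """Return (pid from fork(), child's getpid(), child's getppid())."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        os.write(write_fd, f"{os.getpid()} {os.getppid()}".encode())
        os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_pid, child_ppid = map(int, reader.read().split())
    os.waitpid(pid, 0)
    return pid, child_pid, child_ppid


print(ids_in_thread(), os.getpid(), threading.get_native_id())
print(ids_in_fork())
```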
fork()ing costs a tiny bit more than pthread_create()ing because of copying tables and creating COW mappings for memory, but the Linux kernel developers have tried to minimize those costs (and succeeded).
Switching between tasks, if they share the same memory space and various tables, will be a tiny bit cheaper than if they aren't shared, because the data may already be loaded in cache. However, switching tasks is still very fast even if nothing is shared -- this is something else that Linux kernel developers try to ensure (and succeed at ensuring).
In fact, if you are on a multi-processor system, not sharing may actually be beneficial to performance: if each task is running on a different processor, synchronizing shared memory is expensive.
* Simplified. CLONE_THREAD causes signal delivery to be shared (which needs CLONE_SIGHAND, which shares the signal handler table).
** Simplified. There exist both SYS_fork and SYS_clone syscalls, but in the kernel, sys_fork and sys_clone are both very thin wrappers around the same do_fork function, which itself is a thin wrapper around copy_process. Yes, the terms process, thread, and task are used rather interchangeably in the Linux kernel...
Answer by KeyserSoze
To complicate matters further, there is such a thing as thread-local storage, and Unix shared memory.
Thread-local storage allows each thread to have a separate instance of global objects. The only time I've used it was when constructing an emulation environment on Linux/Windows, for application code that ran in an RTOS. In the RTOS, each task was a process with its own address space; in the emulation environment, each task was a thread (with a shared address space). By using TLS for things like singletons, we were able to have a separate instance for each thread, just like under the 'real' RTOS environment.
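A per-thread "singleton" of the kind described can be sketched with Python's `threading.local`. The `Registry` class and the `get_registry` and `run_tasks` helpers are hypothetical stand-ins, not the code from the emulation environment mentioned above.

```python
import threading

_tls = threading.local()  # each thread sees its own attributes on this object


class Registry:
    """Stand-in for a global singleton from the original application."""

    def __init__(self):
        self.items = []


def get_registry():
    """Return the calling thread's own Registry, creating it on first use."""
    if not hasattr(_tls, "registry"):
        _tls.registry = Registry()
    return _tls.registry


def run_tasks(names):
    """Run one thread per name; each records into its private Registry."""
    results = {}

    def task(name):
        get_registry().items.append(name)
        results[name] = list(get_registry().items)

    threads = [threading.Thread(target=task, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_tasks(["task_a", "task_b"]))
```

Every thread calls the same `get_registry()` accessor, yet none of them sees another thread's items, which mirrors the one-instance-per-task behaviour of separate processes.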
Shared memory can (obviously) give you the performance benefits of having multiple processes access the same memory, but at the cost/risk of having to synchronize the processes properly. One way to do that is to have one process create a data structure in shared memory, and then send a handle to that structure via traditional inter-process communication (like a named pipe).
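The "create in shared memory, then pass a handle" pattern can be sketched roughly as below. Several things here are assumptions for illustration: a file-backed `mmap` stands in for Unix shared memory, the file path plays the role of the handle, an anonymous pipe replaces the named pipe, and `share_counter` is a made-up name. Waiting for the child to exit is the only synchronization, which is far simpler than what real code would need.

```python
import mmap
import os
import struct
import tempfile

COUNTER = struct.Struct("q")  # the shared "data structure": one 64-bit integer


def share_counter(initial):
    """Let a child increment a counter that lives in a shared mapping."""
    fd, path = tempfile.mkstemp()
    os.ftruncate(fd, COUNTER.size)
    shared = mmap.mmap(fd, COUNTER.size)  # shared mapping by default on Unix
    os.close(fd)
    COUNTER.pack_into(shared, 0, initial)

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(write_fd)
            with os.fdopen(read_fd) as reader:
                handle = reader.read()  # the path acts as the handle
            with open(handle, "r+b") as backing:
                view = mmap.mmap(backing.fileno(), COUNTER.size)
            (value,) = COUNTER.unpack_from(view, 0)
            COUNTER.pack_into(view, 0, value + 1)
        finally:
            os._exit(0)

    os.close(read_fd)
    os.write(write_fd, path.encode())  # send the handle over the pipe
    os.close(write_fd)
    os.waitpid(pid, 0)  # crude synchronization: wait for the child to finish
    (result,) = COUNTER.unpack_from(shared, 0)
    shared.close()
    os.unlink(path)
    return result


print(share_counter(41))
```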
Answer by robert.berger
Once upon a time there was Unix, and in this good old Unix there was lots of overhead for processes. So some clever people created threads, which share the same address space with the parent process and need only a reduced context switch, which makes the context switch more efficient.
In a contemporary Linux (2.6.x) there is not much difference in performance between a context switch of a process and that of a thread (only the MMU work is additional for the process). Threads do have the issue of the shared address space, which means that a faulty pointer in a thread can corrupt memory of the parent process or another thread within the same address space.
A process is protected by the MMU, so a faulty pointer will just cause a signal 11 (SIGSEGV) in that process and no corruption of any other process.
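The signal 11 behaviour is easy to observe from a parent process. The sketch below assumes Linux/Unix; `crash_in_child` is a made-up name, and the NULL-pointer read through `ctypes.string_at(0)` is just a convenient way to provoke a fault from Python.

```python
import ctypes
import os
import signal


def crash_in_child():
    """Fork a child that dereferences a NULL pointer; report how it died."""
    pid = os.fork()
    if pid == 0:
        ctypes.string_at(0)  # faulty pointer: read from address 0
        os._exit(0)  # not reached, the kernel kills the child first
    _, status = os.waitpid(pid, 0)
    return os.WIFSIGNALED(status), os.WTERMSIG(status)


signalled, signum = crash_in_child()
print("child killed by signal:", signalled, "number:", int(signum))
print("parent still running")
```

The parent only sees an exit status. Had the same faulty read happened in a thread, the whole process, including every other thread, would have received the signal.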
I would in general use processes (not much context switch overhead in Linux, plus memory protection thanks to the MMU), but pthreads if I needed a real-time scheduler class, which is a different cup of tea altogether.
Why do you think threads have such a big performance gain on Linux? Do you have any data for this, or is it just a myth?