Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/47845/

Time: 2020-09-09 05:12:27  Source: igfitidea

Why is creating a new process more expensive on Windows than Linux?

Tags: windows, linux, performance

Asked by Readonly

I've heard that creating a new process on a Windows box is more expensive than on Linux. Is this true? Can somebody explain the technical reasons for why it's more expensive and provide any historical reasons for the design decisions behind those reasons?


Answer by Johannes Passing

mweerden: NT has been designed for multi-user from day one, so this is not really a reason. However, you are right that process creation plays a less important role on NT than on Unix, since NT, in contrast to Unix, favors multithreading over multiprocessing.

Rob, it is true that fork is relatively cheap when COW is used, but as a matter of fact, fork is mostly followed by an exec. And an exec has to load all images as well. Discussing the performance of fork therefore is only part of the truth.

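To make the fork-then-exec pattern concrete, here is a minimal sketch for a Unix-like system. The helper name `spawn` is hypothetical and not from the answer; it only illustrates that the cheap fork is normally followed by an exec, which is where the new program image gets loaded.

```python
import os
import sys


def spawn(argv):
    """Run argv in a child process via fork + exec and return its exit code.

    `spawn` is a hypothetical helper name used for illustration only.
    """
    pid = os.fork()  # duplicate the calling process
    if pid == 0:
        # Child: replace the copied image with the new program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Reached only if exec failed (for example, command not found).
            os._exit(127)
    # Parent: wait for the child and decode its exit status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    code = spawn([sys.executable, "-c", "print('hello from the child')"])
    print("child exit code:", code)
```

The cost the answer describes sits in the `execvp` call: the fork itself copies almost nothing, but the exec has to map the executable and its libraries.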

When discussing the speed of process creation, it is probably a good idea to distinguish between NT and Windows/Win32. As far as NT (i.e. the kernel itself) goes, I do not think process creation (NtCreateProcess) and thread creation (NtCreateThread) are significantly slower than on the average Unix. There might be a little bit more going on, but I do not see the primary reason for the performance difference here.

If you look at Win32, however, you'll notice that it adds quite a bit of overhead to process creation. For one, it requires the CSRSS to be notified about process creation, which involves LPC. It requires at least kernel32 to be loaded additionally, and it has to perform a number of additional bookkeeping work items before the process is considered to be a full-fledged Win32 process. And let's not forget about all the additional overhead imposed by parsing manifests, checking if the image requires a compatibility shim, checking whether software restriction policies apply, yada yada.

That said, I see the overall slowdown in the sum of all those little things that have to be done in addition to the raw creation of a process, VA space, and initial thread. But as said in the beginning, due to the favoring of multithreading over multiprocessing, the only software that is seriously affected by this additional expense is poorly ported Unix software. This situation changes, though, when software like Chrome and IE8 suddenly rediscovers the benefits of multiprocessing and begins to frequently start up and tear down processes...

Answer by Rob Walker

Unix has a 'fork' system call which 'splits' the current process into two, and gives you a second process that is identical to the first (modulo the return from the fork call). Since the address space of the new process is already up and running, this should be cheaper than calling 'CreateProcess' in Windows and having it load the exe image, associated dlls, etc.

In the fork case the OS can use 'copy-on-write' semantics for the memory pages shared by the two processes, so that each one gets its own copy of only the pages it subsequently modifies.
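The visible effect of those semantics can be sketched in a few lines for a Unix-like system. The function name `child_sees_private_copy` is hypothetical. Note that this only demonstrates the isolation a program can observe after fork; the page-level copy-on-write mechanism itself is internal to the kernel and is not directly visible from Python.

```python
import os


def child_sees_private_copy():
    """Fork, let the child modify shared-looking data, and compare both views.

    Hypothetical helper for illustration: returns (parent_value, child_value).
    """
    data = ["parent"]
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: this write lands in the child's own copy of the memory.
        os.close(read_fd)
        data[0] = "child"
        os.write(write_fd, data[0].encode())
        os._exit(0)
    # Parent: read what the child saw, then check its own unchanged view.
    os.close(write_fd)
    child_value = os.read(read_fd, 64).decode()
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data[0], child_value


if __name__ == "__main__":
    print(child_sees_private_copy())
```

The parent's list is untouched by the child's write, even though both processes started from the same memory contents.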

Answer by Chris Smith

Adding to what JP said: most of the overhead belongs to Win32 startup for the process.


The Windows NT kernel actually does support COW fork. SFU (Microsoft's UNIX environment for Windows) uses it. However, Win32 does not support fork. SFU processes are not Win32 processes. SFU is orthogonal to Win32: they are both environment subsystems built on the same kernel.

In addition to the out-of-process LPC calls to CSRSS, in XP and later there is an out-of-process call to the application compatibility engine to find the program in the application compatibility database. This step causes enough overhead that Microsoft provides a group policy option to disable the compatibility engine on WS2003 for performance reasons.

The Win32 runtime libraries (kernel32.dll, etc.) also do a lot of registry reads and initialization on startup that don't apply to UNIX, SFU or native processes.


Native processes (with no environment subsystem) are very fast to create. SFU does a lot less than Win32 for process creation, so its processes are also fast to create.


UPDATE FOR 2019: add LXSS: Windows Subsystem for Linux


Replacing SFU for Windows 10 is the LXSS environment subsystem. It is 100% kernel mode and does not require any of the IPC that Win32 continues to have. Syscalls for these processes are routed directly to lxss.sys/lxcore.sys, so fork() or another process-creating call costs the creator only one system call in total. A data area called the instance keeps track of all LX processes, threads, and runtime state.

LXSS processes are based on native processes, not Win32 processes. None of the Win32-specific machinery, such as the compatibility engine, is engaged at all.

Answer by VolkerK

In addition to the answer of Rob Walker: nowadays you have things like the Native POSIX Thread Library, if you want. But for a long time the only way to "delegate" the work in the Unix world was to use fork() (and it's still preferred in many, many circumstances), e.g. in some kind of socket server:

socket_accept()
fork()
if (child)
    handleRequest()
else
    goOnBeingParent()
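
Therefore the implementation of fork had to be fast, and lots of optimizations have been implemented over time. Microsoft endorsed CreateThread or even fibers instead of creating new processes and using interprocess communication. I think it's not "fair" to compare CreateProcess to fork, since they are not interchangeable. It's probably more appropriate to compare fork/exec to CreateProcess.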
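The fork-per-connection pattern from the pseudocode above can be sketched as runnable Python for a Unix-like system. The names `handle_request`, `fork_handler`, and `serve_forever` are hypothetical, and the uppercase echo stands in for whatever real request handling a server would do.

```python
import os
import socket


def handle_request(conn):
    """Placeholder request handler: echo the request back in uppercase."""
    data = conn.recv(1024)
    conn.sendall(data.upper())


def fork_handler(conn):
    """Fork a child to serve one connection; return the child's pid."""
    pid = os.fork()
    if pid == 0:
        # Child: serve this one connection, then exit without returning.
        try:
            handle_request(conn)
        finally:
            conn.close()
            os._exit(0)
    # Parent: drop its copy of the connection and go on being the parent.
    conn.close()
    return pid


def serve_forever(listener):
    """Accept connections on a listening socket, forking once per client."""
    while True:
        conn, _addr = listener.accept()
        fork_handler(conn)
        # Reap any children that have already finished, without blocking.
        try:
            while os.waitpid(-1, os.WNOHANG)[0] > 0:
                pass
        except ChildProcessError:
            pass
```

A caller would create a listening socket with `socket.socket()`, `bind()`, and `listen()` and pass it to `serve_forever`. Every accepted connection costs one fork, which is why the speed of fork matters so much for this style of server.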

Answer by mweerden

The key to this matter is the historical usage of both systems, I think. Windows (and DOS before that) were originally single-user systems for personal computers. As such, these systems typically don't have to create a lot of processes all the time; (very) simply put, a process is only created when this one lonely user requests it (and we humans don't operate very fast, relatively speaking).

Unix-based systems were originally multi-user systems and servers. Especially for the latter, it is not uncommon to have processes (e.g. mail or http daemons) that split off processes to handle specific jobs (e.g. taking care of one incoming connection). An important factor in doing this is the cheap fork method (which, as mentioned by Rob Walker, initially uses the same memory for the newly created process), which is very useful as the new process immediately has all the information it needs.

It is clear that at least historically the need for Unix-based systems to have fast process creation is far greater than for Windows systems. I think this is still the case because Unix-based systems are still very process oriented, while Windows, due to its history, has probably been more thread oriented (threads being useful to make responsive applications).


Disclaimer: I'm by no means an expert on this matter, so forgive me if I got it wrong.


Answer by DigitalRoss

The short answer is "software layers and components".


The Windows software architecture has a couple of additional layers and components that either don't exist on Unix or are simplified and handled inside the kernel on Unix.

On Unix, fork and exec are direct calls to the kernel.


On Windows, the kernel API is not used directly: Win32 and certain other components sit on top of it, so process creation must go through extra layers, and then the new process must start up or connect to those layers and components.

For quite some time researchers and corporations have attempted to break up Unix in a vaguely similar way, usually basing their experiments on the Mach kernel; a well-known example is OS X. Every time they try, though, it gets so slow that they end up at least partially merging the pieces back into the kernel, either permanently or for production shipments.

Answer by Tim Williscroft

Uh, there seems to be a lot of "it's better this way" sort of justification going on.


I think people could benefit from reading "Showstopper", the book about the development of Windows NT.

The whole reason the services run as DLLs in one process on Windows NT was that they were too slow as separate processes.

If you got down and dirty you'd find that the library loading strategy is the problem.


On Unices (in general), the code segments of shared libraries (DLLs) are actually shared.

Windows NT loads a copy of the DLL per process, because it manipulates the library code segment (and executable code segment) after loading. (Telling it where your data is?)

This results in code segments in libraries that are not reusable.


So, NT process creation is actually pretty expensive. And on the downside, it means DLLs give no appreciable saving in memory, only a chance for inter-app dependency problems.

Sometimes it pays in engineering to step back and say, "now, if we were going to design this to really suck, what would it look like?"


I worked with an embedded system that was quite temperamental once upon a time, and one day looked at it and realized it was a cavity magnetron, with the electronics in the microwave cavity. We made it much more stable (and less like a microwave) after that.


Answer by ctrl-alt-delor

As there seems to be some justification of MS-Windows in some of the answers, e.g.:

  • “NT kernel and Win32 are not the same thing. If you program to the NT kernel then it is not so bad.” True, but unless you are writing a Posix subsystem, who cares? You will be writing to Win32.
  • “It is not fair to compare fork with CreateProcess, as they do different things, and Windows does not have fork.” True, so I will compare like with like. However, I will also compare fork, because it has many, many use cases, such as process isolation (e.g. each tab of a web browser runs in a different process).

Now let us look at the facts: what is the difference in performance?

Data summarised from http://www.bitsnbites.eu/benchmarking-os-primitives/.
Because bias is inevitable when summarising, I did it in favour of MS-Windows.
Hardware for most tests: i7, 8 core, 3.2GHz, except for the Raspberry-Pi running Gnu/Linux.

A comparison of various basic operations, on Gnu/Linux, Apple-Mac, and Microsoft's Windows (smaller is better)


A comparison of MS-Windows process create vs Linux


Notes: On Linux, fork is faster than MS-Windows' preferred method, CreateThread.
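For readers who want to get a feel for this comparison on their own Unix-like machine, here is a rough timing sketch. It is not the benchmark used for the numbers in this answer: the function names are hypothetical, and the Python interpreter adds overhead of its own, so only the relative order of the two results is of any interest.

```python
import os
import threading
import time


def time_fork(n=200):
    """Average microseconds to fork a child that exits at once, and reap it."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child does nothing and exits immediately
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / n * 1e6


def time_thread(n=200):
    """Average microseconds to start and join a thread that does nothing."""
    start = time.perf_counter()
    for _ in range(n):
        worker = threading.Thread(target=lambda: None)
        worker.start()
        worker.join()
    return (time.perf_counter() - start) / n * 1e6


if __name__ == "__main__":
    print(f"fork:   {time_fork():.1f} us per operation")
    print(f"thread: {time_thread():.1f} us per operation")
```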

Numbers for process creation type operations (because it is hard to see the value for Linux in the chart).


In order of speed, fastest to slowest (numbers are time, small is better).


  • Linux CreateThread 12
  • Mac CreateThread 15
  • Linux Fork 19
  • Windows CreateThread 25
  • Linux CreateProcess (fork+exec) 45
  • Mac Fork 105
  • Mac CreateProcess (fork+exec) 453
  • Raspberry-Pi CreateProcess (fork+exec) 501
  • Windows CreateProcess 787
  • Windows CreateProcess With virus scanner 2850
  • Windows Fork (simulate with CreateProcess + fixup) greater than 2850
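To put the list above in relative terms, the ratios can be worked out directly from the listed numbers. The sketch below copies the values as given (the list does not state units) and uses a hypothetical helper, `slowdown`, with Linux fork+exec as the default baseline. On these figures Windows CreateProcess is roughly 17.5 times slower than Linux fork+exec, and roughly 63 times slower with a virus scanner active.

```python
# Timings copied from the list above; smaller is better, units as reported.
timings = {
    "Linux CreateThread": 12,
    "Mac CreateThread": 15,
    "Linux Fork": 19,
    "Windows CreateThread": 25,
    "Linux CreateProcess (fork+exec)": 45,
    "Mac Fork": 105,
    "Mac CreateProcess (fork+exec)": 453,
    "Raspberry-Pi CreateProcess (fork+exec)": 501,
    "Windows CreateProcess": 787,
    "Windows CreateProcess with virus scanner": 2850,
}


def slowdown(name, baseline="Linux CreateProcess (fork+exec)"):
    """Return how many times slower `name` is than `baseline`."""
    return timings[name] / timings[baseline]


if __name__ == "__main__":
    for name in timings:
        print(f"{name}: {slowdown(name):.1f}x")
```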

Numbers for other measurements


  • Creating a file.
    • Linux 13
    • Mac 113
    • Windows 225
    • Raspberry-Pi (with slow SD card) 241
    • Windows with defender and virus scanner etc 12950
  • Allocating memory
    • Linux 79
    • Windows 93
    • Mac 152
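The file-creation measurement can be approximated in a portable way. This is a rough sketch only, not the benchmark behind the numbers above: the function name `time_file_create` is hypothetical, and the result depends heavily on the file system, the storage device, and any scanner hooked into file creation.

```python
import os
import tempfile
import time


def time_file_create(n=200):
    """Average microseconds to create and close an empty file."""
    with tempfile.TemporaryDirectory() as directory:
        start = time.perf_counter()
        for i in range(n):
            # Creating and immediately closing the file is the whole operation.
            with open(os.path.join(directory, f"file_{i}"), "w"):
                pass
        elapsed = time.perf_counter() - start
    return elapsed / n * 1e6


if __name__ == "__main__":
    print(f"file create: {time_file_create():.1f} us per file")
```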

Answer by hacksoncode

It's also worth noting that the security model in Windows is vastly more complicated than in unix-based OSs, which adds a lot of overhead during process creation. Yet another reason why multithreading is preferred to multiprocessing in Windows.


Answer by gabr

All that, plus the fact that on a Windows machine antivirus software will most probably kick in during CreateProcess... That's usually the biggest slowdown.