Linux: Why do I have to `wait()` for child processes?

Question

Asked by Robby75

Even though the Linux man page for `wait` explains very well that you need to `wait()` for child processes so that they don't turn into zombies, it does not say why at all.

I planned my program (which is my first multithreaded one, so excuse my naivety) around a `for(;;)` loop that runs forever, starting child processes that get `exec()`ed away and are sure to terminate on their own.

I cannot use `wait(NULL)` because that makes parallel computation impossible, so I'll probably have to add a process table that stores the child pids and use `waitpid`, not immediately but after some time has passed. That is a problem, because the running time of the children varies from a few microseconds to several minutes. If I use `waitpid` too early, my parent process gets blocked; if I use it too late, I get overwhelmed by zombies and cannot `fork()` anymore, which is not only bad for my process but can cause unexpected problems for the whole system.

I'll probably have to program some logic that allows a maximum number of children and blocks the parent when that number is reached, but that should not be necessary because most of the children terminate quickly. The other solution I can think of (creating a two-tiered parent process that spawns concurrent children, which in turn concurrently spawn and `wait` for grandchildren) is too complicated for me right now. Possibly I could also find a non-blocking function to check on the children and use `waitpid` only once they have terminated.
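The bounded-children idea from the question can be sketched in C roughly as follows. This is a minimal sketch assuming POSIX `fork()`/`waitpid()`; `run_bounded`, `tally` and `example_job` are hypothetical names, and `example_job` only stands in for real work (a real child would `exec()` a program). Finished children are collected with `WNOHANG` first, and the parent blocks only when the limit is actually reached.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Counts a reaped child as a success if it exited normally with status 0. */
static void tally(int status, int *succeeded)
{
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        (*succeeded)++;
}

/* Hypothetical placeholder job: a real child would exec() a program.
   Even-numbered jobs "succeed" (exit 0), odd ones exit with 1. */
int example_job(int i)
{
    return i % 2;
}

/* Starts njobs children, never more than max_children at once.
   Returns how many children exited with status 0, or -1 if fork() fails
   (children already started are then left for the caller to reap). */
int run_bounded(int njobs, int max_children, int (*job)(int))
{
    int running = 0, succeeded = 0, status;
    pid_t pid;

    if (max_children < 1)
        max_children = 1;

    for (int i = 0; i < njobs; i++) {
        /* Collect children that have already finished, without blocking. */
        while (running > 0 && (pid = waitpid(-1, &status, WNOHANG)) > 0) {
            running--;
            tally(status, &succeeded);
        }
        /* At the limit: block until at least one child finishes. */
        while (running >= max_children) {
            pid = waitpid(-1, &status, 0);
            if (pid > 0) {
                running--;
                tally(status, &succeeded);
            } else if (errno != EINTR) {
                break;                  /* e.g. ECHILD: nothing left to wait for */
            }
        }
        pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(job(i));              /* child: run the job, exit with its result */
        running++;
    }

    /* No more jobs to start: reap the remaining children (blocking). */
    while (running > 0) {
        pid = waitpid(-1, &status, 0);
        if (pid > 0) {
            running--;
            tally(status, &succeeded);
        } else if (errno != EINTR) {
            break;
        }
    }
    return succeeded;
}
```

For example, `run_bounded(10, 3, example_job)` would start ten children, at most three at a time, and report how many exited with status 0. Since short-lived children are reaped on every iteration, the blocking wait only happens when all slots are taken by long-running ones.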

Nevertheless the question:

Why does Linux keep zombies at all? Why do I have to wait for my children? Is this to enforce discipline on parent processes? In decades of using Linux I have never got anything useful out of zombie processes, I don't quite get the usefulness of zombies as a "feature".

If the answer is that parent processes need a way to find out what happened to their children, then for god's sake there is no reason to count zombies as normal processes and forbid the creation of non-zombie processes just because there are too many zombies. On the system I'm currently developing for, I can only spawn 400 to 500 processes before everything grinds to a halt (it's a badly maintained CentOS system running on the cheapest VServer I could find, but still: 400 zombies are less than a few kB of information).

Answer 1

Accepted answer by Christopher Neylan

> I'll probably have to add a process table that stores the child pids and have to use waitpid - not immediately, but after some time has passed - which is a problem, because the running time of the children varies from few microseconds to several minutes. If I use waitpid too early, my parent process will get blocked

Check out the documentation for `waitpid`. You can tell `waitpid` NOT to block (i.e., to return immediately if there are no children to reap) using the `WNOHANG` option. Moreover, you don't need to give `waitpid` a PID. You can specify -1, and it will wait for any child. So calling `waitpid` as below fits both your no-blocking constraint and your no-saving-pids constraint:

```c
waitpid( -1, &status, WNOHANG );
```
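As a sketch of how that call might sit in the asker's `for(;;)` loop, assuming POSIX `waitpid()`, a helper (the name `reap_finished_children` is hypothetical) can drain every child that has already exited and return immediately otherwise:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>     /* fork(), used by the calling loop */

/* Reaps every child that has already terminated, without blocking.
   Returns the number of children reaped by this call. */
int reap_finished_children(void)
{
    int reaped = 0;
    int status;
    pid_t pid;

    /* waitpid() returns 0 at once if children exist but none has exited,
       and -1 (ECHILD) if there are no children at all; both end the loop. */
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        /* status could be inspected here with WIFEXITED/WEXITSTATUS. */
        reaped++;
    }
    return reaped;
}
```

Calling it once per loop iteration, for example right after each `fork()`, keeps zombies from piling up without ever blocking the parent.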

If you really don't want to properly handle process creation, then you can hand the reaping responsibility to `init` by forking twice, reaping the child, and giving the `exec` to the grandchild:

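```c
#include <errno.h>
#include <error.h>     // error() is a GNU extension
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

pid_t temp_pid, child_pid;
temp_pid = fork();
if( temp_pid == 0 ){
    // intermediate child
    child_pid = fork();
    if( child_pid == 0 ){
        // grandchild: exec() goes here; the next line only runs if exec() failed
        error( EXIT_FAILURE, errno, "failed to exec :(" );
    } else if( child_pid < 0 ){
        error( EXIT_FAILURE, errno, "failed to fork :(" );
    }
    exit( EXIT_SUCCESS );  // orphan the grandchild; init adopts and reaps it
} else if( temp_pid < 0 ){
    error( EXIT_FAILURE, errno, "failed to fork :(" );
} else {
    waitpid( temp_pid, NULL, 0 );  // reap the intermediate child (wait() takes a status pointer, not a pid)
}
```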

In the above code snippet, the child process forks its own child, immediately exits, and is then immediately reaped by the parent. The grandchild is orphaned, adopted by `init`, and will be reaped automatically.
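The same double-fork technique can be wrapped in a reusable function. This is a minimal sketch assuming POSIX `execv()`; the name `spawn_detached` is hypothetical. It uses `_exit()` in the children so stdio buffers inherited from the parent are not flushed twice, and it retries `waitpid()` if a signal interrupts it:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs path with argv as a grandchild that init (or the nearest child
   subreaper, if one is set) adopts and reaps, so the caller is never left
   with a zombie. Returns 0 if the intermediate child exited cleanly,
   -1 otherwise. A failed execv() in the grandchild is NOT reported here. */
int spawn_detached(const char *path, char *const argv[])
{
    pid_t temp_pid = fork();
    if (temp_pid < 0)
        return -1;

    if (temp_pid == 0) {
        /* Intermediate child: fork the real worker, then exit at once. */
        pid_t child_pid = fork();
        if (child_pid == 0) {
            execv(path, argv);
            _exit(127);                 /* only reached if execv() failed */
        }
        _exit(child_pid < 0 ? 1 : 0);  /* 1 signals that the inner fork failed */
    }

    /* Parent: reap the short-lived intermediate child right away. */
    int status;
    while (waitpid(temp_pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

One design trade-off: since the parent only ever sees the intermediate child, it cannot learn whether the grandchild's `exec()` succeeded or what exit code it eventually produced.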

> Why does Linux keep zombies at all? Why do I have to wait for my children? Is this to enforce discipline on parent processes? In decades of using Linux I have never got anything useful out of zombie processes, I don't quite get the usefulness of zombies as a "feature". If the answer is that parent processes need to have a way to find out what happened to their children, then for god's sake there is no reason to count zombies as normal processes and forbid the creation of non-zombie processes just because there are too many zombies.

How else do you propose one may efficiently retrieve the exit code of a process? The problem is that the mapping of PID <=> exit code (et al.) must be one-to-one. If the kernel released the PID of a process as soon as it exits, reaped or not, and a new process then inherits that same PID and exits, how would you handle storing two codes for one PID? How would an interested process retrieve the exit code of the first process? Don't assume that no one cares about exit codes simply because you don't. What you consider to be a nuisance/bug is widely considered useful and clean.
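The PID reservation described above can be observed directly. This is a sketch assuming POSIX `waitid()` with `WNOWAIT`, which waits for the exit without reaping; the function name `pid_held_until_reaped` is hypothetical. While the child is a zombie, its PID still exists (`kill(pid, 0)` succeeds) and its exit status is still available; only after `waitpid()` reaps it does the PID disappear:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the exited child kept both its PID and its exit status
   until it was reaped, and released the PID afterwards; 0 if not;
   -1 if a system call failed. */
int pid_held_until_reaped(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(42);                           /* child: exit with a recognisable code */

    /* Wait for the exit, but leave the child as a zombie (WNOWAIT). */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1)
        return -1;
    int status_kept = (info.si_code == CLD_EXITED && info.si_status == 42);
    int pid_reserved = (kill(pid, 0) == 0);  /* the zombie still owns the PID */

    /* Reap it: the status is handed over and the PID is released. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    int pid_released = (kill(pid, 0) == -1 && errno == ESRCH);

    return status_kept && pid_reserved && pid_released
        && WIFEXITED(status) && WEXITSTATUS(status) == 42;
}
```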

> On the system I'm currently developing for I can only spawn 400 to 500 processes before everything grinds to halt (it's a badly maintained CentOS system running on the cheapest VServer I could find - but still 400 zombies are less than a few kB of information)

Something about making a widely accepted kernel behavior a scapegoat for what are clearly frustrations over a badly-maintained/cheap system doesn't seem right.

Typically, the system-wide maximum number of processes is limited mainly by your memory, although per-user limits (`ulimit -u`) can be lower. You can see the system-wide limit with:

```sh
cat /proc/sys/kernel/threads-max
```
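The same check can be done from C, alongside two other limits that commonly matter. This is a minimal sketch assuming Linux `/proc` paths; `read_proc_long` and `print_process_limits` are hypothetical helper names. `/proc/sys/kernel/pid_max` and the per-user `RLIMIT_NPROC` (what `ulimit -u` reports) are included because either can be hit before memory runs out:

```c
#include <stdio.h>
#include <sys/resource.h>

/* Reads one integer from a file such as /proc/sys/kernel/threads-max.
   Returns -1 if the file cannot be opened or parsed. */
long read_proc_long(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long value;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Prints the system-wide limits and the per-user process limit. */
void print_process_limits(void)
{
    printf("threads-max:  %ld\n", read_proc_long("/proc/sys/kernel/threads-max"));
    printf("pid_max:      %ld\n", read_proc_long("/proc/sys/kernel/pid_max"));

    struct rlimit rl;
    if (getrlimit(RLIMIT_NPROC, &rl) == 0) {
        if (rl.rlim_cur == RLIM_INFINITY)
            printf("RLIMIT_NPROC: unlimited\n");
        else
            printf("RLIMIT_NPROC: %llu\n", (unsigned long long)rl.rlim_cur);
    }
}
```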

Answer 2

Answer by Greg Hewgill

When a program exits, it returns a return code to the kernel. A zombie process is simply a place to hold the return code until the parent can obtain it. The `wait()` call lets the kernel know that the return code for that pid is no longer needed, and the zombie is removed.
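To illustrate the return code that the zombie holds, here is a minimal sketch (the name `collect_exit_code` is hypothetical) in which the parent collects a child's status and decodes it with the standard `WIFEXITED`/`WEXITSTATUS`/`WIFSIGNALED`/`WTERMSIG` macros. Returning 128 plus the signal number for a killed child is an assumed convention borrowed from shells, not something `waitpid()` itself does:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with `code`, then collects that status the way
   a parent normally would. Until waitpid() returns, the exited child is a
   zombie whose only job is to hold this status.
   Returns the exit status (0..255), 128 + signal number if the child was
   killed by a signal, or -1 on error. */
int collect_exit_code(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                     /* child: only the low 8 bits survive */

    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);   /* assumed shell-style convention */
    return -1;
}
```

Because only the low 8 bits of the value passed to `_exit()` are kept, `collect_exit_code(261)` would yield 5.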

Answer 3

Answer by Ben Jackson

Your reasoning is backwards: the kernel keeps zombies because they store the state that you can retrieve with `wait()` and related system calls.

The proper way to handle asynchronous child termination is to have a `SIGCHLD` handler that calls `wait()` to clean up the child processes.
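A sketch of such a handler, assuming POSIX `sigaction()`; the names `on_sigchld`, `install_sigchld_reaper` and `reaped_children` are hypothetical. Because several child exits can be merged into a single pending `SIGCHLD`, the handler loops over `waitpid(-1, ..., WNOHANG)` instead of calling `wait()` once, and it preserves `errno` for the code it interrupted:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Number of children reaped so far (only the handler writes it). */
static volatile sig_atomic_t reaped_children = 0;

/* SIGCHLD handler: reap every child that has exited, without blocking.
   errno is saved and restored because waitpid() may overwrite it while
   the main program is in the middle of checking its own errno. */
static void on_sigchld(int signo)
{
    (void)signo;
    int saved_errno = errno;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_children++;
    errno = saved_errno;
}

/* Installs the handler. SA_RESTART restarts most interrupted system calls;
   SA_NOCLDSTOP skips notifications for stopped (not exited) children.
   Returns 0 on success, -1 on failure. */
int install_sigchld_reaper(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

One caveat of this design: the handler reaps every child, so code elsewhere that calls `waitpid()` on a specific pid may find it already gone (`ECHILD`).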

Answer 4

Answer by Vlad

To provide you with the "exit code" of a process, the system must preserve a "process database" entry for you. Such an entry, holding little more than the exit code, is called a "zombie". You can periodically query your terminated children for their exit codes (for example with `waitpid()` and `WNOHANG`), which frees this memory. The same is true for Windows and other operating systems; Linux isn't special here. You don't need to block waiting for a process; just ask for its exit code after it has finished.

Answer 5

Answer by osexp2003

Although keeping a dead pid in the process table is basically for providing its exit code to its parent later, I have to complain that there are some bad design decisions there (though they are already history and cannot be changed).

1. Cannot declare in advance `i_don_care_status_of( pid )`

On Windows, we have `CloseHandle( processHandle )` to achieve this effect:

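```c
STARTUPINFO si = { sizeof( si ) }; PROCESS_INFORMATION pi;
CreateProcess( ....., &si, &pi );  // returns BOOL; the handles come back in pi
CloseHandle( pi.hThread );
CloseHandle( pi.hProcess );        // drop our interest in this process's exit status
```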

To work around this, there are some imperfect methods (from Wikipedia):

On modern UNIX-like systems (that comply with SUSv3 specification in this respect), the following special case applies: if the parent explicitly ignores SIGCHLD by setting its handler to SIG_IGN (rather than simply ignoring the signal by default) or has the SA_NOCLDWAIT flag set, all child exit status information will be discarded and no zombie processes will be left.[1]
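The `SA_NOCLDWAIT` / `SIG_IGN` behaviour quoted above can be sketched as follows, assuming POSIX `sigaction()`; `discard_child_statuses` and `child_left_no_zombie` are hypothetical names. The second function shows the consequence: there is no zombie to collect, so `waitpid()` waits for the child to disappear and then fails with `ECHILD` instead of returning an exit status:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Tells the kernel to discard child exit statuses so no zombies are left.
   SIG_IGN alone has this effect; SA_NOCLDWAIT is set as well.
   Returns 0 on success, -1 (with errno set by sigaction) on failure. */
int discard_child_statuses(void)
{
    struct sigaction sa;
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDWAIT;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Forks a child that exits immediately, then tries to wait for it.
   Returns 1 if waitpid() failed with ECHILD (no zombie was kept),
   0 otherwise, -1 if fork() failed. */
int child_left_no_zombie(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);
    errno = 0;
    return waitpid(pid, NULL, 0) == -1 && errno == ECHILD;
}
```

Unlike `CloseHandle` on Windows, this applies to all children at once: the parent gives up the exit status of every child, not just of one.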

2. No reference-counter based handling of pid.

With reference counting, when a process dies and nothing references its pid any more, the kernel could remove it immediately.

3. Cannot get the exit code of an unrelated pid

Only the parent can get the exit code of a pid, which is ridiculous. There is no reliable way to wait for an unrelated pid.

(Using NETLINK + PROC_CONNECTOR, you can listen for the exit events of any pid asynchronously.)

On Windows, this can be done with `WaitForSingleObject`:

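```c
HANDLE aProcessHandle = OpenProcess( SYNCHRONIZE | PROCESS_QUERY_LIMITED_INFORMATION, FALSE, pid );
WaitForSingleObject( aProcessHandle, INFINITE );
DWORD exitCode; GetExitCodeProcess( aProcessHandle, &exitCode );
CloseHandle( aProcessHandle );
```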

These shortcomings clearly exist, but the Unix/Linux design is very simple, so we have to bear with it.
