bash: Why do processes spawned by cron end up defunct?

Question

Asked by John Zwinck

I have some processes showing up as <defunct> in top (and ps). I've boiled things down from the real scripts and programs.

In my crontab:

* * * * * /tmp/launcher.sh /tmp/tester.sh

The contents of launcher.sh (which is of course marked executable):

#!/bin/bash
# the real script does a little argument processing here
"$@"

The contents of tester.sh (which is of course marked executable):

#!/bin/bash
sleep 27 & # the real script launches a compiled C program in the background

ps shows the following:

user       24257 24256  0 18:32 ?        00:00:00 [launcher.sh] <defunct>
user       24259     1  0 18:32 ?        00:00:00 sleep 27

Note that tester.sh does not appear: it has exited after launching the background job.

Why does launcher.sh stick around, marked <defunct>? It only seems to do this when launched by cron, not when I run it myself.

Additional note: launcher.sh is a common script in the system this runs on, and it is not easily modified. The other things (the crontab, tester.sh, even the program that I run instead of sleep) can be modified much more easily.

Answer 1

Accepted answer by DigitalRoss

Because they haven't been the subject of a wait(2) system call.

Since someone may wait for these processes in the future, the kernel can't completely get rid of them. If it did, it would be unable to service the wait system call, because it would no longer have the exit status or any evidence that the process had existed.

When you start one from the shell, your shell is trapping SIGCHLD and doing various wait operations anyway, so nothing stays defunct for long.

But cron isn't in a wait state; it is sleeping, so the defunct child may stick around for a while until cron wakes up.

Update: Responding to comment... Hmm. I did manage to duplicate the issue:

 PPID   PID  PGID  SESS COMMAND
    1  3562  3562  3562 cron
 3562  1629  3562  3562  \_ cron
 1629  1636  1636  1636      \_ sh <defunct>
    1  1639  1636  1636 sleep

So, what happened was, I think:

1. cron forks and the cron child starts the shell
2. shell (1636) starts sid and pgid 1636 and starts sleep
3. shell exits, SIGCHLD sent to cron 3562
4. signal is ignored or mishandled
5. shell turns zombie

Note that sleep is reparented to init, so when the sleep exits init will get the signal and clean up. I'm still trying to figure out when the zombie gets reaped. Probably, with no active children, cron 1629 figures out it can exit; at that point the zombie will be reparented to init and get reaped. So now we wonder about the missing SIGCHLD that cron should have processed.
- It isn't necessarily vixie cron's fault. libdaemon installs a SIGCHLD handler during daemon_fork(), and this could interfere with signal delivery on a quick exit by intermediate 1629.
  Now, I don't even know if vixie cron on my Ubuntu system is even built with libdaemon, but at least I have a new theory. :-)

Answer 2

Answered by hp4

In my opinion it's caused by the CROND process (spawned by crond for every task) waiting for input on its stdin, which is piped to the stdout/stderr of the command in the crontab. This is done so that cron can send the resulting output to the user by mail.

So CROND waits for EOF until the user command and all its spawned child processes have closed the pipe. Once that has happened, CROND continues to its wait call and the defunct user command disappears.

So I think you have to explicitly disconnect every spawned subprocess in your script from the pipe (e.g. by redirecting its output to a file or to /dev/null).

So the following line should work in crontab:

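# cron runs commands with /bin/sh by default, which may not understand bash's &> shorthand, so the portable redirection is used
* * * * * ( /tmp/launcher.sh /tmp/tester.sh >/dev/null 2>&1 & )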
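
The EOF behaviour described above can be seen without cron at all. The following is a small sketch, with `cat` standing in for the CROND reader; the function names are invented for the illustration and the 2-second sleep is an arbitrary stand-in for the background program.

```shell
# Run "$@" with its stdout connected to a pipe, wait until the reader
# (cat) sees EOF, and print how many whole seconds that took.
seconds_until_eof() {
    start=$(date +%s)
    "$@" | cat >/dev/null
    end=$(date +%s)
    echo $((end - start))
}

# The background child inherits the pipe as its stdout, so the reader
# cannot see EOF until the child exits.
child_keeps_pipe() { sleep 2 & }

# The background child has stdout and stderr pointed away from the pipe,
# so the reader sees EOF as soon as the launching shell exits.
child_detached() { sleep 2 >/dev/null 2>&1 & }
```

`seconds_until_eof child_keeps_pipe` should take about as long as the background sleep, while `seconds_until_eof child_detached` should return almost immediately, which is the effect the redirection in the crontab line is after.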

Answer 3

Answered by bstpierre

I suspect that cron is waiting for all subprocesses in the session to terminate. See wait(2) with respect to negative pid arguments. You can see the SESS with:

ps faxo stat,euid,ruid,tty,tpgid,sess,pgrp,ppid,pid,pcpu,comm

Here's what I see (edited):

STAT  EUID  RUID TT       TPGID  SESS  PGRP  PPID   PID %CPU COMMAND
Ss       0     0 ?           -1  3197  3197     1  3197  0.0 cron
S        0     0 ?           -1  3197  3197  3197 18825  0.0  \_ cron
Zs    1000  1000 ?           -1 18832 18832 18825 18832  0.0      \_ sh <defunct>
S     1000  1000 ?           -1 18832 18832     1 18836  0.0 sleep

Notice that the sh and the sleep are in the same SESS.

Use the command setsid(1). Here's tester.sh:

#!/bin/bash
setsid sleep 27 # the real script launches a compiled C program in the background

Notice you don't need &; setsid puts it in the background.
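
To check which session a process ended up in without reading a full ps listing, something like the following sketch can help. `session_of` is a hypothetical helper, and it assumes the Linux /proc layout in which the fields after the "(comm)" entry of /proc/PID/stat are state, ppid, pgrp and session.

```shell
# Print the session ID of process $1.
# Assumption: Linux /proc/PID/stat, where the fields following the
# "(comm)" entry are: state ppid pgrp session ...
session_of() {
    stat=$(cat "/proc/$1/stat") || return 1
    # Drop everything up to and including the closing ") " of the comm field.
    set -- ${stat##*) }
    echo "$4"
}
```

For example, after `setsid sleep 27 &` in a script, `session_of "$!"` should print the PID of that sleep (it leads its own session), while `session_of "$$"` still prints the script's original session. The explicit & is used in this example because, as far as I can tell from setsid(1), setsid only forks when it is itself a process group leader.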

Answer 4

Answered by Teddy

I'd recommend that you solve the problem by simply not having two separate processes. Have launcher.sh do this on its last line:

exec "$@"

This will eliminate the superfluous process.
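
A quick way to see the effect of exec is to print the PID before and after handing over to the next command. This is only a sketch: the two function names are made up, and the inner `sh -c` stands in for whatever launcher.sh would run.

```shell
# With exec, the second command replaces the first shell, so both lines
# of output show the same PID.
pids_with_exec() {
    sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}

# Without exec, the second command runs as a child with its own PID.
# The trailing ":" keeps shells from quietly exec-ing the last command.
pids_without_exec() {
    sh -c 'echo "$$"; sh -c "echo \$\$"; :'
}
```

`pids_with_exec` prints the same number twice, while `pids_without_exec` prints two different numbers; the extra PID in the second case is the intermediate process that would otherwise linger as launcher.sh.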

Answer 5

Answered by Datageek

I found this question while I was looking for a solution to a similar issue. Unfortunately, the answers to this question didn't solve my problem.

Killing a defunct process directly is not an option; you need to find and kill its parent process. I ended up getting rid of the defunct processes in the following way:

ps -ef | grep '<defunct>' | grep -v grep | awk '{print "kill -9 ", $3}' | sh

In "grep '<defunct>'" you can narrow down the search to a specific defunct process you are after.
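
Before piping kill commands straight into sh, it may be worth looking at which parents would be hit. Here is a small sketch that only lists them; `defunct_parents` is a name invented for this example, and it assumes `ps -ef` style input where the third column is the PPID.

```shell
# Read `ps -ef` style output on stdin and print the unique parent PIDs
# (third column) of every line marked <defunct>.
defunct_parents() {
    awk '/<defunct>/ { print $3 }' | sort -u
}
```

Usage would be `ps -ef | defunct_parents`. For the ps output shown in the question it prints 24256, the cron child that has not yet waited on launcher.sh.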

Answer 6

Answered by user377713

I have run into this same problem many times, and finally I've got a solution. Just specify /bin/bash before the bash script, as shown below.

* * * * * /bin/bash /tmp/launcher.sh /tmp/tester.sh
