How do I write a bash script to restart a process if it dies?

Question

Asked by Tom

I have a Python script that checks a queue and performs an action on each item:

# checkqueue.py
while True:
  check_queue()
  do_something()

How do I write a bash script that checks whether it's running and, if not, starts it? Roughly the following pseudocode (or maybe it should do something like ps | grep?):

# keepalivescript.sh
if processidfile exists:
  if processid is running:
     exit, all ok

run checkqueue.py
write processid to processidfile

I'll call that from a crontab:

# crontab
*/5 * * * * /path/to/keepalivescript.sh

Answer 1

Answered by lhunath

Avoid PID-files, crons, or anything else that tries to evaluate processes that aren't their children.

There is a very good reason why in UNIX, you can ONLY wait on your children. Any method (ps parsing, pgrep, storing a PID, ...) that tries to work around that is flawed and has gaping holes in it. Just say no.

Instead, you need the process that monitors your process to be that process's parent. What does this mean? It means only the process that starts your process can reliably wait for it to end. In bash, this is absolutely trivial.

until myserver; do
    echo "Server 'myserver' crashed with exit code $?.  Respawning.." >&2
    sleep 1
done

The above piece of bash code runs myserver in an until loop. The first line starts myserver and waits for it to end. When it ends, until checks its exit status. If the exit status is 0, it ended gracefully (which means you asked it to shut down somehow, and it did so successfully). In that case we don't want to restart it (we just asked it to shut down!). If the exit status is not 0, until runs the loop body, which emits an error message on STDERR and restarts the loop (back to line 1) after 1 second.
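
As a variation on the loop above, the following sketch caps the number of restarts so that a server that can never start does not respawn forever. The function name respawn, its MAX argument, and the RESPAWN_DELAY variable are made up for illustration and are not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: respawn MAX CMD [ARGS...]
# Runs CMD as a child of this shell and reruns it until it exits with
# status 0, giving up after MAX failed attempts.
respawn() {
    local max=$1 attempt=0 status
    shift
    until "$@"; do
        status=$?    # exit status of the command that just failed
        attempt=$((attempt + 1))
        echo "'$1' exited with status $status (attempt $attempt of $max)." >&2
        if [ "$attempt" -ge "$max" ]; then
            return "$status"
        fi
        sleep "${RESPAWN_DELAY:-1}"    # assumed knob; defaults to 1 second
    done
    return 0
}
```

For example, respawn 5 myserver would run myserver at most five times and return its last exit status if it never exits cleanly.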

Why do we wait a second? Because if something's wrong with the startup sequence of myserver and it crashes immediately, you'll have a very intensive loop of constant restarting and crashing on your hands. The sleep 1 takes away the strain from that.
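
If a constant one-second pause is still too aggressive, the delay could grow after each consecutive crash. The sketch below is an assumption layered on top of the answer, not something it prescribes; next_delay and the 60-second cap are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: next_delay CURRENT MAX
# Prints double the current delay in seconds, capped at MAX.
next_delay() {
    local doubled=$(( $1 * 2 ))
    if [ "$doubled" -gt "$2" ]; then
        echo "$2"
    else
        echo "$doubled"
    fi
}

# Sketch of use inside the respawn loop (myserver as in the answer):
#   delay=1
#   until myserver; do
#       sleep "$delay"
#       delay=$(next_delay "$delay" 60)
#   done
```

The trade-off is that a server which crashes repeatedly stays down longer between attempts, which may or may not be acceptable for a given service.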

Now all you need to do is start this bash script (asynchronously, probably), and it will monitor myserver and restart it as necessary. If you want to start the monitor on boot (making the server "survive" reboots), you can schedule it in your user's cron(1) with an @reboot rule. Open your cron rules with crontab:

crontab -e

Then add a rule to start your monitor script:

@reboot /usr/local/bin/myservermonitor

Alternatively, look at inittab(5) and /etc/inittab. You can add a line in there to have myserver start at a certain init level and be respawned automatically.

Edit.

Let me add some information on why not to use PID files. While they are very popular, they are also very flawed, and there's no reason why you wouldn't just do it the correct way.

Consider this:

1. PID recycling (killing the wrong process):
   - /etc/init.d/foo start: start foo, write foo's PID to /var/run/foo.pid
   - A while later: foo dies somehow.
   - A while later: any random process that starts (call it bar) takes a random PID; imagine it taking foo's old PID.
   - You notice foo's gone: /etc/init.d/foo restart reads /var/run/foo.pid, checks to see if it's still alive, finds bar, thinks it's foo, kills it, starts a new foo.
2. PID files go stale. You need over-complicated (or should I say, non-trivial) logic to check whether the PID file is stale, and any such logic is again vulnerable to point 1.
3. What if you don't even have write access or are in a read-only environment?
4. It's pointless overcomplication; see how simple my example above is. No need to complicate that, at all.
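
The recycling problem in the first point can be seen with the kind of liveness check that init scripts often use. In this sketch the function name pid_alive and the temporary PID file are illustrative assumptions; kill -0 only reports that some process with that PID exists and can be signalled, not which program it is.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if any process with the given PID exists
# and we are allowed to signal it. It says nothing about what that process is.
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# Simulate a stale PID file whose number now belongs to another process:
# here we simply record this shell's own PID in place of foo's.
pidfile=$(mktemp)
echo "$$" >"$pidfile"

if pid_alive "$(cat "$pidfile")"; then
    echo "PID $(cat "$pidfile") is alive, but it is this shell, not foo" >&2
fi
rm -f "$pidfile"
```

A restart script that trusted this check would conclude foo is still running, or worse, send a signal to the unrelated process.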

See also: Are PID-files still flawed when doing it 'right'?

By the way, even worse than PID files is parsing ps! Don't ever do this.

1. ps is very unportable. While you find it on almost every UNIX system, its arguments vary greatly if you want non-standard output. And standard output is ONLY for human consumption, not for scripted parsing!
2. Parsing ps leads to a LOT of false positives. Take the ps aux | grep PID example, and now imagine someone starting a process with a number somewhere as an argument that happens to be the same as the PID you started your daemon with! Imagine two people starting an X session and you grepping for X to kill yours. It's just all kinds of bad.
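
The false-positive problem can be reproduced without touching a real process table. The listing below is invented sample data, not real ps output, and the function names are hypothetical; it only shows how a PID can also appear as a command argument.

```shell
#!/usr/bin/env bash
# Invented listing standing in for ps output (illustration only).
sample_ps() {
    cat <<'EOF'
USER  PID  COMMAND
tom   1234 python checkqueue.py
ann   5678 tail -n 1234 /var/log/syslog
EOF
}

# Naive check: count lines that contain the PID anywhere.
naive_matches() {
    sample_ps | grep -c -- "$1"
}

# Stricter check: count lines whose PID column equals the PID exactly.
exact_matches() {
    sample_ps | awk -v pid="$1" '$2 == pid { n++ } END { print n + 0 }'
}

echo "naive: $(naive_matches 1234), exact: $(exact_matches 1234)"
```

The naive count includes the unrelated tail command. Matching the column exactly avoids that particular mistake, but it still depends on the column layout and does nothing about PID reuse, so it does not make ps parsing reliable.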

If you don't want to manage the process yourself, there are some perfectly good systems out there that will act as a monitor for your processes. Look into runit, for example.

Answer 2

Answered by Bernd

Have a look at monit (http://mmonit.com/monit/). It handles start, stop and restart of your script and can do health checks plus restarts if necessary.

Or do a simple script:

while true
do
    /your/script
    sleep 1
done

Answer 3

Answered by vartec

The easiest way to do it is to use flock on a file. In the Python script you'd do:

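import fcntl, os, sys

lf = open('/tmp/script.lock', 'a+')  # 'a+' avoids truncating a running instance's PID
try:
    fcntl.flock(lf, fcntl.LOCK_EX | fcntl.LOCK_NB)  # raises OSError if already locked
except OSError:
    sys.exit('other instance already running')
lf.truncate(0)  # we hold the lock now, so replace any old PID
lf.write('%d\n' % os.getpid())
lf.flush()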

In shell you can actually test if it's running:

# flock -n fails immediately if another process already holds the lock
if flock -xn /tmp/script.lock -c true; then
   echo "it's not running"
   # restart the script here
else
   echo -n "it's already running with PID "
   cat /tmp/script.lock
fi

But of course you don't have to test, because if it's already running and you start it again, the new instance will exit with 'other instance already running'.

When a process dies, all its file descriptors are closed and all its locks are automatically released.

Answer 4

Answered by clofresh

You should use monit, a standard unix tool that can monitor different things on the system and react accordingly.

From the docs: http://mmonit.com/monit/documentation/monit.html#pid_testing

check process checkqueue.py with pidfile /var/run/checkqueue.pid
       if changed pid then exec "checkqueue_restart.sh"

You can also configure monit to email you when it does do a restart.

Answer 5

Answered by soulmerge

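# Restart if the PID file is missing or the recorded PID is no longer running.
# kill -0 replaces the original's psgrep, which is not a standard command.
if ! test -f "$PIDFILE" || ! kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    # restart_process is a placeholder; it must start the process in the
    # background so that $! holds its PID
    restart_process
    # Write PIDFILE
    echo $! >"$PIDFILE"
fi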

Answer 6

Answered by Daniel Bradley

I'm not sure how portable it is across operating systems, but you might check if your system contains the 'run-one' command, i.e. "man run-one". Specifically, this set of commands includes 'run-one-constantly', which seems to be exactly what is needed.

From the man page:

run-one-constantly COMMAND [ARGS]

Note: obviously this could be called from within your script, but also it removes the need for having a script at all.

Answer 7

Answered by Kevin Wright

I've used the following script with great success on numerous servers:

# jps prints the PID in the first column
pid=`jps -v | grep "$INSTALLATION" | awk '{print $1}'`
echo "$INSTALLATION found at PID $pid"
# Poll until the process's /proc entry disappears
while [ -e "/proc/$pid" ]; do sleep 0.1; done

Notes:

- It's looking for a Java process, so I can use jps, which is much more consistent across distributions than ps.
- $INSTALLATION contains enough of the process path that it's totally unambiguous.
- Use sleep while waiting for the process to die, to avoid hogging resources :)

This script is actually used to shut down a running instance of Tomcat, which I want to shut down (and wait for) at the command line, so launching it as a child process simply isn't an option for me.
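
The polling loop above waits indefinitely. A bounded version might look like the sketch below, which swaps the Linux-specific /proc check for kill -0 and adds a timeout; the function name wait_for_exit and its arguments are hypothetical. Note that kill -0 also fails for processes you are not permitted to signal, and that the PID could in principle be reused while polling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait_for_exit PID TIMEOUT_SECONDS
# Polls until no process with PID can be signalled, or gives up after
# TIMEOUT_SECONDS. Returns 0 if the process went away, 1 on timeout.
wait_for_exit() {
    local pid=$1 deadline=$(( SECONDS + $2 ))
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$SECONDS" -ge "$deadline" ]; then
            return 1
        fi
        sleep 0.1    # fractional sleep works with GNU sleep; use 1 elsewhere
    done
    return 0
}
```

It could be used as wait_for_exit "$pid" 30 || echo "still running after 30 seconds" >&2.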

Answer 8

Answered by BitDEVil2K16

I use this for my npm process:

#!/bin/bash
for (( ; ; ))
do
date +"%T"
echo Start Process
cd /toFolder
sudo process
date +"%T"
echo Crash
sleep 1
done
