Linux automatically restarting application on crash - Daemons
Original source: http://stackoverflow.com/questions/7376537/
License: this page reproduces a popular StackOverflow question and its answers under CC BY-SA 4.0. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors on StackOverflow (not to the maintainer of this page).
Asked by user623879
I have a system running embedded Linux, and it is critical that it runs continuously. Basically, it is a process that communicates with sensors and relays their data to a database and a web client.
If a crash occurs, how do I restart the application automatically?
Also, there are several threads doing polling (e.g. sockets and UART communications). How do I ensure that none of the threads hang or exit unexpectedly? Is there an easy-to-use watchdog that is thread-friendly?
Accepted answer by Michael Aaron Safyan
The gist of it is:
1. You need to detect whether the program is still running and not hung.
2. You need to (re)start the program if it is not running or is hung.
There are a number of different ways to do #1, but two that come to mind are:
- Listening on a UNIX domain socket to handle status requests. An external application can then ask whether the application is still OK. If it gets no response within some timeout period, it can assume that the application being queried has deadlocked or is dead.
- Periodically touching a file with a preselected path. An external application can look at the timestamp of the file, and if it is stale, it can assume that the application is dead or deadlocked.
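The heartbeat-file option above can be sketched roughly as follows. This is only an illustration in Python, not the answerer's code: the path, the 30-second threshold, and the function names touch_heartbeat and is_stale are all assumptions made up for the example.

```python
import os
import time

HEARTBEAT_PATH = "/tmp/sensor_app.heartbeat"  # hypothetical path, for illustration only
STALE_AFTER_SECONDS = 30.0                    # hypothetical staleness threshold


def touch_heartbeat(path=HEARTBEAT_PATH):
    """Called periodically by the monitored application (e.g. once per loop)."""
    # Create the file if it does not exist yet, then set its mtime to now.
    with open(path, "a"):
        pass
    os.utime(path, None)


def is_stale(path=HEARTBEAT_PATH, max_age=STALE_AFTER_SECONDS, now=None):
    """Called by the external monitor; True means 'assume dead or deadlocked'."""
    now = time.time() if now is None else now
    try:
        age = now - os.stat(path).st_mtime
    except FileNotFoundError:
        return True  # the file was never touched: treat as not running
    return age > max_age
```

The application would call touch_heartbeat() from the loop whose progress matters, and the monitor would call is_stale() on a timer. The threshold should be comfortably longer than the longest legitimate gap between two touches.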
With respect to #2, killing the previous PID and using fork+exec to launch a new process is typical. You might also consider turning your application from one that runs "continuously" into one that runs once, and then using cron or some other application to rerun it repeatedly.
Unfortunately, watchdog timers and getting out of deadlock are non-trivial issues. I don't know of any generic way to do it, and the few that I've seen are pretty ugly and not 100% bug-free. However, tsan (ThreadSanitizer) can help detect potential deadlock scenarios and other threading issues by instrumenting the program at run time.
Answer by user771921
You can seamlessly restart your process when it dies by using fork and waitpid, as described in this answer. It does not cost any significant resources, since the OS will share the memory pages.
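A minimal sketch of the fork and waitpid approach, written in Python for brevity (os.fork, os.execvp and os.waitpid wrap the POSIX calls of the same names). The function names run_once and supervise, the one-second delay, and the max_restarts parameter are assumptions added for illustration; they are not taken from the linked answer.

```python
import os
import time


def run_once(argv):
    """Fork, exec argv in the child, and block in waitpid until it ends.

    Returns the exit code, or the negated signal number if a signal killed it.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the target program.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec itself failed
    # Parent: sleeps here without using CPU until the child terminates.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return -os.WTERMSIG(status)
    return os.WEXITSTATUS(status)


def supervise(argv, max_restarts=None, delay=1.0):
    """Rerun argv until it exits with status 0; returns the number of restarts."""
    restarts = 0
    while run_once(argv) != 0:
        if max_restarts is not None and restarts >= max_restarts:
            break
        restarts += 1
        time.sleep(delay)  # avoid a tight loop if the program dies immediately
    return restarts
```

For example, supervise(["/path/to/program"]) would keep relaunching the program after every crash or non-zero exit. Whether a clean exit should also trigger a restart depends on the application; this sketch stops on exit status 0.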
That leaves only the problem of detecting a hung process. You can use any of the solutions pointed out by Michael Aaron Safyan for this, but an even easier solution would be to call the alarm syscall repeatedly and let the resulting signal terminate the process (use sigaction accordingly). As long as you keep calling alarm (i.e. as long as your program is making progress), the process keeps running. Once you stop, the signal fires.
That way, no extra programs are needed, and only portable POSIX facilities are used.
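The alarm-based watchdog might look roughly like this. It is a sketch in Python, whose signal.alarm wraps the POSIX alarm call; the 5-second deadline and the names feed_watchdog and run are assumptions chosen for the example. It relies on the default SIGALRM disposition, which terminates the process, instead of installing a handler with sigaction.

```python
import signal
import time

WATCHDOG_SECONDS = 5  # hypothetical deadline between two check-ins


def feed_watchdog(seconds=WATCHDOG_SECONDS):
    """Restart the countdown; any previously scheduled alarm is replaced."""
    signal.alarm(seconds)


def run(do_work, iterations, seconds=WATCHDOG_SECONDS):
    """Call do_work repeatedly; the OS kills us if one call overruns."""
    # Make sure SIGALRM has its default action (terminate the process).
    signal.signal(signal.SIGALRM, signal.SIG_DFL)
    for _ in range(iterations):
        feed_watchdog(seconds)  # must be reached again before the deadline
        do_work()
    signal.alarm(0)  # cancel the pending alarm on a clean finish
```

Note that alarm is per process, not per thread. With several polling threads, one option (an assumption here, not something the answer spells out) is to have each thread record its progress and let the main thread call feed_watchdog only when every thread has checked in recently. A supervisor that restarts the process, such as a waitpid loop, is still needed to bring it back after the signal kills it.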
Answer by Mariz Melo
You could create a cron job that checks from time to time whether the process is running, using start-stop-daemon.
Answer by sayyed mohsen zahraee
Use this script to run your application:
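#!/bin/bash
# Rerun the program until it exits successfully (exit status 0).
while ! /path/to/program
do
    echo "restarting"   # non-zero exit status: start it again
    sleep 1             # short pause so an instantly failing program does not spin the CPU
done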
You can also put this script in /etc/init.d/ in order to start it as a daemon.