Linux: What killed my process and why?
Disclaimer: This page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, give the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/726690/
What killed my process and why?
Question by sbq
My application runs as a background process on Linux. It is currently started at the command line in a Terminal window.
Recently a user had been running the application for a while when it died mysteriously. The text:
Killed
appeared in the terminal. This happened twice. I asked whether someone at a different terminal had used the kill command to kill the process. No.
Under what conditions would Linux decide to kill my process? I believe the shell displayed "Killed" because the process died after receiving signal 9 (SIGKILL). If Linux sent the kill signal, shouldn't there be a message in a system log somewhere that explains why it was killed?
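When a process dies from a signal, the shell records its exit status as 128 plus the signal number, so "Killed" corresponds to signal 9 (SIGKILL) and exit status 137. The sketch below uses sleep as a stand-in for the application and relies on bash's kill -l, which accepts such an exit status and prints the signal name. Note that this only tells you which signal arrived, not who sent it.

```shell
#!/usr/bin/env bash
# sleep is only a stand-in for the real application.
sleep 100 &
pid=$!

# Send SIGKILL (signal 9), which a process can neither catch nor ignore.
kill -9 "$pid"

# For a job that died from a signal, wait returns 128 + signal number.
wait "$pid"
status=$?

echo "exit status: $status"              # 137 = 128 + 9
echo "signal name: $(kill -l "$status")" # bash maps 137 back to KILL
```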
Accepted answer by dwc
If the user or sysadmin did not kill the program, the kernel may have. The kernel only kills a process under exceptional circumstances, such as extreme resource starvation (think exhaustion of both RAM and swap).
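A quick way to see how close a machine is to that RAM-plus-swap exhaustion is to read /proc/meminfo (free -h shows the same figures at a glance). A minimal sketch, assuming a kernel new enough (3.14 or later) to report MemAvailable:

```shell
#!/usr/bin/env bash
# RAM and swap headroom from /proc/meminfo (the kernel reports values in kB).
mem_avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
swap_free_kb=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)

echo "MemAvailable: $(( ${mem_avail_kb:-0} / 1024 )) MiB"
echo "SwapFree:     $(( ${swap_free_kb:-0} / 1024 )) MiB"
# When both approach zero, the kernel may have to invoke the OOM killer.
```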
Answer by Tom Ritter
The user has the ability to kill his own programs, using kill or Control+C, but I get the impression that's not what happened, and that the user complained to you.
root has the ability to kill programs, of course, but if someone has root on your machine and is killing stuff, you have bigger problems.
If you are not the sysadmin, the sysadmin may have set up quotas on CPU, RAM, or disk usage and automatically kill processes that exceed them.
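To check whether a limit like this applies, you can inspect the limits of the running process in /proc/<pid>/limits, and which cgroup it belongs to in /proc/<pid>/cgroup (a cgroup memory limit, for example one set with systemd's MemoryMax=, makes the kernel OOM-kill inside that group even when the machine as a whole has free memory). The sketch uses the current shell ($$) as a placeholder PID; substitute your application's PID.

```shell
#!/usr/bin/env bash
# $$ (this shell) is a placeholder; use your application's PID instead.
pid=$$

# Per-process resource limits: CPU seconds, file size, address space, ...
limits=$(cat "/proc/$pid/limits")
echo "$limits" | grep -E '^(Limit|Max cpu time|Max file size|Max address space)'

# The cgroup the process runs in; memory limits on that cgroup can
# trigger OOM kills inside the group.
cat "/proc/$pid/cgroup"
```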
Beyond those guesses, I can't say more without more information about the program.
Answer by Lawrence Dol
We have had recurring problems under Linux at a customer site (Red Hat, I think), with the OOMKiller (out-of-memory killer) killing both our principal application (i.e. the reason the server exists) and its database processes.
In each case the OOMKiller simply decided that the processes were using too many resources... the machine wasn't even about to fail for lack of resources. Neither the application nor its database has problems with memory leaks (or any other resource leak).
I am not a Linux expert, but I gathered that its algorithm for deciding when to kill something and what to kill is complex. Also, I was told (I can't vouch for the accuracy of this) that the OOMKiller is baked into the kernel and you can't simply not run it.
Answer by Adam Jaskiewicz
This looks like a good article on the subject: Taming the OOM killer.
The gist is that Linux overcommits memory. When a process asks for more space, Linux will give it that space, even if it is claimed by another process, under the assumption that nobody actually uses all of the memory they ask for. The process gets exclusive use of the memory it has allocated when it actually uses it, not when it asks for it. This makes allocation quick, and might allow you to "cheat" and allocate more memory than you really have. However, once processes start using this memory, Linux might realize that it has been too generous in allocating memory it doesn't have, and will have to kill off a process to free some up. The process to be killed is chosen by a score that takes into account runtime (long-running processes are safer), memory usage (greedy processes are less safe), and a few other factors, including a value you can adjust to make a process less likely to be killed. It's all described in the article in a lot more detail.
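Both knobs mentioned here are visible under /proc: the overcommit policy and each process's score and adjustable bias. A minimal sketch, again using $$ as a placeholder PID. Note that the scoring has changed since older articles were written: as far as I know, since the rewrite in kernel 2.6.36 the score is based mainly on the process's memory footprint plus the adjustable oom_score_adj value, and runtime no longer counts.

```shell
#!/usr/bin/env bash
# Overcommit policy: 0 = heuristic (default), 1 = always overcommit,
# 2 = strict accounting (no overcommit beyond the commit limit).
overcommit=$(cat /proc/sys/vm/overcommit_memory)
echo "vm.overcommit_memory = $overcommit"

# The OOM killer's view of one process; $$ (this shell) is a placeholder PID.
pid=$$
score=$(cat "/proc/$pid/oom_score")    # higher means more likely to be killed
adj=$(cat "/proc/$pid/oom_score_adj")  # adjustable bias, from -1000 to 1000
echo "PID $pid: oom_score=$score oom_score_adj=$adj"
```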
Edit: And here is another article that explains pretty well how a process is chosen (annotated with some kernel code examples). The great thing about it is that it includes some commentary on the reasoning behind the various badness() rules.
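In current kernels the scoring function lives in mm/oom_kill.c as oom_badness(), and its result is exported per process as /proc/<pid>/oom_score. A rough sketch that lists the five processes the OOM killer would currently favour, highest score first (processes can exit during the scan, so read errors are skipped):

```shell
#!/usr/bin/env bash
# Collect "score<TAB>pid<TAB>name" for every process, then keep the top 5.
top_candidates=$(
  for dir in /proc/[0-9]*; do
    pid=${dir#/proc/}
    score=$(cat "$dir/oom_score" 2>/dev/null) || continue
    name=$(cat "$dir/comm" 2>/dev/null) || continue
    printf '%s\t%s\t%s\n' "$score" "$pid" "$name"
  done | sort -rn | head -n 5
)
printf 'SCORE\tPID\tCOMMAND\n%s\n' "$top_candidates"
```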
Answer by oldman
In an LSF environment (interactive or otherwise), if the application's memory utilization exceeds a preset threshold (set by the admins on the queue, or by the resource request made when submitting to the queue), the processes will be killed so that other users don't fall victim to a potential runaway. Depending on how it's set up, it doesn't always send an email when it does so.
One solution in this case is to find a queue with larger resources or define larger resource requirements in the submission.
You may also want to review man ulimit.
Although I don't remember ulimit resulting in Killed, it's been a while since I needed that.
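For what it's worth, ulimit can produce exactly this symptom. The sketch runs a CPU-bound loop in a subshell under a one-second CPU-time limit. Without -S or -H, ulimit -t sets both the soft and the hard limit, and on the Linux kernels I know of, reaching the hard limit makes the kernel send SIGKILL, so bash should report Killed (exit status 137). Hitting only a soft limit would instead deliver SIGXCPU ("CPU time limit exceeded").

```shell
#!/usr/bin/env bash
# Busy-loop in a subshell limited to 1 second of CPU time.
# ulimit -t without -S/-H sets both the soft and the hard RLIMIT_CPU.
( ulimit -t 1; while :; do :; done )
status=$?

# Exit status is 128 + signal number; 137 means SIGKILL (9).
echo "exit status: $status ($(kill -l "$status"))"
```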
Answer by Christian Ammer
The PAM module to limit resources caused exactly the results you described: my process died mysteriously with the text Killed in the console window. There was no log output, neither in syslog nor in kern.log. The top program helped me discover that my process got killed after exactly one minute of CPU usage.
Answer by poordeveloper
I encountered this problem recently. Eventually I found that my processes were killed right after openSUSE's zypper update was run automatically. Disabling the automatic zypper update solved my problem.
Answer by Ravindranath Akila
Try:
dmesg -T | grep -E -i -B100 'killed process'
where -B100 prints up to 100 lines of context before each kill message.
Omit -T on Mac OS.
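On systemd-based distributions the same kernel messages usually also end up in the journal, which (if persistent storage is enabled) survives reboots, unlike the dmesg ring buffer. A sketch of a hypothetical helper, find_oom_kills, that searches the journal when journalctl is available and falls back to dmesg otherwise:

```shell
#!/usr/bin/env bash
# find_oom_kills is a hypothetical helper name, not an existing tool.
# Extra arguments are passed to journalctl, e.g. "-b -1" for the previous boot.
find_oom_kills() {
  local pattern='out of memory|killed process'
  if command -v journalctl >/dev/null 2>&1; then
    # -k shows only kernel messages (current boot unless -b says otherwise).
    journalctl -k --no-pager "$@" | grep -i -E "$pattern"
  else
    # Fall back to the kernel ring buffer; may need root if dmesg is restricted.
    dmesg -T | grep -i -E "$pattern"
  fi
}
```

After sourcing the file, find_oom_kills searches the current boot and find_oom_kills -b -1 the previous one.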
Answer by Carl
As dwc and Adam Jaskiewicz have stated, the culprit is likely the OOM Killer. However, the next question that follows is: How do I prevent this?
There are several ways:
1. Give your system more RAM if you can (easy if it's a VM).
2. Make sure the OOM killer chooses a different process.
3. Disable the OOM killer.
4. Choose a Linux distro which ships with the OOM killer disabled.
I found (2) to be especially easy to implement, thanks to this article.
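Options (2) and (3) map to kernel knobs. A sketch, assuming bash on Linux: writing to /proc/<pid>/oom_score_adj biases the choice (-1000 exempts a process, 1000 makes it the first candidate). Any user may raise the value for their own processes, which is what the runnable part does for a subshell; lowering it, i.e. protecting a process, normally requires root, so those lines are left as comments with a hypothetical APP_PID variable. For option (3), vm.overcommit_memory=2 (strict accounting) makes allocations fail up front instead of relying on the OOM killer later; as far as I know it greatly reduces OOM kills but does not switch the killer off entirely.

```shell
#!/usr/bin/env bash
# Inside $( ), the redirection is performed by the subshell itself, so
# /proc/self refers to the subshell; cat then inherits the new value.
# 1000 is the maximum: this subshell becomes the OOM killer's first choice.
adj_after=$(echo 1000 > /proc/self/oom_score_adj; cat /proc/self/oom_score_adj)
echo "oom_score_adj inside the subshell: $adj_after"

# Root-only variants, left as comments (APP_PID is a hypothetical variable):
#   echo -1000 > "/proc/$APP_PID/oom_score_adj"  # exempt the application (option 2)
#   sysctl -w vm.overcommit_memory=2             # strict overcommit accounting (option 3)
```

For a service managed by systemd, the OOMScoreAdjust= setting in the unit file has the same effect as the echo -1000 line.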
Answer by fche
A tool like systemtap (or another tracer) can monitor the kernel's signal-transmission logic and report on it, e.g., https://sourceware.org/systemtap/examples/process/sigmon.stp
# stap .../sigmon.stp -x 31994 SIGKILL
SPID SNAME RPID RNAME SIGNUM SIGNAME
5609 bash 31994 find 9 SIGKILL
The filtering if block in that script can be adjusted to taste, or removed to trace system-wide signal traffic. Causes can be further isolated by collecting backtraces (add print_backtrace() and/or print_ubacktrace() to the probe, for kernel and userspace backtraces respectively).
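If systemtap isn't installed, the kernel's signal_generate tracepoint can be read through tracefs with nothing but a shell. A sketch of a hypothetical trace_sigkill function, assuming root and tracefs mounted at /sys/kernel/tracing (or the older /sys/kernel/debug/tracing): it filters on signal 9 and streams the trace. Since OOM kills and RLIMIT_CPU kills are also delivered as SIGKILL through the kernel's signal code, they should appear here too; the task on the left of each line is whichever task was running when the signal was generated, which for an OOM kill is usually the process whose allocation triggered it.

```shell
#!/usr/bin/env bash
# trace_sigkill is a hypothetical helper name. Run as root; Ctrl+C to stop.
trace_sigkill() {
  tracefs=/sys/kernel/tracing
  [ -d "$tracefs/events" ] || tracefs=/sys/kernel/debug/tracing
  event="$tracefs/events/signal/signal_generate"

  # Disable the event and clear the filter when the shell exits.
  trap 'echo 0 > "$event/enable"; echo 0 > "$event/filter"' EXIT

  echo 'sig == 9' > "$event/filter"  # keep only SIGKILL
  echo 1 > "$event/enable"

  # Lines look roughly like:
  #   <sender>-<pid> ... signal_generate: sig=9 errno=0 code=0 comm=<target> pid=<target pid> ...
  cat "$tracefs/trace_pipe"
}
```

Running it via sudo bash -c '. ./trace_sigkill.sh; trace_sigkill' makes sure the EXIT trap fires when you press Ctrl+C.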