How is it possible that kill -9 for a process on Linux has no effect?
Source: http://stackoverflow.com/questions/694720/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors on Stack Overflow.
Asked by Aaron Digulla
I'm writing a plugin to highlight text strings automatically as you visit a web site. It's like the highlight search results but automatic and for many words; it could be used for people with allergies to make words really stand out, for example, when they browse a food site.
But I have a problem. When I try to close an empty, fresh FF window, it somehow blocks the whole process. When I kill the process, all the windows vanish, but the Firefox process stays alive (parent PID is 1, doesn't listen to any signals, has lots of resources open, still eats CPU, but won't budge).
So two questions:
How is it even possible for a process not to listen to kill -9 (neither as user nor as root)?
Is there anything I can do but a reboot?
[EDIT] This is the offending process:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
digulla 16688 4.3 4.2 784476 345464 pts/14 D Mar28 75:02 /opt/firefox-3.0/firefox-bin
The same process as shown by ps -ef | grep firefox:
UID PID PPID C STIME TTY TIME CMD
digulla 16688 1 4 Mar28 pts/14 01:15:02 /opt/firefox-3.0/firefox-bin
It's the only process left. As you can see, it's not a zombie, it's running! It doesn't listen to kill -9, no matter whether I kill by PID or by name! If I try to attach to it with strace, then strace also hangs and can't be killed. There is no output, either. My guess is that FF hangs in some kernel routine, but which?
[EDIT2] Based on feedback by sigjuice: ps axopid,comm,wchan can show you in which kernel routine a process hangs. In my case, the offending plugin was the Beagle Indexer (openSUSE 11.1). After disabling the plugin, FF was a quick and happy fox again.
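For a single process, roughly the same information can be read straight from /proc. This is only a sketch under assumptions: show_wait is a made-up helper name, and it relies on the Linux procfs files /proc/PID/status and /proc/PID/wchan. The wait channel may read as 0 when the process is not blocked, and the file may be missing or hidden on some kernels.

```shell
#!/bin/sh
# show_wait is a hypothetical helper name, not part of any standard tool.
# Prints: PID, state letter (R, S, D, Z, ...) and kernel wait channel.
show_wait() {
    pid=$1
    # The "State:" line of /proc/PID/status looks like "State:  D (disk sleep)".
    state=$(awk '/^State:/ { print $2 }' "/proc/$pid/status" 2>/dev/null || true)
    if [ -z "$state" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # wchan names the kernel function the process sleeps in.
    # If the file cannot be read, print "-" instead.
    wchan=$(cat "/proc/$pid/wchan" 2>/dev/null || true)
    [ -n "$wchan" ] || wchan=-
    echo "$pid $state $wchan"
}

# Example: inspect the current shell.
show_wait "$$"
```

For the stuck Firefox above, one would call show_wait 16688 and look at the third column.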
Accepted answer by Dave Sherohman
As noted in comments to the OP, a process status (STAT) of D indicates that the process is in an "uninterruptible sleep" state. In real-world terms, this generally means that it's waiting on I/O and can't/won't do anything - including dying - until that I/O operation completes.
Processes in a D state will normally only be there for a fraction of a second before the operation completes and they return to R/S. In my experience, if a process gets stuck in D, it's most often trying to communicate with an unreachable NFS or other remote filesystem, trying to access a failing hard drive, or making use of some piece of hardware by way of a flaky device driver. In such cases, the only way to recover and allow the process to die is to either get the fs/drive/hardware back up and running so the I/O can complete or to give up and reboot the system. In the specific case of NFS, the mount may also eventually time out and return from the I/O operation (with a failure code), but this is dependent on the mount options and it's very common for NFS mounts to be set to wait forever.
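To spot processes sitting in that state, one option is to scan /proc for the state letter. This is a rough sketch, not a standard tool: list_by_state is a made-up helper name, and it assumes a Linux /proc filesystem where /proc/PID/status carries Name, Pid and State lines.

```shell
#!/bin/sh
# list_by_state is a hypothetical helper: print "PID NAME" for every process
# whose state letter in /proc/PID/status equals the given letter.
list_by_state() {
    want=$1
    for f in /proc/[0-9]*/status; do
        # A process can exit while we scan, so read errors are ignored.
        awk -v want="$want" '
            /^Name:/  { name = $2 }
            /^Pid:/   { pid = $2 }
            /^State:/ { state = $2 }
            END       { if (state == want) print pid, name }
        ' "$f" 2>/dev/null || true
    done
}

# D = uninterruptible sleep; on a healthy system this usually prints
# nothing, or entries that disappear again on the next run.
list_by_state D
```

A PID that shows up here run after run is the kind of process that kill -9 cannot remove until its I/O completes.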
This is distinct from a zombie process, which will have a status of Z.
Answer by karim79
sudo killall -9 firefox
Should work.
EDIT: [PID] changed to firefox
Answer by karim79
Run ps -ef | grep firefox and you will see 3 processes; kill them all.
Answer by John Feminella
Double-check that the parent-id is really 1. If not, and this is firefox, first try sudo killall -9 firefox-bin. After that, try killing the specific process IDs individually with sudo kill -9 [process-id] (killall matches process names, so kill is the command that takes a PID).
How is it even possible for a process not to listen to kill -9 (neither as user nor as root)?
If a process has gone <defunct> and then becomes a zombie with a parent of 1, you can't kill it manually; only init can. Zombie processes are already dead and gone - they've lost the ability to be killed as they are no longer processes, only a process table entry and its associated exit code, waiting to be collected. You need to kill the parent, and you can't kill init for obvious reasons.
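Since the parent is the process that has to collect a zombie, it helps to list zombies together with their parent PID. The following is a sketch under assumptions: list_zombies is a made-up helper name, and it reads the Name, State, Pid and PPid lines of /proc/PID/status on Linux.

```shell
#!/bin/sh
# list_zombies is a hypothetical helper: print "PID PPID NAME" for every
# zombie (state Z), so the parent that has not collected it can be found.
list_zombies() {
    for f in /proc/[0-9]*/status; do
        # Processes may vanish during the scan, so read errors are ignored.
        awk '
            /^Name:/  { name = $2 }
            /^State:/ { state = $2 }
            /^Pid:/   { pid = $2 }
            /^PPid:/  { ppid = $2 }
            END       { if (state == "Z") print pid, ppid, name }
        ' "$f" 2>/dev/null || true
    done
}

list_zombies
```

The second column is the process to look at: once it calls wait() or exits, the zombie entry goes away.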
But see here for more general information. A reboot will kill everything, naturally.
Answer by Eric Holmberg
You can also run pstree and kill the parent. This makes sure that you get the entire offending process tree and not just the leaf.
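If pstree is not at hand, the chain of parents can be walked by hand through /proc. This is a small sketch, assuming a Linux /proc filesystem; ancestors is a made-up helper name.

```shell
#!/bin/sh
# ancestors is a hypothetical helper: print "PID NAME" for the given PID and
# then for each parent in turn, until the top of the tree is reached.
ancestors() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        name=$(awk '/^Name:/ { print $2 }' "/proc/$pid/status" 2>/dev/null || true)
        echo "$pid $name"
        # PPid is 0 for the top-most process, which ends the loop; an
        # unreadable entry yields an empty value and ends it as well.
        pid=$(awk '/^PPid:/ { print $2 }' "/proc/$pid/status" 2>/dev/null || true)
    done
}

# Example: show where the current shell hangs in the process tree.
ancestors "$$"
```

The first line is the leaf; the lines below it are the candidates for "kill the parent".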
Answer by Georg Schölly
Is it possible that this process is restarted (for example by init) just at the time you kill it?
You can check this easily. If the PID is the same after kill -9 PID, then the process wasn't killed, but if it has changed, the process has been restarted.
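The first half of that check can be scripted roughly as follows. It is a sketch with assumptions: kill_and_check is a made-up helper name, the one-second pause is arbitrary, and the state is read from /proc/PID/status on Linux. If the PID is gone but the same program is back under a new PID, something restarted it.

```shell
#!/bin/sh
# kill_and_check is a hypothetical helper: send SIGKILL, wait a moment, then
# report whether that exact PID still exists.
kill_and_check() {
    pid=$1
    kill -9 "$pid" 2>/dev/null || { echo "cannot signal $pid"; return 2; }
    sleep 1
    state=$(awk '/^State:/ { print $2 }' "/proc/$pid/status" 2>/dev/null || true)
    case $state in
        "") echo "$pid is gone"; return 0 ;;
        Z)  echo "$pid is dead but not yet collected by its parent"; return 0 ;;
        *)  echo "$pid is still there in state $state"; return 1 ;;
    esac
}

# Example: a throwaway background process that dies as expected.
sleep 30 &
kill_and_check "$!"
```

A process stuck in uninterruptible sleep would take the last branch and report state D.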
Answer by NGI
I recently fell into the Double Fork pitfall and landed on this page before finally finding my answer. The symptoms are identical even if the problem is not the same:
- WYKINWYT: What You Kill Is Not What You Thought
The minimal test code is shown below, based on an example for an SNMP daemon:
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

int main(int argc, char* argv[])
{
    //We omit the -f option (do not Fork) to reproduce the problem
    char * options[]={"/usr/local/sbin/snmpd",/*"-f",*/"-d","--master=agentx", "-Dagentx","--agentXSocket=tcp:localhost:1706", "udp:10161", (char*) NULL};
    pid_t pid = fork();
    if ( 0 > pid ) return -1;
    switch(pid)
    {
    case 0:
        { //Child launches SNMP daemon
            execv(options[0],options);
            exit(-2); //Only reached if execv() fails
            break;
        }
    default:
        {
            sleep(10); //Simulate "long" activity
            kill(pid,SIGTERM);//kill what should be child,
                              //i.e the SNMP daemon I assume
            printf("Signal sent to %d\n",pid);
            sleep(10); //Simulate "long" operation before closing
            waitpid(pid, NULL, 0); //Collect the child's exit status
            printf("SNMP should be now down\n");
            getchar();//Blocking (for observation only)
            break;
        }
    }
    printf("Bye!\n");
    return 0;
}
During the first phase, the main process (7699) launches the SNMP daemon (7700), but we can see that this one is now defunct (a zombie). Besides that, we can see another process (7702) with the options we specified:
[nils@localhost ~]$ ps -ef | tail
root 7439 2 0 23:00 ? 00:00:00 [kworker/1:0]
root 7494 2 0 23:03 ? 00:00:00 [kworker/0:1]
root 7544 2 0 23:08 ? 00:00:00 [kworker/0:2]
root 7605 2 0 23:10 ? 00:00:00 [kworker/1:2]
root 7698 729 0 23:11 ? 00:00:00 sleep 60
nils 7699 2832 0 23:11 pts/0 00:00:00 ./main
nils 7700 7699 0 23:11 pts/0 00:00:00 [snmpd] <defunct>
nils 7702 1 0 23:11 ? 00:00:00 /usr/local/sbin/snmpd -Lo -d --master=agentx -Dagentx --agentXSocket=tcp:localhost:1706 udp:10161
nils 7727 3706 0 23:11 pts/1 00:00:00 ps -ef
nils 7728 3706 0 23:11 pts/1 00:00:00 tail
After the simulated 10 seconds, we try to kill the only process we know (7700), which waitpid() then finally collects. But process 7702 is still here:
[nils@localhost ~]$ ps -ef | tail
root 7431 2 0 23:00 ? 00:00:00 [kworker/u256:1]
root 7439 2 0 23:00 ? 00:00:00 [kworker/1:0]
root 7494 2 0 23:03 ? 00:00:00 [kworker/0:1]
root 7544 2 0 23:08 ? 00:00:00 [kworker/0:2]
root 7605 2 0 23:10 ? 00:00:00 [kworker/1:2]
root 7698 729 0 23:11 ? 00:00:00 sleep 60
nils 7699 2832 0 23:11 pts/0 00:00:00 ./main
nils 7702 1 0 23:11 ? 00:00:00 /usr/local/sbin/snmpd -Lo -d --master=agentx -Dagentx --agentXSocket=tcp:localhost:1706 udp:10161
nils 7751 3706 0 23:12 pts/1 00:00:00 ps -ef
nils 7752 3706 0 23:12 pts/1 00:00:00 tail
After giving a character to the getchar() function, our main process terminates, but the SNMP daemon with the PID 7702 is still here:
[nils@localhost ~]$ ps -ef | tail
postfix 7399 1511 0 22:58 ? 00:00:00 pickup -l -t unix -u
root 7431 2 0 23:00 ? 00:00:00 [kworker/u256:1]
root 7439 2 0 23:00 ? 00:00:00 [kworker/1:0]
root 7494 2 0 23:03 ? 00:00:00 [kworker/0:1]
root 7544 2 0 23:08 ? 00:00:00 [kworker/0:2]
root 7605 2 0 23:10 ? 00:00:00 [kworker/1:2]
root 7698 729 0 23:11 ? 00:00:00 sleep 60
nils 7702 1 0 23:11 ? 00:00:00 /usr/local/sbin/snmpd -Lo -d --master=agentx -Dagentx --agentXSocket=tcp:localhost:1706 udp:10161
nils 7765 3706 0 23:12 pts/1 00:00:00 ps -ef
nils 7766 3706 0 23:12 pts/1 00:00:00 tail
Conclusion
The fact that we ignored the double fork mechanism made us think that the kill action did not succeed. But in fact we simply killed the wrong process!
By adding the -f option (Do Not (Double) Fork), everything goes as expected.
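The same trap can be reproduced without snmpd. In this sketch a small launcher stands in for the daemon's first process and sleep stands in for the real daemon; both, as well as the temporary PID file used to learn the worker's PID, are assumptions made for illustration only.

```shell
#!/bin/sh
# Stand-in for a daemon that forks into the background: the launcher starts
# the real worker (sleep) and exits at once, leaving the worker orphaned.
pidfile=$(mktemp)
sh -c 'sleep 30 >/dev/null 2>&1 & echo $! > "$1"' sh "$pidfile" &
launcher=$!
wait "$launcher"              # the launcher has already finished
worker=$(cat "$pidfile")
rm -f "$pidfile"

echo "launcher PID: $launcher, worker PID: $worker"

# Signalling the launcher PID fails: that process no longer exists.
kill "$launcher" 2>/dev/null || echo "launcher $launcher already gone"

# The worker has a different PID and is still alive.
if kill -0 "$worker" 2>/dev/null; then
    worker_alive=yes
    echo "worker $worker still running"
else
    worker_alive=no
fi

# It has to be killed by its own PID.
kill "$worker" 2>/dev/null || true
```

Killing the PID returned by the launch step does nothing useful here, which is the WYKINWYT situation from the traces above.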