How is it possible that kill -9 for a process on Linux doesn't work?

Question

Asked by Aaron Digulla

I'm writing a plugin to highlight text strings automatically as you visit a web site. It's like the highlight search results but automatic and for many words; it could be used for people with allergies to make words really stand out, for example, when they browse a food site.

But I have a problem. When I try to close an empty, fresh FF window, it somehow blocks the whole process. When I kill the process, all the windows vanish, but the Firefox process stays alive (parent PID is 1, doesn't listen to any signals, has lots of resources open, still eats CPU, but won't budge).

So two questions:

How is it even possible for a process not to listen to kill -9 (neither as user nor as root)?
Is there anything I can do but a reboot?

[EDIT] This is the offending process:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
digulla  16688  4.3  4.2 784476 345464 pts/14  D    Mar28  75:02 /opt/firefox-3.0/firefox-bin

Same with ps -ef | grep firefox

UID        PID  PPID  C STIME TTY          TIME CMD
digulla  16688     1  4 Mar28 pts/14   01:15:02 /opt/firefox-3.0/firefox-bin

It's the only process left. As you can see, it's not a zombie, it's running! It doesn't listen to kill -9, no matter if I kill by PID or name! If I try to attach with strace, then the strace also hangs and can't be killed. There is no output, either. My guess is that FF hangs in some kernel routine, but which?

[EDIT2] Based on feedback by sigjuice:

ps axopid,comm,wchan

can show you in which kernel routine a process hangs. In my case, the offending plugin was the Beagle Indexer (openSUSE 11.1). After disabling the plugin, FF was a quick and happy fox again.
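
The wchan column that ps prints is also exposed per process under /proc. As a minimal sketch, assuming a Linux system with /proc mounted, the hypothetical helper read_wchan below (not part of any library) copies the content of /proc/<pid>/wchan into a buffer. On some kernels the file only contains "0" when the process is not blocked or when symbol names are unavailable, so treat the result as a hint.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: copies the content of /proc/<pid>/wchan into buf.
   Returns 0 on success, -1 if the file cannot be opened or read. */
int read_wchan(pid_t pid, char *buf, size_t len)
{
    char path[64];
    FILE *f;
    size_t n;
    int err;

    if (len == 0) return -1;
    snprintf(path, sizeof path, "/proc/%ld/wchan", (long)pid);

    f = fopen(path, "r");
    if (f == NULL) return -1;          /* no such process, or no /proc */

    n = fread(buf, 1, len - 1, f);
    err = ferror(f);
    fclose(f);
    if (err) return -1;

    buf[n] = '\0';
    buf[strcspn(buf, "\n")] = '\0';    /* drop a trailing newline, if any */
    return 0;
}
```

Calling read_wchan(16688, buf, sizeof buf) for the process from the question would fill buf with the name of the kernel routine it is waiting in, if the kernel exposes it.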

Answer 1

Accepted answer by Dave Sherohman

As noted in comments to the OP, a process status (STAT) of D indicates that the process is in an "uninterruptible sleep" state. In real-world terms, this generally means that it's waiting on I/O and can't/won't do anything - including dying - until that I/O operation completes.

Processes in a D state will normally only be there for a fraction of a second before the operation completes and they return to R/S. In my experience, if a process gets stuck in D, it's most often trying to communicate with an unreachable NFS or other remote filesystem, trying to access a failing hard drive, or making use of some piece of hardware by way of a flaky device driver. In such cases, the only way to recover and allow the process to die is to either get the fs/drive/hardware back up and running so the I/O can complete or to give up and reboot the system. In the specific case of NFS, the mount may also eventually time out and return from the I/O operation (with a failure code), but this is dependent on the mount options and it's very common for NFS mounts to be set to wait forever.

This is distinct from a zombie process, which will have a status of Z.
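
The STAT letter that ps shows can also be read from /proc/<pid>/stat, where it is the field right after the command name in parentheses. A minimal sketch, assuming that layout; stat_state and is_uninterruptible are hypothetical helper names used only for illustration:

```c
#include <string.h>

/* Returns the state letter (R, S, D, Z, ...) from one line of
   /proc/<pid>/stat, or '\0' if the line is malformed. */
char stat_state(const char *stat_line)
{
    /* The command name is wrapped in parentheses and may itself contain
       ')' or spaces, so look for the last ')' instead of splitting on
       whitespace. */
    const char *close = strrchr(stat_line, ')');

    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '\0';
    return close[2];
}

/* Nonzero if the line describes a process in uninterruptible sleep. */
int is_uninterruptible(const char *stat_line)
{
    return stat_state(stat_line) == 'D';
}
```

A line starting with "16688 (firefox-bin) D" is reported as uninterruptible, while one starting with "7700 (snmpd) Z" is a zombie.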

Answer 2

Answered by karim79

sudo killall -9 firefox

Should work

EDIT: [PID] changed to firefox

Answer 3

Answered by karim79

Run ps -ef | grep firefox; you should see 3 processes. Kill them all.

Answer 4

Answered by John Feminella

Double-check that the parent-id is really 1. If not, and this is firefox, first try sudo killall -9 firefox-bin. After that, try killing the specific process IDs individually with sudo kill -9 [process-id] (killall matches process names, not PIDs).

How is it even possible for a process not to listen to kill -9 (neither as user nor as root)?

If a process has gone <defunct> and then becomes a zombie with a parent of 1, you can't kill it manually; only init can. Zombie processes are already dead and gone - they've lost the ability to be killed as they are no longer processes, only a process table entry and its associated exit code, waiting to be collected. You need to kill the parent, and you can't kill init for obvious reasons.
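
The zombie behaviour described here can be reproduced in a few lines. This is a minimal sketch, assuming Linux or another POSIX system; zombie_ignores_sigkill is a hypothetical name. It creates a zombie on purpose, sends it SIGKILL, and checks that the process table entry only disappears once the parent collects the exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a zombie accepted SIGKILL but only vanished once reaped,
   0 if something else was observed, -1 on setup errors. */
int zombie_ignores_sigkill(void)
{
    siginfo_t info;
    int sent, still_listed, gone;
    pid_t pid = fork();

    if (pid < 0) return -1;
    if (pid == 0) _exit(0);            /* child exits immediately */

    /* Block until the child has exited, but leave it unreaped (a zombie). */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1)
        return -1;

    sent = kill(pid, SIGKILL);         /* accepted, but nothing is left to kill */
    still_listed = (kill(pid, 0) == 0);

    waitpid(pid, NULL, 0);             /* the parent collects the exit status */
    gone = (kill(pid, 0) == -1 && errno == ESRCH);

    return sent == 0 && still_listed && gone;
}
```

The signal is accepted without error, yet the entry stays until waitpid() runs, which is why only the parent (or init, after the parent dies) can make a zombie go away.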

But see here for more general information. A reboot will kill everything, naturally.

Answer 5

Answered by Eric Holmberg

You can also do a pstree and kill the parent. This makes sure that you get the entire offending process tree and not just the leaf.

Answer 6

Answered by Georg Schölly

Is it possible that this process is restarted (for example by init) just at the time you kill it?

You can check this easily. If the PID is the same after kill -9 PID, then the process wasn't killed, but if it has changed, the process has been restarted.
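
Whether the old PID still exists after the kill can also be checked from code. A minimal sketch, assuming a POSIX system; pid_alive is a hypothetical helper. It sends signal 0, which performs the existence and permission checks without delivering anything.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Returns 1 if a process table entry with this PID exists, 0 if not. */
int pid_alive(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM;   /* exists, but belongs to another user */
}
```

Note that a zombie still counts as existing for this check, so a result of 1 does not by itself prove that the process is running.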

Answer 7

Answered by NGI

I recently fell into the double fork pitfall and landed on this page before finally finding my answer. The symptoms are identical even if the problem is not the same:

WYKINWYT: What You Kill Is Not What You Thought

The minimal test code below is based on an example for an SNMP daemon:

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char* argv[])
{
    //We omit the -f option (do not Fork) to reproduce the problem
    char * options[]={"/usr/local/sbin/snmpd",/*"-f",*/"-d","--master=agentx", "-Dagentx","--agentXSocket=tcp:localhost:1706",  "udp:10161", (char*) NULL};

    pid_t pid = fork();
    if ( 0 > pid ) return -1;

    switch(pid)
    {
        case 0: 
        {   //Child launches SNMP daemon
            execv(options[0],options);
            exit(-2); //Only reached if execv() failed
            break;
        }
        default: 
        {
            sleep(10); //Simulate "long" activity

            kill(pid,SIGTERM);//kill what should be child, 
                              //i.e the SNMP daemon I assume
            printf("Signal sent to %d\n",pid);

            sleep(10); //Simulate "long" operation before closing
            waitpid(pid, NULL, 0); //Reap the child; its exit status is not needed here
            printf("SNMP should be now down\n");

            getchar();//Blocking (for observation only)
            break;
        }
    }
    printf("Bye!\n");
    return 0;
}

During the first phase, the main process (7699) launches the SNMP daemon (7700), but we can see that this one is now defunct (a zombie). Besides, we can see another process (7702) with the options we specified:

[nils@localhost ~]$ ps -ef | tail
root       7439      2  0 23:00 ?        00:00:00 [kworker/1:0]
root       7494      2  0 23:03 ?        00:00:00 [kworker/0:1]
root       7544      2  0 23:08 ?        00:00:00 [kworker/0:2]
root       7605      2  0 23:10 ?        00:00:00 [kworker/1:2]
root       7698    729  0 23:11 ?        00:00:00 sleep 60
nils       7699   2832  0 23:11 pts/0    00:00:00 ./main
nils       7700   7699  0 23:11 pts/0    00:00:00 [snmpd] <defunct>
nils       7702      1  0 23:11 ?        00:00:00 /usr/local/sbin/snmpd -Lo -d --master=agentx -Dagentx --agentXSocket=tcp:localhost:1706 udp:10161
nils       7727   3706  0 23:11 pts/1    00:00:00 ps -ef
nils       7728   3706  0 23:11 pts/1    00:00:00 tail

After the simulated 10 seconds, we try to kill the only process we know (7700), and waitpid() finally reaps it. But process 7702 is still here:

[nils@localhost ~]$ ps -ef | tail
root       7431      2  0 23:00 ?        00:00:00 [kworker/u256:1]
root       7439      2  0 23:00 ?        00:00:00 [kworker/1:0]
root       7494      2  0 23:03 ?        00:00:00 [kworker/0:1]
root       7544      2  0 23:08 ?        00:00:00 [kworker/0:2]
root       7605      2  0 23:10 ?        00:00:00 [kworker/1:2]
root       7698    729  0 23:11 ?        00:00:00 sleep 60
nils       7699   2832  0 23:11 pts/0    00:00:00 ./main
nils       7702      1  0 23:11 ?        00:00:00 /usr/local/sbin/snmpd -Lo -d --master=agentx -Dagentx --agentXSocket=tcp:localhost:1706 udp:10161
nils       7751   3706  0 23:12 pts/1    00:00:00 ps -ef
nils       7752   3706  0 23:12 pts/1    00:00:00 tail

After giving a character to the getchar() function, our main process terminates, but the SNMP daemon with PID 7702 is still here:

[nils@localhost ~]$ ps -ef | tail
postfix    7399   1511  0 22:58 ?        00:00:00 pickup -l -t unix -u
root       7431      2  0 23:00 ?        00:00:00 [kworker/u256:1]
root       7439      2  0 23:00 ?        00:00:00 [kworker/1:0]
root       7494      2  0 23:03 ?        00:00:00 [kworker/0:1]
root       7544      2  0 23:08 ?        00:00:00 [kworker/0:2]
root       7605      2  0 23:10 ?        00:00:00 [kworker/1:2]
root       7698    729  0 23:11 ?        00:00:00 sleep 60
nils       7702      1  0 23:11 ?        00:00:00 /usr/local/sbin/snmpd -Lo -d --master=agentx -Dagentx --agentXSocket=tcp:localhost:1706 udp:10161
nils       7765   3706  0 23:12 pts/1    00:00:00 ps -ef
nils       7766   3706  0 23:12 pts/1    00:00:00 tail

Conclusion

Because we ignored the double fork mechanism, we thought that the kill action had not succeeded. But in fact we simply killed the wrong process!

By adding the -f option (do not (double) fork), everything goes as expected.
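
To see why the PID returned by fork() can point at the wrong process, here is a minimal sketch of a double fork, assuming a POSIX system. It is not how snmpd itself is implemented; double_fork_demo is a hypothetical helper, and the sleep(30) merely stands in for a daemon's real work. The intermediate child forks once more, reports the grandchild's PID through a pipe, and exits at once.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* On success returns 0, storing the PID that fork() returned in *child
   and the PID of the long-lived process in *grandchild. */
int double_fork_demo(pid_t *child, pid_t *grandchild)
{
    int fds[2];
    pid_t pid, reported = -1;
    ssize_t r;

    if (pipe(fds) == -1) return -1;

    pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }

    if (pid == 0) {                    /* intermediate child */
        pid_t daemon_pid;
        ssize_t w;

        close(fds[0]);
        daemon_pid = fork();
        if (daemon_pid == 0) {         /* grandchild: the "daemon" */
            close(fds[1]);
            sleep(30);                 /* stand-in for real work */
            _exit(0);
        }
        /* Report the grandchild's PID (or -1) and exit immediately. */
        w = write(fds[1], &daemon_pid, sizeof daemon_pid);
        _exit(w == (ssize_t)sizeof daemon_pid ? 0 : 1);
    }

    close(fds[1]);
    r = read(fds[0], &reported, sizeof reported);
    close(fds[0]);
    waitpid(pid, NULL, 0);             /* reap the intermediate child */

    if (r != (ssize_t)sizeof reported || reported <= 0) return -1;
    *child = pid;
    *grandchild = reported;
    return 0;
}
```

After the call, *child refers to a process that has already exited and been reaped, while *grandchild is the one still running; sending a signal to *child cannot stop it, which matches what the ps listings for 7700 and 7702 show.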
