Original question: http://stackoverflow.com/questions/1475683/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Linux Process States
Asked by Blair
In Linux, what happens to the state of a process when it needs to read blocks from a disk? Is it blocked? If so, how is another process chosen to execute?
Accepted answer by Tim Post
While waiting for read() or write() to/from a file descriptor to return, the process will be put in a special kind of sleep, known as "D" or "Disk Sleep". This is special because the process cannot be killed or interrupted while in such a state. A process waiting for a return from ioctl() would also be put to sleep in this manner.
An exception to this is when a file (such as a terminal or other character device) is opened in O_NONBLOCK mode, which is passed when it's assumed that a device (such as a modem) will need time to initialize. However, you indicated block devices in your question. Also, I have never tried an ioctl() that is likely to block on an fd opened in non-blocking mode (at least not knowingly).
How another process is chosen depends entirely on the scheduler you are using, as well as what other processes might have done to modify their weights within that scheduler.
Some user space programs under certain circumstances have been known to remain in this state forever, until rebooted. These are typically grouped in with other "zombies", but the term would not be correct as they are not technically defunct.
Answer by derobert
Assuming your process is a single thread, and that you're using blocking I/O, your process will block waiting for the I/O to complete. The kernel will pick another process to run in the meantime based on niceness, priority, last run time, etc. If there are no other runnable processes, the kernel won't run any; instead, it'll tell the hardware the machine is idle (which will result in lower power consumption).
Processes that are waiting for I/O to complete typically show up in state D in, e.g., ps and top.
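To check the state of a single process without scanning the full ps or top listing, the state letter can be read from /proc. This is a minimal sketch, assuming a Linux /proc filesystem; proc_state is a hypothetical helper name, not a standard command.

```shell
# proc_state PID: print the one-letter state code (R, S, D, ...) of a process.
# Hypothetical helper; it reads the "State:" line of /proc/PID/status.
proc_state() {
    while read -r key value rest; do
        # The line looks like "State:  S (sleeping)"; value is the letter.
        [ "$key" = "State:" ] && { echo "$value"; return 0; }
    done < "/proc/$1/status"
    return 1
}

# Example: state of the current shell.
proc_state $$
```

With procps, `ps -o stat= -p PID` should report the same letter, possibly followed by extra flag characters.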
Answer by Martin v. Löwis
Yes, tasks waiting for IO are blocked, and other tasks get executed. Selecting the next task is done by the Linux scheduler.
Answer by Benno
Generally the process will block. If the read operation is on a file descriptor marked as non-blocking or if the process is using asynchronous IO it won't block. Also if the process has other threads that aren't blocked they can continue running.
The decision as to which process runs next is up to the scheduler in the kernel.
Answer by user224579
A process performing I/O will be put in D state (uninterruptible sleep), which frees the CPU until there is a hardware interrupt which tells the CPU to return to executing the program. See man ps for the other process states.
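For quick reference, the most common state codes can be wrapped in a small lookup. This is only a sketch: describe_state is a hypothetical helper name, and the descriptions paraphrase the PROCESS STATE CODES section of the procps ps man page, which remains the authoritative list.

```shell
# describe_state CODE: print a short description of a ps state letter.
# Hypothetical helper; covers only the most common codes.
describe_state() {
    case "$1" in
        D) echo "uninterruptible sleep (usually I/O)" ;;
        R) echo "running or runnable (on run queue)" ;;
        S) echo "interruptible sleep (waiting for an event to complete)" ;;
        T) echo "stopped by job control signal" ;;
        Z) echo "defunct (zombie) process" ;;
        *) echo "unknown state code: $1"; return 1 ;;
    esac
}

describe_state D
```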
Depending on your kernel, there is a process scheduler, which keeps track of a runqueue of processes ready to execute. It, along with a scheduling algorithm, tells the kernel which process to assign to which CPU. There are kernel processes and user processes to consider. Each process is allocated a time-slice, which is a chunk of CPU time it is allowed to use. Once the process uses all of its time-slice, it is marked as expired and given lower priority in the scheduling algorithm.
In the 2.6 kernel, there is an O(1) time complexity scheduler, so no matter how many processes you have running, it will assign CPUs in constant time. It is more complicated though, since 2.6 introduced preemption, and CPU load balancing is not an easy algorithm. In any case, it's efficient, and CPUs will not remain idle while you wait for the I/O.
Answer by MarkR
Yes, the task gets blocked in the read() system call. Another task which is ready runs, or if no other tasks are ready, the idle task (for that CPU) runs.
A normal, blocking disk read causes the task to enter the "D" state (as others have noted). Such tasks contribute to the load average, even though they're not consuming the CPU.
Some other types of IO, especially ttys and network, do not behave quite the same - the process ends up in "S" state and can be interrupted and doesn't count against the load average.
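The load average mentioned above can be read directly from /proc/loadavg. On Linux it counts tasks in uninterruptible sleep as well as runnable ones, which is why a machine stuck on I/O can show a high load with an idle CPU. A minimal sketch, assuming a Linux /proc filesystem (the variable names are arbitrary):

```shell
# Fields of /proc/loadavg: 1-, 5- and 15-minute load averages,
# runnable/total scheduling entities, and the most recently created PID.
read -r one five fifteen tasks lastpid < /proc/loadavg
echo "load averages: $one (1 min), $five (5 min), $fifteen (15 min)"
echo "runnable/total tasks: $tasks"
```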
Answer by zerodeux
When a process needs to fetch data from a disk, it effectively stops running on the CPU to let other processes run because the operation might take a long time to complete – at least 5ms seek time for a disk is common, and 5ms is 10 million CPU cycles, an eternity from the point of view of the program!
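The cycle count above is easy to check. The sketch below assumes a 2 GHz clock, which is an assumption here since the answer does not state a clock rate; at that rate the CPU executes 2,000,000 cycles per millisecond.

```shell
# Assumed clock rate: 2 GHz = 2,000,000 cycles per millisecond.
cycles_per_ms=2000000
seek_ms=5   # seek time quoted in the answer

echo "$((seek_ms * cycles_per_ms)) cycles"   # prints "10000000 cycles"
```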
From the programmer's point of view (also described as "in userspace"), this is called a blocking system call. If you call write(2) (which is a thin libc wrapper around the system call of the same name), your process does not exactly stop at that boundary; it continues, in the kernel, running the system call code. Most of the time it goes all the way to a specific disk controller driver (filename → filesystem/VFS → block device → device driver), where a command to fetch a block on disk is submitted to the proper hardware, which is a very fast operation most of the time.
THEN the process is put in sleep state (in kernel space, blocking is called sleeping; nothing is ever 'blocked' from the kernel point of view). It will be awakened once the hardware has finally fetched the proper data, then the process will be marked as runnable and will be scheduled. Eventually, the scheduler will run the process.
Finally, in userspace, the blocking system call returns with proper status and data, and the program flow goes on.
It is possible to invoke most I/O system calls in non-blocking mode (see O_NONBLOCK in open(2) and fcntl(2)). In this case, the system calls return immediately and only report submitting the disk operation. The programmer will have to explicitly check at a later time whether the operation completed, successfully or not, and fetch its result (e.g., with select(2)). This is called asynchronous or event-based programming.
Most answers here mentioning the D state (which is called TASK_UNINTERRUPTIBLE in the Linux state names) are incorrect. The D state is a special sleep mode which is only triggered in a kernel space code path, when that code path can't be interrupted (because it would be too complex to program), with the expectation that it would block only for a very short time. I believe that most "D states" are actually invisible; they are very short lived and can't be observed by sampling tools such as 'top'.
You can encounter unkillable processes in the D state in a few situations. NFS is famous for that, and I've encountered it many times. I think there's a semantic clash between some VFS code paths, which assume they always reach local disks with fast error detection (on SATA, an error timeout would be around a few hundred ms), and NFS, which actually fetches data from the network, which is more resilient and has slow recovery (a TCP timeout of 300 seconds is common). Read this article for the cool solution introduced in Linux 2.6.25 with the TASK_KILLABLE state. Before this era there was a hack where you could actually send signals to NFS process clients by sending a SIGKILL to the kernel thread rpciod, but forget about that ugly trick.
Answer by Valerio Di Giampietro
As already explained by others, processes in "D" state (uninterruptible sleep) are responsible for hangs of the ps process. It has happened to me many times with RedHat 6.x and automounted NFS home directories.
To list processes in D state you can use the following commands:
cd /proc
for i in [0-9]*; do  # a process may exit mid-loop, so ignore read errors
    state=$(grep '^State:' "$i/status" 2>/dev/null) && echo "$i: $state"
done | grep -w D
To find the current directory of the process and, possibly, the mounted NFS disk that has issues, you can use a command similar to the following example (replace 31134 with the PID of the sleeping process):
# ls -l /proc/31134/cwd
lrwxrwxrwx 1 pippo users 0 Aug 2 16:25 /proc/31134/cwd -> /auto/pippo
I found that running the umount command with the -f (force) switch on the related mounted NFS file system was able to wake up the sleeping process:
umount -f /auto/pippo
The file system wasn't unmounted, because it was busy, but the related process did wake up and I was able to solve the issue without rebooting.