How to get the PID of a process in a pipeline
Original question: http://stackoverflow.com/questions/3345460/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) at Stack Overflow.
Asked by User1
Consider the following simplified example:
my_prog|awk '...' > output.csv &
my_pid="$!" #Gives the PID for awk instead of for my_prog
sleep 10
kill $my_pid #my_prog still has data in its buffer that awk never saw. Data is lost!
In bash, $my_pid points to the PID for awk. However, I need the PID for my_prog. If I kill awk, my_prog does not know to flush its output buffer, and data is lost. So, how would one obtain the PID for my_prog? Note that ps aux|grep my_prog will not work since there may be several instances of my_prog running.
NOTE: changed cat to awk '...' to help clarify what I need.
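To see which process $! actually refers to, here is a quick check (job and PID numbers are illustrative):
$ sleep 5 | sleep 10 &
[1] 1235
$ echo $!   # PID of sleep 10, the last command in the pipeline
1235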
Accepted answer by User1
I was able to solve it by explicitly naming the pipe using mkfifo.
Step 1: mkfifo capture.
Step 2: Run this script
my_prog > capture &
my_pid="$!" #Now, I have the PID for my_prog!
awk '...' capture > out.csv &
sleep 10
kill $my_pid #kill my_prog
wait #wait for awk to finish.
I don't like having to manage a mkfifo. Hopefully someone has an easier solution.
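For completeness, the two steps can be combined into a single script that also cleans up the fifo on exit (a sketch; the fifo path and the 10-second window are arbitrary):
#!/bin/bash
fifo=/tmp/capture.$$         # per-invocation fifo name
mkfifo "$fifo"
trap 'rm -f "$fifo"' EXIT    # remove the fifo when the script exits

my_prog > "$fifo" &
my_pid=$!                    # this is my_prog's PID, not awk's
awk '...' "$fifo" > out.csv &

sleep 10
kill "$my_pid"               # kill my_prog itself
wait                         # let awk drain whatever is left in the pipe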
Answered by Marvin
Just had the same issue. My solution:
process_1 | process_2 &
PID_OF_PROCESS_2=$!
PID_OF_PROCESS_1=`jobs -p`
Just make sure process_1 is the first background process. Otherwise, you need to parse the full output of jobs -l.
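A minimal demonstration of the idea (PIDs illustrative; jobs -p prints the process-group leader of each job, which is the first command of its pipeline):
$ sleep 100 | sleep 200 &
$ PID_OF_PROCESS_2=$!
$ PID_OF_PROCESS_1=$(jobs -p %+)   # %+ restricts it to the current job
$ echo "$PID_OF_PROCESS_1 $PID_OF_PROCESS_2"
3141 3142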
Answered by Matei David
Here is a solution without wrappers or temporary files. This only works for a background pipeline whose output is captured away from stdout of the containing script, as in your case. Suppose you want to do:
cmd1 | cmd2 | cmd3 >pipe_out &
# do something with PID of cmd2
If only bash could provide ${PIPEPID[n]}!! The replacement "hack" that I found is the following:
PID=$( { cmd1 | { cmd2 0<&4 & echo $! >&3 ; } 4<&0 | cmd3 >pipe_out & } 3>&1 | head -1 )
If needed, you can also close fd 3 (for cmd*) and fd 4 (for cmd2) with 3>&- and 4<&-, respectively. If you do that, for cmd2 make sure you close fd 4 only after you redirect fd 0 from it.
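A concrete run of the hack, with stand-in commands (seq, tr and wc are only placeholders for cmd1, cmd2 and cmd3; the PID is illustrative):
$ pid=$( { seq 100000 | { tr -d 0 0<&4 & echo $! >&3 ; } 4<&0 | wc -c >/dev/null & } 3>&1 | head -1 )
$ echo "the middle command (tr) ran as PID $pid"
the middle command (tr) ran as PID 4711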
Answered by Demosthenex
Add a shell wrapper around your command and capture the pid. For my example I use iostat.
#!/bin/sh
echo $$ > /tmp/my.pid
exec iostat 1
Exec replaces the shell with the new process preserving the pid.
test.sh | grep avg
While that runs:
$ cat /tmp/my.pid
22754
$ ps -ef | grep iostat
userid 22754 4058 0 12:33 pts/12 00:00:00 iostat 1
So you can:
sleep 10
kill `cat /tmp/my.pid`
Is that more elegant?
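The same trick can be made reusable with a generic wrapper script (a sketch; wrap.sh, its arguments and the pid-file path are hypothetical):
#!/bin/sh
# wrap.sh PIDFILE CMD [ARGS...]: record our PID, then become CMD
pidfile=$1; shift
echo $$ > "$pidfile"
exec "$@"
Applied to the original problem:
./wrap.sh /tmp/my_prog.pid my_prog | awk '...' > output.csv &
sleep 10
kill "$(cat /tmp/my_prog.pid)"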
Answered by Jonas Berlin
Improving @Marvin's and @Nils Goroll's answers with a one-liner that extracts the PIDs for all commands in the pipe into a shell array variable:
# run some command
ls -l | rev | sort > /dev/null &
# collect pids
pids=(`jobs -l % | egrep -o '^(\[[0-9]+\]\+| ) [ 0-9]{5} ' | sed -e 's/^[^ ]* \+//' -e 's! $!!'`)
# use them for something
echo pid of ls -l: ${pids[0]}
echo pid of rev: ${pids[1]}
echo pid of sort: ${pids[2]}
echo pid of first command e.g. ls -l: $pids
echo pid of last command e.g. sort: ${pids[-1]}
# wait for last command in pipe to finish
wait ${pids[-1]}
In my solution ${pids[-1]} contains the value normally available in $!. Please note the use of jobs -l %, which outputs just the "current" job, which by default is the last one started.
Sample output:
pid of ls -l: 2725
pid of rev: 2726
pid of sort: 2727
pid of first command e.g. ls -l: 2725
pid of last command e.g. sort: 2727
UPDATE 2017-11-13: Improved the pids=... command so that it works better with complex (multi-line) commands.
Answered by msw
Based on your comment, I still can't see why you'd prefer killing my_prog to having it complete in an orderly fashion. Ten seconds is a pretty arbitrary measurement on a multiprocessing system whereby my_prog could generate 10k lines or 0 lines of output depending upon system load.
If you want to limit the output of my_prog to something more determinate, try
my_prog | head -1000 | awk
without detaching from the shell. In the worst case, head will close its input and my_prog will get a SIGPIPE. In the best case, change my_prog so it gives you the amount of output you want.
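The SIGPIPE mechanism is easy to observe with stock tools: yes writes forever, but dies as soon as head exits and closes the read end of the pipe:
$ yes | head -3
y
y
y
$ echo "${PIPESTATUS[@]}"   # 141 = 128 + SIGPIPE(13) for yes, 0 for head
141 0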
added in response to comment:
Insofar as you have control over my_prog, give it an optional -s duration argument. Then somewhere in your main loop you can put the predicate:
if (duration_exceeded()) {
    exit(0);
}
where exit will in turn properly flush the output FILEs. If desperate and there is no place to put the predicate, this could be implemented using alarm(3), which I am intentionally not showing because it is bad.
The core of your trouble is that my_prog runs forever. Everything else here is a hack to get around that limitation.
Answered by glenn jackman
With inspiration from @Demosthenex's answer: using subshells:
$ ( echo $BASHPID > pid1; exec vmstat 1 5 ) | tail -1 &
[1] 17371
$ cat pid1
17370
$ pgrep -fl vmstat
17370 vmstat 1 5
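Adapted to the question's pipeline (a sketch; the pid-file path is arbitrary):
( echo $BASHPID > /tmp/my_prog.pid; exec my_prog ) | awk '...' > output.csv &
sleep 10
kill "$(cat /tmp/my_prog.pid)"   # signals my_prog itself
wait                             # let awk consume what is left in the pipe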
Answered by hzpc-joostk
My solution was to query jobs and parse it using perl.
Start two pipelines in the background:
$ sleep 600 | sleep 600 | sleep 600 | sleep 600 | sleep 600 &
$ sleep 600 | sleep 600 | sleep 600 | sleep 600 | sleep 600 &
Query background jobs:
$ jobs
[1]- Running sleep 600 | sleep 600 | sleep 600 | sleep 600 | sleep 600 &
[2]+ Running sleep 600 | sleep 600 | sleep 600 | sleep 600 | sleep 600 &
$ jobs -l
[1]- 6108 Running sleep 600
6109 | sleep 600
6110 | sleep 600
6111 | sleep 600
6112 | sleep 600 &
[2]+ 6114 Running sleep 600
6115 | sleep 600
6116 | sleep 600
6117 | sleep 600
6118 | sleep 600 &
Parse the jobs list of the second job, %2. The parsing is probably error-prone, but in these cases it works. We aim to capture the first number followed by a space. It is stored into the variable pids as an array using the parentheses:
$ pids=($(jobs -l %2 | perl -pe '/(\d+) /; $_ = $1 . "\n"'))
$ echo $pids
6114
$ echo ${pids[*]}
6114 6115 6116 6117 6118
$ echo ${pids[2]}
6116
$ echo ${pids[4]}
6118
And for the first pipeline:
$ pids=($(jobs -l %1 | perl -pe '/(\d+) /; $_ = $1 . "\n"'))
$ echo ${pids[2]}
6110
$ echo ${pids[4]}
6112
We could wrap this into a little alias/function:
function pipeid() { jobs -l ${1:-%%} | perl -pe '/(\d+) /; $_ = $1 . "\n"'; }
$ pids=($(pipeid)) # PIDs of last job
$ pids=($(pipeid %1)) # PIDs of first job
I have tested this in bash and zsh. Unfortunately, in bash I could not pipe the output of pipeid into another command. Probably because that pipeline is run in a subshell that is not able to query the job list?
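Typical use, once the function is defined (PIDs illustrative):
$ sleep 600 | sleep 600 | sleep 600 &
$ pids=($(pipeid))
$ kill ${pids[0]}   # signal the first command of the most recent job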
Answered by Nils Goroll
I was desperately looking for a good solution to get all the PIDs from a pipe job, and one promising approach failed miserably (see previous revisions of this answer).
So, unfortunately, the best I could come up with is parsing the jobs -l output using GNU awk:
function last_job_pids {
    if [[ -z "$(jobs)" ]] ; then   # nothing to do if there are no background jobs
        return
    fi

    jobs -l | awk '
        /^\[/ { delete pids; pids[$2] = $2; seen = 1; next; }
        //    { if (seen) { pids[$1] = $1; } }
        END   { for (p in pids) print p; }'
}
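Example use (PIDs illustrative; awk's for..in loop prints them in no particular order):
$ sleep 111 | sleep 222 | sleep 333 &
$ last_job_pids
4501
4503
4502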

