Disclaimer: this page mirrors a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse this content, you must follow the same license and attribute it to the original authors (not me). Original source: http://stackoverflow.com/questions/3345460/


How to get the PID of a process in a pipeline

Tags: bash, awk

Asked by User1

Consider the following simplified example:


my_prog|awk '...' > output.csv &
my_pid="$!" #Gives the PID for awk instead of for my_prog
sleep 10
kill $my_pid #my_prog still has data in its buffer that awk never saw. Data is lost!

In bash, $my_pid points to the PID of awk. However, I need the PID of my_prog. If I kill awk, my_prog does not know to flush its output buffer, and data is lost. So how would one obtain the PID of my_prog? Note that ps aux | grep my_prog will not work, since there may be several instances of my_prog running.
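
To see the pitfall concretely, a small sketch (sleep stands in for both programs):

```shell
# After backgrounding a pipeline, $! is the PID of the *last* command
# in it (the second sleep here), not of the first.
sleep 300 | sleep 301 &
args=$(ps -o args= -p "$!")
echo "PID $! is: $args"   # ...sleep 301, not sleep 300
kill %1
```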

NOTE: changed cat to awk '...' to help clarify what I need.

Accepted answer by User1

I was able to solve it by explicitly naming the pipe using mkfifo.

Step 1: mkfifo capture.

Step 2: Run this script


my_prog > capture &
my_pid="$!" #Now, I have the PID for my_prog!
awk '...' capture > out.csv & 
sleep 10
kill $my_pid #kill my_prog
wait #wait for awk to finish.

I don't like having to manage the mkfifo, though. Hopefully someone has an easier solution.
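
A sketch of the same approach with the fifo management automated: the fifo lives in a temporary directory that is removed on exit. Here yes hello stands in for my_prog, and the awk program is just a line counter; both are placeholders.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT       # the fifo is cleaned up automatically
mkfifo "$tmpdir/capture"

yes hello > "$tmpdir/capture" &    # stand-in for my_prog
my_pid=$!                          # PID of the producer itself
awk 'END { print NR }' "$tmpdir/capture" > out.csv &
sleep 1
kill "$my_pid"                     # producer dies; awk sees EOF...
wait                               # ...flushes its output and finishes
```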

Answered by Marvin

Just had the same issue. My solution:

process_1 | process_2 &
PID_OF_PROCESS_2=$!
PID_OF_PROCESS_1=`jobs -p`

Just make sure process_1 is the first background process. Otherwise, you need to parse the full output of jobs -l.
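
With more than one background job, the same idea can be sketched like this (jobs -p prints one PID per job, namely the PID of each job's first command):

```shell
# Two background pipelines; jobs -p yields the process-group leader
# (the first command) of each job, one PID per line.
sleep 300 | sleep 301 &
sleep 302 | sleep 303 &
leaders=($(jobs -p))
echo "job 1 first command: ${leaders[0]}"
echo "job 2 first command: ${leaders[1]}"
kill %1 %2
```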

Answered by Matei David

Here is a solution without wrappers or temporary files. This only works for a background pipeline whose output is captured away from stdout of the containing script, as in your case. Suppose you want to do:

cmd1 | cmd2 | cmd3 >pipe_out &
# do something with PID of cmd2

If only bash could provide ${PIPEPID[n]}!! The replacement "hack" that I found is the following:

PID=$( { cmd1 | { cmd2 0<&4 & echo $! >&3 ; } 4<&0 | cmd3 >pipe_out & } 3>&1 | head -1 )

If needed, you can also close fd 3 (for cmd*) and fd 4 (for cmd2) with 3>&- and 4<&-, respectively. If you do that, for cmd2, make sure you close fd 4 only after you redirect fd 0 from it.
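
To make the hack concrete, here is a hypothetical instance where seq, cat and sort stand in for cmd1, cmd2 and cmd3:

```shell
# The PID of cat (the middle command) pops out of the fd-3 side channel.
out=$(mktemp)
PID=$( { seq 5 | { cat 0<&4 & echo $! >&3 ; } 4<&0 | sort -rn > "$out" & } 3>&1 | head -1 )
echo "PID of cat: $PID"
```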

Answered by Demosthenex

Add a shell wrapper around your command and capture the pid. For my example I use iostat.

#!/bin/sh
echo $$ > /tmp/my.pid
exec iostat 1

Exec replaces the shell with the new process, preserving the pid.

test.sh | grep avg

While that runs:

$ cat /tmp/my.pid
22754
$ ps -ef | grep iostat
userid  22754  4058  0 12:33 pts/12   00:00:00 iostat 1

So you can:

sleep 10
kill `cat /tmp/my.pid`

Is that more elegant?

Answered by Jonas Berlin

Improving @Marvin's and @Nils Goroll's answers with a one-liner that extracts the pids for all commands in the pipe into a shell array variable:

# run some command
ls -l | rev | sort > /dev/null &

# collect pids
pids=(`jobs -l % | egrep -o '^(\[[0-9]+\]\+|    ) [ 0-9]{5} ' | sed -e 's/^[^ ]* \+//' -e 's! $!!'`)

# use them for something
echo pid of ls -l: ${pids[0]}
echo pid of rev: ${pids[1]}
echo pid of sort: ${pids[2]}
echo pid of first command e.g. ls -l: $pids
echo pid of last command e.g. sort: ${pids[-1]}

# wait for last command in pipe to finish
wait ${pids[-1]}

In my solution ${pids[-1]} contains the value normally available in $!. Please note the use of jobs -l %, which outputs just the "current" job; by default that is the last one started.

Sample output:

pid of ls -l: 2725
pid of rev: 2726
pid of sort: 2727
pid of first command e.g. ls -l: 2725
pid of last command e.g. sort: 2727

UPDATE 2017-11-13: Improved the pids=... command so that it works better with complex (multi-line) commands.

Answered by msw

Based on your comment, I still can't see why you'd prefer killing my_prog to having it complete in an orderly fashion. Ten seconds is a pretty arbitrary measurement on a multiprocessing system, whereby my_prog could generate 10k lines or 0 lines of output depending upon system load.

If you want to limit the output of my_prog to something more determinate, try

my_prog | head -1000 | awk '...'

without detaching from the shell. In the worst case, head will close its input and my_prog will get a SIGPIPE. In the best case, change my_prog so it gives you the amount of output you want.
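
The SIGPIPE case can be demonstrated with yes standing in for a my_prog that runs forever:

```shell
# head exits after 3 lines; the writer's next write fails with SIGPIPE,
# which appears as exit status 128+13=141 in PIPESTATUS.
yes | head -3 > /dev/null
status=${PIPESTATUS[0]}
echo "writer exit status: $status"
```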

added in response to comment:

Insofar as you have control over my_prog, give it an optional -s duration argument. Then somewhere in your main loop you can put the predicate:

if (duration_exceeded()) {
    exit(0);
}

where exit will in turn properly flush the output FILEs. If desperate and there is no place to put the predicate, this could be implemented using alarm(3), which I am intentionally not showing because it is bad.

The core of your trouble is that my_progruns forever. Everything else here is a hack to get around that limitation.

Answered by glenn jackman

With inspiration from @Demosthenex's answer: using subshells:

$ ( echo $BASHPID > pid1; exec vmstat 1 5 ) | tail -1 & 
[1] 17371
$ cat pid1
17370
$ pgrep -fl vmstat
17370 vmstat 1 5
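
The same trick generalizes to both ends of a pipeline; a sketch with seq and wc as stand-ins (file names via mktemp):

```shell
# Each subshell records its own PID ($BASHPID) before exec'ing the real
# command; exec preserves that PID for the actual process.
p1=$(mktemp); p2=$(mktemp); cnt=$(mktemp)
( echo $BASHPID > "$p1"; exec seq 1000 ) |
    ( echo $BASHPID > "$p2"; exec wc -l ) > "$cnt" &
wait
echo "producer PID: $(cat "$p1"), consumer PID: $(cat "$p2"), lines: $(cat "$cnt")"
```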

Answered by hzpc-joostk

My solution was to query jobs and parse its output using perl.
Start two pipelines in the background:


$ sleep 600 | sleep 600 |sleep 600 |sleep 600 |sleep 600 &
$ sleep 600 | sleep 600 |sleep 600 |sleep 600 |sleep 600 &

Query background jobs:

$ jobs
[1]-  Running                 sleep 600 | sleep 600 | sleep 600 | sleep 600 | sleep 600 &
[2]+  Running                 sleep 600 | sleep 600 | sleep 600 | sleep 600 | sleep 600 &

$ jobs -l
[1]-  6108 Running                 sleep 600
      6109                       | sleep 600
      6110                       | sleep 600
      6111                       | sleep 600
      6112                       | sleep 600 &
[2]+  6114 Running                 sleep 600
      6115                       | sleep 600
      6116                       | sleep 600
      6117                       | sleep 600
      6118                       | sleep 600 &

Parse the jobs list of the second job, %2. The parsing is probably error prone, but in these cases it works. We aim to capture the first number followed by a space. It is stored into the variable pids as an array using parentheses:

$ pids=($(jobs -l %2 | perl -pe '/(\d+) /; $_ = $1 . "\n"'))
$ echo $pids
6114
$ echo ${pids[*]}
6114 6115 6116 6117 6118
$ echo ${pids[2]}
6116
$ echo ${pids[4]}
6118

And for the first pipeline:

$ pids=($(jobs -l %1 | perl -pe '/(\d+) /; $_ = $1 . "\n"'))
$ echo ${pids[2]}
6110
$ echo ${pids[4]}
6112

We could wrap this into a little alias/function:

function pipeid() { jobs -l ${1:-%%} | perl -pe '/(\d+) /; $_ = $1 . "\n"'; }
$ pids=($(pipeid))     # PIDs of last job
$ pids=($(pipeid %1))  # PIDs of first job

I have tested this in bash and zsh. Unfortunately, in bash I could not pipe the output of pipeid into another command, probably because that pipeline is run in a subshell that cannot query the job list.

Answered by Nils Goroll

I was desperately looking for good solution to get all the PIDs from a pipe job, and one promising approach failed miserably (see previous revisions of this answer).

So, unfortunately, the best I could come up with is parsing the jobs -l output using GNU awk:

function last_job_pids {
    if [[ -z "$(jobs)" ]] ; then
        return
    fi

    jobs -l | awk '
        /^\[/ { delete pids; pids[$2]=$2; seen=1; next; }
        // { if (seen) { pids[$1]=$1; } }
        END { for (p in pids) print p; }'
}