Using named pipes with bash on Linux - problem with data loss

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/4290684/


Using named pipes with bash - Problem with data loss

Tags: linux, bash, named-pipes, data-loss

Asked by asoundmove

Did some searching online and found simple 'tutorials' on using named pipes. However, when I do anything with background jobs I seem to lose a lot of data.


[[Edit: found a much simpler solution, see reply to post. So the question I put forward is now academic - in case one might want a job server]]


Using Ubuntu 10.04 with Linux 2.6.32-25-generic #45-Ubuntu SMP Sat Oct 16 19:52:42 UTC 2010 x86_64 GNU/Linux


GNU bash, version 4.1.5(1)-release (x86_64-pc-linux-gnu).


My bash function is:


function jqs
{
  pipe=/tmp/__job_control_manager__
  trap "rm -f $pipe; exit"  EXIT SIGKILL

  if [[ ! -p "$pipe" ]]; then
      mkfifo "$pipe"
  fi

  while true
  do
    if read txt <"$pipe"
    then
      echo "$(date +'%Y'): new text is [[$txt]]"

      if [[ "$txt" == 'quit' ]]
      then
        break
      fi
    fi
  done
}

I run this in the background:


> jqs&
[1] 5336

And now I feed it:


for i in 1 2 3 4 5 6 7 8
do
  (echo aaa$i > /tmp/__job_control_manager__ && echo success$i &)
done

The output is inconsistent. I frequently don't get all the success echoes. I get at most as many "new text" echoes as success echoes, sometimes fewer.


If I remove the '&' from the 'feed', it seems to work, but then I am blocked until the output is read. Hence my wanting to let sub-processes be blocked, but not the main process.


The aim is to write a simple job control script so I can run, say, at most 10 jobs in parallel and queue the rest for later processing, while reliably knowing that they do run.


Full job manager below:


function jq_manage
{
  export __gn__="$1"

  pipe=/tmp/__job_control_manager_"$__gn__"__
  trap "rm -f $pipe"    EXIT
  trap "break"      SIGKILL

  if [[ ! -p "$pipe" ]]; then
      mkfifo "$pipe"
  fi

  while true
  do
    date
    jobs
    if (($(jobs | egrep "Running.*echo '%#_Group_#%_$__gn__'" | wc -l) < $__jN__))
    then
      echo "Waiting for new job"
      if read new_job <"$pipe"
      then
    echo "new job is [[$new_job]]"

    if [[ "$new_job" == 'quit' ]]
    then
      break
    fi

    echo "In group $__gn__, starting job $new_job"
    eval "(echo '%#_Group_#%_$__gn__' > /dev/null; $new_job) &"
      fi
    else
      sleep 3
    fi
  done
}

function jq
{
  # __gn__ = first parameter to this function, the job group name (the pool within which to allocate __jN__ jobs)
  # __jN__ = second parameter to this function, the maximum number of jobs to run concurrently

  export __gn__="$1"
  shift
  export __jN__="$1"
  shift

  export __jq__=$(jobs | egrep "Running.*echo '%#_GroupQueue_#%_$__gn__'" | wc -l)
  if (( $__jq__ < 1 ))
  then
    eval "(echo '%#_GroupQueue_#%_$__gn__' > /dev/null; jq_manage $__gn__) &"
  fi

  pipe=/tmp/__job_control_manager_"$__gn__"__

  echo $@ >$pipe
}

Calling


jq <name> <max processes> <command>
jq abc 2 sleep 20

will start one process. That part works fine. Start a second one, fine. Starting them one by one by hand seems to work fine. But starting 10 in a loop seems to lose jobs, as in the simpler example above.


Any hints as to what I can do to solve this apparent loss of IPC data would be greatly appreciated.


Regards, Alain.


Accepted answer by camh

Your problem is the if statement below:


while true
do
    if read txt <"$pipe"
    ....
done

What is happening is that your job queue server is opening and closing the pipe each time around the loop. This means that some of the clients are getting a "broken pipe" error when they try to write to the pipe - that is, the reader of the pipe goes away after the writer opens it.


To fix this, change the loop in your server to open the pipe once for the entire loop:


while true
do
    if read txt
    ....
done < "$pipe"

Done this way, the pipe is opened once and kept open.


You will need to be careful of what you run inside the loop, as all processing inside the loop will have stdin attached to the named pipe. You will want to make sure you redirect stdin of all your processes inside the loop from somewhere else, otherwise they may consume the data from the pipe.

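For example, if each line read from the pipe were a command to launch in the background, a minimal sketch (assuming the command arrives in $txt) would redirect the child's stdin away from the fifo:

while true
do
    if read txt
    then
        # run the job with stdin pointed at /dev/null so the child
        # cannot consume requests still queued in the fifo
        ( eval "$txt" ) < /dev/null &
    fi
done < "$pipe"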

Edit: Now that the problem is that you get EOF on your reads when the last client closes the pipe, you can use jilles' method of duplicating the file descriptors, or you can just make sure you are a client too and keep the write side of the pipe open:


while true
do
    if read txt
    ....
done < "$pipe" 3> "$pipe"

This will hold the write side of the pipe open on fd 3. The same caveat applies to this file descriptor as to stdin: you will need to close it so that child processes don't inherit it. It probably matters less than with stdin, but it would be cleaner.

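A sketch of that, extending the loop above (again assuming the command to run arrives in $txt):

while true
do
    if read txt
    then
        # close fd 3 in the child so it does not hold the write side of the pipe open
        ( exec 3>&-; eval "$txt" ) < /dev/null &
    fi
done < "$pipe" 3> "$pipe"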

Answered by asoundmove

On the one hand the problem is worse than I thought: Now there seems to be a case in my more complex example (jq_manage) where the same data is being read over and over again from the pipe (even though no new data is being written to it).


On the other hand, I found a simple solution (edited following Dennis' comment):


function jqn    # compute the number of jobs running in that group
{
  __jqty__=$(jobs | egrep "Running.*echo '%#_Group_#%_$__groupn__'" | wc -l)
}

function jq
{
  __groupn__="$1";  shift   # job group name (the pool within which to allocate $__jmax__ jobs)
  __jmax__="$1";    shift   # maximum number of jobs to run concurrently

  jqn
  while (( $__jqty__ >= $__jmax__ ))
  do
    sleep 1
    jqn
  done

  eval "(echo '%#_Group_#%_$__groupn__' > /dev/null; $@) &"
}

Works like a charm. No socket or pipe involved. Simple.

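For example, a hypothetical invocation (assuming the jqn and jq functions above have been sourced) that queues ten jobs in group abc with at most two running at any one time:

for i in 1 2 3 4 5 6 7 8 9 10
do
    # jq blocks until a slot in group abc is free, then launches the job
    jq abc 2 "sleep 5; echo job-$i-done"
done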

Answered by asoundmove

As camh and Dennis Williamson say, don't break the pipe.


Now I have smaller examples, directly on the command line:


Server:


(
  for i in {0,1,2,3,4}{0,1,2,3,4,5,6,7,8,9};
  do
    if read s;
      then echo ">>$i--$s//";
    else
      echo "<<$i";
    fi;
  done < tst-fifo
)&

Client:


(
  for i in {%a,#b}{1,2}{0,1};
  do
    echo "Test-$i" > tst-fifo;
  done
)&

The key line can be replaced with:


    (echo "Test-$i" > tst-fifo&);

All client data sent to the pipe gets read, though with option two of the client one may need to start the server a couple of times before all data is read.


But although the read waits for data in the pipe to start with, once data has been pushed, it reads the empty string forever.


Any way to stop this?


Thanks for any insights again.


Answered by jilles

As said in other answers, you need to keep the fifo open at all times to avoid losing data.


However, once all writers have gone away after the fifo has been opened (so there was a writer), reads return immediately (and poll() returns POLLHUP). The only way to clear this state is to reopen the fifo.


POSIX does not provide a solution to this but at least Linux and FreeBSD do: if reads start failing, open the fifo again while keeping the original descriptor open. This works because in Linux and FreeBSD the "hangup" state is local to a particular open file description, while in POSIX it is global to the fifo.


This can be done in a shell script like this:


while :; do
    exec 3<tmp/testfifo
    exec 4<&-
    while read x; do
        echo "input: $x"
    done <&3
    exec 4<&3
    exec 3<&-
done
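
To exercise this, the fifo can be fed from a few short-lived writers (a sketch; it assumes the fifo has already been created with mkfifo tmp/testfifo and the loop above is running in another shell):

# each writer opens, writes and closes the fifo; the reopen trick above
# lets the reader survive every close instead of spinning on EOF
for i in 1 2 3
do
    ( echo "message $i" > tmp/testfifo ) &
done
wait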

Answered by asoundmove

Just for those that might be interested, [[re-edited]] following comments by camh and jilles, here are two new versions of the test server script.


Both versions now work exactly as hoped.


camh's version for pipe management:


function jqs    # Job queue manager
{
  pipe=/tmp/__job_control_manager__
  trap "rm -f $pipe; exit"  EXIT TERM

  if [[ ! -p "$pipe" ]]; then
      mkfifo "$pipe"
  fi

  while true
  do
    if read -u 3 txt
    then
      echo "$(date +'%Y'): new text is [[$txt]]"

      if [[ "$txt" == 'quit' ]]
      then
        break
      else
        sleep 1
        # process $txt - remember that if this is to be a spawned job, we should close fd 3 and 4 beforehand
      fi
    fi
  done 3< "$pipe" 4> "$pipe"    # 4 is just to keep the pipe opened so any real client does not end up causing read to return EOF
}

jilles' version for pipe management:


function jqs    # Job queue manager
{
  pipe=/tmp/__job_control_manager__
  trap "rm -f $pipe; exit"  EXIT TERM

  if [[ ! -p "$pipe" ]]; then
      mkfifo "$pipe"
  fi

  exec 3< "$pipe"
  exec 4<&-

  while true
  do
    if read -u 3 txt
    then
      echo "$(date +'%Y'): new text is [[$txt]]"

      if [[ "$txt" == 'quit' ]]
      then
        break
      else
        sleep 1
        # process $txt - remember that if this is to be a spawned job, we should close fd 3 and 4 beforehand
      fi
    else
      # Close the pipe and reconnect it so that the next read does not end up returning EOF
      exec 4<&3
      exec 3<&-
      exec 3< "$pipe"
      exec 4<&-
    fi
  done
}
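
Either version can then be exercised much like the original test; a sketch:

jqs &      # start the queue manager in the background

for i in 1 2 3 4 5 6 7 8
do
    ( echo "aaa$i" > /tmp/__job_control_manager__ && echo "success$i" ) &
done

# when finished, shut the manager down
echo quit > /tmp/__job_control_manager__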

Thanks to all for your help.


Answered by jcalfee314

run say 10 jobs in parallel at most and queue the rest for later processing, but reliably know that they do run


You can do this with GNU Parallel. You will not need all this scripting.


http://www.gnu.org/software/parallel/man.html#options


You can set --max-procs ("Number of jobslots. Run up to N jobs in parallel."). There is also an option to set the number of CPU cores you want to use. You can save the list of executed jobs to a log file, but that is a beta feature.

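For example, assuming one command per line in a hypothetical jobs.txt (a sketch; exact option names depend on the installed parallel version):

# run the commands listed in jobs.txt, at most 10 at a time,
# recording what actually ran in a job log
parallel --jobs 10 --joblog /tmp/parallel.log < jobs.txt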