Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow.
Original URL: http://stackoverflow.com/questions/13762625/
Bash while read loop extremely slow compared to cat, why?
Asked by David Parks
A simple test script here:
while read LINE; do
    LINECOUNT=$(($LINECOUNT+1))
    if [[ $(($LINECOUNT % 1000)) -eq 0 ]]; then echo $LINECOUNT; fi
done
When I do cat my450klinefile.txt | myscript the CPU locks up at 100% and it can process about 1000 lines a second. About 5 minutes to process what cat my450klinefile.txt >/dev/null does in half a second.
Is there a more efficient way to do essentially this? I just need to read a line from stdin, count the bytes, and write it out to a named pipe. But the speed of even this example is impossibly slow.
Every 1Gb of input lines I need to do a few more complex scripting actions (close and open some pipes that the data is being fed to).
Answered by William Pursell
The reason while read is so slow is that the shell is required to make a system call for every byte. It cannot read a large buffer from the pipe, because the shell must not read more than one line from the input stream and therefore must compare each character against a newline. If you run strace on a while read loop, you can see this behavior. This behavior is desirable, because it makes it possible to reliably do things like:
while read size; do dd bs=$size count=1 of=file$(( i++ )); done
in which the commands inside the loop are reading from the same stream that the shell reads from. If the shell consumed a big chunk of data by reading large buffers, the inner commands would not have access to that data. An unfortunate side-effect is that read is absurdly slow.
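For example, one rough way to observe this behavior (a sketch; some_file.txt is a placeholder name) is to trace the read() calls the loop makes when stdin is a pipe:

# Expect a long run of read(0, "...", 1) = 1 calls, one per byte of input.
cat some_file.txt | strace -e trace=read bash -c 'while read line; do :; done' 2>&1 | head -n 20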
Answered by paxdiablo
It's because the bash script is interpreted and not really optimised for speed in this case. You're usually better off using one of the external tools such as:
awk 'NR%1000==0{print}' inputFile
which matches your "print every 1000 lines" sample.
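As a rough check (a sketch; my450klinefile.txt and myscript are the names from the question, and myscript is assumed to be executable in the current directory), you could time both approaches:

# Hypothetical timing comparison of the awk one-liner against the original loop.
time awk 'NR%1000==0{print}' my450klinefile.txt > /dev/null
time cat my450klinefile.txt | ./myscript > /dev/null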
If you wanted to (for each line) output the line count in characters followed by the line itself, and pipe it through another process, you could also do that:
awk '{print length($0)" "$0}' inputFile | someOtherProcess
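One caveat, since the question asks for byte counts: with GNU awk in a multi-byte locale, length() counts characters rather than bytes. A sketch of forcing byte counts (assuming GNU awk):

# Force the C locale so length($0) counts bytes instead of characters.
LC_ALL=C awk '{print length($0)" "$0}' inputFile | someOtherProcess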
Tools like awk, sed, grep, cut and the more powerful perl are far better suited to these tasks than an interpreted shell script.
Answered by zb'
The perl solution for counting the bytes of each string:

perl -p -e '
use Encode;
print length(Encode::encode_utf8($_))."\n";$_=""'

for example:

dd if=/dev/urandom bs=1M count=100 |
perl -p -e 'use Encode;print length(Encode::encode_utf8($_))."\n";$_=""' |
tail

works for me at 7.7Mb/s.

For comparison, the same dd with no script at all:

dd if=/dev/urandom bs=1M count=100 >/dev/null

runs at 9.1Mb/s.

So it seems the script is not so slow after all :)
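To run the same one-liner on the file from the question instead of random data (a sketch; my450klinefile.txt is the name from the question):

# Prints the byte count of each input line (the count includes the trailing newline).
perl -p -e 'use Encode;print length(Encode::encode_utf8($_))."\n";$_=""' < my450klinefile.txt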
Answered by Arnestig
Not really sure what your script is supposed to do. So this might not be an answer to your question but more of a generic tip.
Don't cat your file and pipe it to your script; instead, when reading from a file with a bash script, do it like this:
while read line
do
    echo $line
done < file.txt
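If the loop only needs to emit each line's byte count as described in the question, a minimal sketch with no external commands per line (assuming bash, and assuming the whole loop may run in the C locale so that ${#line} counts bytes) would be:

# Print the byte length of each line followed by the line itself.
LC_ALL=C            # in the C locale, ${#line} counts bytes, not characters
while IFS= read -r line
do
    printf '%d %s\n' "${#line}" "$line"
done < file.txt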