bash: When should xargs be preferred over while-read loops?
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/2574134/
Asked by Charles Stewart
xargs is widely used in shell scripting; it is usually easy to recast these uses in bash using while read -r; do ... done or while read -ar; do ... done loops.
When should xargs be preferred, and when should while-read loops be preferred?
Answered by paxdiablo
The thing with while loops is that they tend to process one item at a time, often when it's unnecessary. This is where xargs has an advantage: it can batch up the arguments so that one command processes lots of items.
For example, a while loop:
pax> echo '1
2
3
4
5' | while read -r; do echo "$REPLY"; done
1
2
3
4
5
and the corresponding xargs:
pax> echo '1
2
3
4
5' | xargs echo
1 2 3 4 5
Here you can see that the lines are processed one by one with the while loop and all together with xargs. In other words, the former is equivalent to echo 1 ; echo 2 ; echo 3 ; echo 4 ; echo 5 while the latter is equivalent to echo 1 2 3 4 5 (five invocations as opposed to one; for an external command, that means five processes instead of one). This really makes a difference when processing thousands or tens of thousands of lines, since process creation takes time.
It's mostly advantageous when using commands that can accept multiple arguments since it reduces the number of individual processes started, making things much faster.
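To make the difference in invocation counts concrete, here is a minimal sketch. It is not from the original answer; the inline sh -c 'echo "$#"' helper is an assumption chosen for illustration. It prints how many arguments a single invocation received, so the number of output lines equals the number of invocations.

```shell
# Count command invocations under each approach.
# Each invocation of the inline script prints one line, so counting
# output lines counts invocations.
loop_calls=$(printf '%s\n' 1 2 3 4 5 |
  while read -r line; do sh -c 'echo "$#"' _ "$line"; done | wc -l | tr -d ' ')
xargs_calls=$(printf '%s\n' 1 2 3 4 5 |
  xargs sh -c 'echo "$#"' _ | wc -l | tr -d ' ')
echo "while-read invocations: $loop_calls"
echo "xargs invocations: $xargs_calls"
```

With five input lines this reports five invocations for the loop and one for xargs.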
When I'm processing small files, or the commands to run on each item are complicated (and I'm too lazy to write a separate script to give to xargs), I will use the while variant.
Where I'm interested in performance (large files), I will use xargs, even if I have to write a separate script.
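As a middle ground between the two choices, a short multi-step body can sometimes be passed to xargs inline with sh -c rather than as a separate script. The sketch below is an illustration only; the input names and the message format are made up.

```shell
# Hypothetical input: three names, one per line.
result=$(printf '%s\n' alpha beta gamma |
  xargs sh -c '
    # "$@" is the whole batch handed over by xargs
    for item in "$@"; do
      printf "%s has %s characters\n" "$item" "${#item}"
    done
  ' _)
echo "$result"
```

The trailing _ fills $0 of the inline script so that the batched items start at $1.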
Answered by ony
"xargs" has the option "-n max-args", which limits how many arguments are passed to each invocation of the command, so the command is still called for several arguments at once (useful for "grep", "rm" and many more such programs). Try the example from the man page:
cut -d: -f1 < /etc/passwd | sort | xargs -n 5 echo
You'll see that it "echo"-ed 5 users per line.
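The same behaviour can be seen with a fixed input that does not depend on /etc/passwd. This is a small sketch with made-up input letters:

```shell
# Seven items, at most three per echo invocation.
batched=$(printf '%s\n' a b c d e f g | xargs -n 3 echo)
echo "$batched"
# a b c
# d e f
# g
```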
P.S. Don't forget that "xargs" is a separate program (like a subshell), so there is no easy way to get information back into your shell script: you'll need to read the output of "xargs" and interpret it somehow to fill in your shell/env variables.
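A minimal sketch of that limitation and of the usual workaround, command substitution. The variable name total is chosen only for illustration.

```shell
total=0
# The assignment happens in a child sh process and is lost when it exits.
printf '%s\n' 1 2 3 | xargs sh -c 'total=$#' _
echo "after xargs: total=$total"     # still 0

# Read the output of the xargs pipeline instead.
total=$(printf '%s\n' 1 2 3 | xargs sh -c 'echo "$#"' _)
echo "after capture: total=$total"   # 3
```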
Answered by ndim
Some implementations of xargs also understand a -P MAX-PROCS argument, which lets xargs run multiple jobs in parallel. This would be quite difficult to simulate with a while read loop.
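A short sketch of -P, assuming an xargs that supports it (the GNU and BSD versions do). The sleep stands in for real work, and the output is sorted because parallel jobs may finish in any order.

```shell
# Run up to four jobs at once, one argument per job.
finished=$(printf '%s\n' 1 2 3 4 |
  xargs -n 1 -P 4 sh -c 'sleep 1; echo "done $1"' _ | sort)
echo "$finished"
```

Because the four one-second jobs overlap, the pipeline should take roughly one second rather than four.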
Answered by Ole Tange
GNU Parallel (http://www.gnu.org/software/parallel/) has the advantages of xargs (using -m), the advantage of while-read with newline as separator, and some new features (e.g. grouping of output, parallel running of jobs on remote computers, and context replace).
If you have GNU Parallel installed, I cannot see a single situation in which you would use xargs. And the only situation in which I would use read-while would be if the block to execute is so big that it becomes unreadable on a single line (e.g. if it contains if-statements or similar) and you refuse to make a bash function.
For all the small scripts I actually find it more readable to use GNU Parallel. paxdiablo's example:
echo '1
2
3
4
5' | parallel -m echo
Converting WAV files to MP3 using GNU Parallel:
find sounddir -type f -name '*.wav' | parallel -j+0 lame {} -o {.}.mp3
Watch the intro video for GNU Parallel: http://www.youtube.com/watch?v=OpaiGYxkSuQ
Answered by Andrey Taranov
On the other hand, there are cases when you have a list of files, one per line, containing spaces, e.g. coming from find, pkgutil, or similar. To work with xargs you'll have to wrap the lines in quotes using sed first, but this looks unwieldy.
With a while loop the script might look easier to read/write. And quoting of space-contaminated args is trivial. The example below is artificial but imagine getting the list of files from something other than find...
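# Print only the entries that are directories; names may contain spaces.
process() {
    # IFS= and -r keep surrounding whitespace and backslashes intact.
    while IFS= read -r line; do
        test -d "$line" && echo "$line"
    done
}
find . -name "*foo*" | process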
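For comparison, here is a sketch of feeding such a newline-separated list to xargs without the sed quoting step. It converts newlines to NUL bytes and uses the -0 option, which is an assumption about the xargs in use (the GNU and BSD versions support it). The file names are made up.

```shell
# Hypothetical list of names, one per line, some containing spaces.
listed=$(printf '%s\n' 'my file.txt' 'other file.txt' 'plain.txt' |
  tr '\n' '\0' |
  xargs -0 sh -c 'for name in "$@"; do printf "[%s]\n" "$name"; done' _)
echo "$listed"
```

Each name arrives as a single argument, spaces included, so the brackets enclose whole file names.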
Answered by I don't know
I don't get it; people keep yammering on about how while MUST be executed in the loop instead of outside of the loop. I know very little on the Linux side, but I know it is fairly simple to use MS-DOS's variables to build up a parameter list, or to use > file, cmd < file to build up a parameter list if you exceed the line length limitation.
Or are people saying that linux isn't as good as ms-dos? (Hell, I KNOW you can build chains because many bash scripts obviously are doing it, just not in loops).
At this point, it becomes a matter of kernel limitations / preference. xargs isn't magical; piping does have advantages over string building (well, in MS-DOS you could build the string out of "pointers" and avoid any copying; it's virtual memory after all, so unless you are changing the data you can skip the expense of string concatenation... but piping is more natively supported). Actually, I don't think I can give it the advantage of parallel processing, because you can easily create several tasked loops to review sliced data (which again, if you avoid copying, is a very fast action).
In the end, xargs is more for inline commands; the speed advantage is negligible (the difference between compiled and interpreted string building) because everything it does, you can do via shell scripts.

