bash: Running programs in parallel using xargs

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me), citing the original address and author information. StackOverflow original: http://stackoverflow.com/questions/28357997/

Date: 2020-09-08 21:52:07  Source: igfitidea

Running programs in parallel using xargs

bash, parallel-processing, xargs

Asked by Olivier

I currently have the following script.

#!/bin/bash
# script.sh

for i in {0..99}; do
   script-to-run.sh input/ output/ $i
done

I wish to run it in parallel using xargs. I have tried:

script.sh | xargs -P8

But the above still executes only one job at a time. No luck with -n8 either. Adding & at the end of the line executed in the script's for loop would try to run all 100 instances at once. How do I run the loop only 8 at a time, up to 100 total?

Answered by Etan Reisner

From the xargs man page:

This manual page documents the GNU version of xargs. xargs reads items from the standard input, delimited by blanks (which can be protected with double or single quotes or a backslash) or newlines, and executes the command (default is /bin/echo) one or more times with any initial-arguments followed by items read from standard input. Blank lines on the standard input are ignored.

This means that in your example xargs waits for and collects all of the output from your script and then runs echo <that output>. That is neither very useful nor what you wanted.

The -n argument sets how many items from the input are used with each command that gets run (by itself, it says nothing about parallelism).
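As a small illustration of the two points above, here is a sketch that uses echo as the command, so nothing beyond xargs itself is assumed. Without -n, all items land on one command line; with -n 2, each invocation receives two items.

```shell
# Default: every item is passed to a single echo invocation.
all=$(printf '%s\n' 1 2 3 4 5 6 | xargs echo)
echo "$all"

# With -n 2, xargs starts a new echo for every two items.
pairs=$(printf '%s\n' 1 2 3 4 5 6 | xargs -n 2 echo)
echo "$pairs"
```

The first echo prints all six numbers on one line; the second prints three lines of two numbers each. Neither form runs anything concurrently.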

To do what you want with xargs you would need something more like this (untested):

printf %s\\n {0..99} | xargs -n 1 -P 8 script-to-run.sh input/ output/

This breaks down as follows.

  • printf %s\\n {0..99} - Print one number per line, from 0 to 99.
  • Run xargs
    • taking at most one argument per command line it runs
    • and running up to eight processes at a time
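The breakdown above can be tried without the real script. The sketch below is only a stand-in: an inline sh -c command plays the role of script-to-run.sh, the range is shortened to 0..19, and seq replaces the brace expansion so it also runs under a plain sh. It relies on -P, which GNU and BSD xargs provide but POSIX does not require.

```shell
# Hypothetical stand-in for script-to-run.sh: it only reports its arguments.
# xargs appends the item last, so inside sh -c it arrives as $3
# ($0 is the placeholder "_", $1 is input/, $2 is output/).
results=$(seq 0 19 |
  xargs -n 1 -P 8 sh -c 'echo "job $3 in=$1 out=$2"' _ input/ output/)

# Jobs finish in no particular order, so sort by job number before reading.
sorted=$(printf '%s\n' "$results" | sort -t' ' -k2,2n)
count=$(printf '%s\n' "$results" | wc -l | tr -d ' ')

echo "$sorted"
echo "jobs run: $count"
```

Each item produces exactly one invocation, and at most eight of them are alive at any moment. The order of the raw output can differ between runs, which is why the sketch sorts it.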

Answered by Ole Tange

With GNU Parallel you would do:

parallel script-to-run.sh input/ output/ {} ::: {0..99}

Add -P8 if you do not want to run one job per CPU core.

Unlike xargs, it will do The Right Thing even if the input contains a space, ', or " (not the case here, though). It also makes sure the output from different jobs is not mixed together, so if you use the output you are guaranteed not to get half a line from two different jobs.
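To see what the xargs side of that comparison looks like, here is a minimal sketch with a single input item containing a space. It assumes an xargs that supports -0 (GNU and BSD versions do); the item 'a b' is made up for the demonstration.

```shell
# One input line containing a space: by default xargs splits it into two arguments.
split=$(printf '%s\n' 'a b' | xargs -n 1 echo | wc -l | tr -d ' ')

# With NUL-delimited input and -0, the same item stays a single argument.
whole=$(printf '%s\0' 'a b' | xargs -0 -n 1 echo | wc -l | tr -d ' ')

echo "default: $split invocations, with -0: $whole invocation"
```

With the default delimiters echo is invoked twice, once for a and once for b. With -0 it is invoked once with the whole item.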

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[Figure: Simple scheduling]

GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:

[Figure: GNU Parallel scheduling]
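The difference between the two schedules can be put into numbers with a toy simulation. This is only a sketch under made-up assumptions: job i is given a hypothetical duration of i+1 time units, and awk does the arithmetic. It does not run GNU Parallel; it only models "8 fixed jobs per CPU" against "start the next job on whichever CPU becomes free first".

```shell
sim=$(awk 'BEGIN {
  jobs = 32; cpus = 4; per = jobs / cpus
  for (i = 0; i < jobs; i++) dur[i] = i + 1        # hypothetical durations
  for (c = 0; c < cpus; c++) { busy[c] = 0; avail[c] = 0 }

  # Simple scheduling: CPU c runs jobs 8c .. 8c+7 back to back.
  for (i = 0; i < jobs; i++) busy[int(i / per)] += dur[i]
  fixed_end = 0
  for (c = 0; c < cpus; c++) if (busy[c] > fixed_end) fixed_end = busy[c]

  # Dynamic scheduling: each job starts on the CPU that becomes free first.
  for (i = 0; i < jobs; i++) {
    best = 0
    for (c = 1; c < cpus; c++) if (avail[c] < avail[best]) best = c
    avail[best] += dur[i]
  }
  dyn_end = 0
  for (c = 0; c < cpus; c++) if (avail[c] > dyn_end) dyn_end = avail[c]

  print fixed_end, dyn_end
}')
fixed=${sim% *}
dynamic=${sim#* }
echo "fixed batches finish at t=$fixed, dynamic scheduling at t=$dynamic"
```

With these invented durations the fixed split leaves the last CPU with the eight longest jobs, so it finishes well after the others have gone idle. The dynamic schedule spreads the same work more evenly and finishes earlier.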

Installation

If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds like this:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
   fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 3374ec53bacb199b245af2dda86df6c9
12345678 3374ec53 bacb199b 245af2dd a86df6c9
$ md5sum install.sh | grep 029a9ac06e8b5bc6052eac57b2c3c9ca
029a9ac0 6e8b5bc6 052eac57 b2c3c9ca
$ sha512sum install.sh | grep f517006d9897747bed8a4694b1acba1b
40f53af6 9e20dae5 713ba06c f517006d 9897747b ed8a4694 b1acba1b 1464beb4
60055629 3f2356f3 3e9c4e3c 76e3f3af a9db4b32 bd33322b 975696fc e6b23cfb
$ bash install.sh

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel
