Running shell script in parallel

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/5547787/

linux, bash, shell, unix, parallel-processing

Asked by Tony

I have a shell script which:

  1. shuffles a large text file (6 million rows and 6 columns)
  2. sorts the file based on the first column
  3. outputs 1000 files

So the pseudocode looks like this:

file1.sh 

#!/bin/bash
for i in $(seq 1 1000)
do

  : # generate random numbers here, sort, and output to file$i.txt

done

Is there a way to run this shell script in parallel to make full use of multi-core CPUs?

At the moment, ./file1.sh executes runs 1 to 1000 in sequence, and it is very slow.

Thanks for your help.

Accepted answer by Anders Lindahl

Check out bash subshells; these can be used to run parts of a script in parallel.

I haven't tested this, but this could be a start:

#!/bin/bash
for i in $(seq 1 1000)
do
   (
     : # generate random numbers here, sort, and output to file$i.txt
   ) &
   if (( $i % 10 == 0 )); then wait; fi # Limit to 10 concurrent subshells.
done
wait

Answer by Tony Delroy

To make things run in parallel you use '&' at the end of a shell command to run it in the background, then wait will by default (i.e. without arguments) wait until all background processes are finished. So, maybe kick off 10 in parallel, then wait, then do another ten. You can do this easily with two nested loops.

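For example, a minimal sketch of that pattern, with a hypothetical do_work function standing in for the question's shuffle/sort/output step:

#!/bin/bash

do_work() {   # hypothetical stand-in for the real per-iteration work
    sleep 1   # simulate work; the real code would write file$1.txt
}

# outer loop: 100 batches; inner loop: kick off 10 background jobs per batch
for batch in $(seq 0 99); do
    for j in $(seq 1 10); do
        do_work $((batch * 10 + j)) &   # iteration number i = batch*10 + j
    done
    wait   # block until all 10 jobs in this batch have finished
done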

Answer by Jonathan Dursi

Another very handy way to do this is with GNU parallel, which is well worth installing if you don't already have it; it is invaluable if the tasks don't necessarily take the same amount of time.

seq 1000 | parallel -j 8 --workdir $PWD ./myrun {}

will launch ./myrun 1, ./myrun 2, etc., making sure 8 jobs at a time are running. It can also take lists of nodes if you want to run on several nodes at once, e.g. in a PBS job; our instructions to our users for how to do that on our system are here.

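Applied to the question's task, it might look something like this (a hedged sketch: input.txt stands in for the big file, and shuf plus a stable sort on column 1 stand in for the shuffle-and-sort step):

# {} is replaced by each number from seq; -s keeps the shuffled order within equal keys
seq 1000 | parallel -j "$(nproc)" 'shuf input.txt | sort -s -k1,1 > file{}.txt'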

Updated to add: you want to make sure you're using GNU parallel, not the more limited utility of the same name that comes in the moreutils package (the divergent history of the two is described here).

Answer by Eric O Lebigot

There is a simple, portable program that does just this for you: PPSS. PPSS automatically schedules jobs for you by checking how many cores are available and launching another job whenever one finishes.

Answer by Eric O Lebigot

There is a whole list of programs that can run jobs in parallel from a shell, which even includes comparisons between them, in the documentation for GNU parallel. There are many, many solutions out there. More good news: they are probably quite efficient at scheduling jobs, so that all the cores/processors are kept busy at all times.

Answer by Bash Coder

Generating random numbers is easy. Suppose you have a huge file, like a shop database, and you want to rewrite it on some specific basis. My idea was to count the number of cores, split the file into that many parts, and make a script.cfg file plus split.sh and recombine.sh. split.sh splits the file into one piece per core and makes one executable clone of script.cfg (the script that changes stuff in that huge file) per core; a search-and-replace in each clone sets the variables that tell it which part of the file to process. The clones then run in the background, and each one generates a clone$core.ok file when it is done, so a loop knows to recombine the partial results into a single file only once all the .ok files have been generated. It can be done with wait, but I fancy my way.

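A rough sketch of that split/process/recombine pattern (hedged: process_part.sh is a hypothetical stand-in for the script that rewrites each chunk):

#!/bin/bash
NCORES=$(nproc)
split -n l/"$NCORES" -d huge.txt part.   # one line-aligned chunk per core: part.00, part.01, ...

for f in part.??; do
    # each clone processes its own chunk and drops a .ok flag file when done
    ( ./process_part.sh "$f" > "$f.out" && touch "$f.ok" ) &
done

# recombine the partial results only once every chunk has its .ok file
until [ "$(ls part.??.ok 2>/dev/null | wc -l)" -eq "$NCORES" ]; do
    sleep 1
done
cat part.??.out > result.txt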

http://www.linux-romania.com/product.php?id_product=76 (look at the bottom; it is partially translated into English). This way I can process 20000 articles with 16 columns in 2 minutes (quad core) instead of 8 minutes (single core). You have to watch the CPU temperature, because all cores are running at 100%.

Answer by Zakaria

#!/bin/bash

IDLE_CPU=1        # how many cores to leave free
NCPU=$(nproc)     # total number of cores

# On Ctrl-C, forward SIGINT to each background job's process group,
# then to our own process group.
int_childs() {
    trap - INT
    while IFS=$'\n' read -r pid; do
        kill -s SIGINT -"$pid"
    done < <(jobs -p -r)
    kill -s SIGINT -$$
}

# cmds is an array that holds the commands to run.
# The complex part is `display`, which has to handle the output of all
# the commands and serialize it correctly.

trap int_childs INT
{
    exec 2>&1
    set -m    # job control: give each background job its own process group

    if [ "$NCPU" -gt "$IDLE_CPU" ]; then
        for cmd in "${cmds[@]}"; do
            $cmd &
            # Throttle: while the running-job count is at the limit,
            # wait for any one job to finish (wait -n needs bash 4.3+).
            while [ "$(jobs -pr | wc -l)" -ge $((NCPU - IDLE_CPU)) ]; do
                wait -n
            done
        done
        wait    # wait for the remaining jobs to finish

    else
        # not enough spare cores: just run the commands sequentially
        for cmd in "${cmds[@]}"; do
            $cmd
        done
    fi
} | display
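
For instance, cmds and display might be defined like this before the block above runs (a hedged sketch; generate_and_sort.sh is hypothetical, and display here is just a trivial pass-through):

# build the command array, e.g. for the question's 1000 iterations
cmds=()
for i in $(seq 1 1000); do
    cmds+=("./generate_and_sort.sh $i")
done

display() { cat; }   # trivial stand-in: forward the serialized output unchanged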

Answer by jreisinger

You might want to take a look at runp. runp is a simple command-line tool that runs (shell) commands in parallel. It's useful when you want to run multiple commands at once to save time. It's easy to install since it's a single binary. It's been tested on Linux (amd64 and arm) and macOS/darwin (amd64).

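For the question's use case, runp could be fed one command per line on stdin, along these lines (hedged: generate_and_sort.sh is a hypothetical script that writes file$i.txt itself, and runp's exact invocation should be checked against its README):

# emit one command per line and let runp run them in parallel
for i in $(seq 1 1000); do
    echo "./generate_and_sort.sh $i"
done | runp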