How to run a given function in Bash in parallel?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/17307800/

Date: 2020-09-09 23:53:01  Source: igfitidea

How to run given function in Bash in parallel?

Tags: bash, parallel-processing

提问by Ole Tange

There have been some similar questions, but my problem is not "run several programs in parallel" - which can be trivially done with parallel or xargs.

I need to parallelize Bash functions.


Let's imagine code like this:


for i in "${list[@]}"
do
    for j in "${other[@]}"
    do
    # some processing in here - 20-30 lines of almost pure bash
    done
done

Some of the processing requires calls to external programs.


I'd like to run some (4-10) tasks, each running for a different $i. The total number of elements in $list is > 500.

I know I can put the whole for j ... done loop in an external script and just call this program in parallel, but is it possible to do it without splitting the functionality between two separate programs?

Accepted answer by that other guy

Edit: Please consider Ole's answer instead.

Instead of a separate script, you can put your code in a separate bash function. You can then export it, and run it via xargs:


#!/bin/bash
dowork() {
    sleep $((RANDOM % 10 + 1))
    echo "Processing i=$1, j=$2"
}
export -f dowork

for i in "${list[@]}"
do
    for j in "${other[@]}"
    do
        printf "%s\0%s\0" "$i" "$j"
    done
done | xargs -0 -n 2 -P 4 bash -c 'dowork "$@"' --
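To make the pattern concrete, here is a self-contained sketch with made-up sample arrays (the contents of list and other are placeholders standing in for the asker's real data); the NUL-separated printf output lets xargs -0 pass each i/j pair to one of up to 4 parallel workers:

```shell
#!/bin/bash
# Hypothetical sample data so the example runs end to end.
list=(a b c)
other=(1 2)

dowork() {
    echo "Processing i=$1, j=$2"
}
export -f dowork   # make dowork visible to the bash processes xargs spawns

# Emit NUL-separated pairs; xargs takes them two at a time (-n 2)
# and keeps at most 4 bash workers running (-P 4).
for i in "${list[@]}"; do
    for j in "${other[@]}"; do
        printf '%s\0%s\0' "$i" "$j"
    done
done | xargs -0 -n 2 -P 4 bash -c 'dowork "$@"' --
```

With -P 4 the six output lines can appear in any order; pipe through sort if a deterministic order matters.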

Answer by Ole Tange

sem is part of GNU Parallel and is made for this kind of situation:

for i in "${list[@]}"
do
    for j in "${other[@]}"
    do
        # some processing in here - 20-30 lines of almost pure bash
        sem -j 4 dolong task
    done
done


If you like the function better, GNU Parallel can do the dual for loop in one go:

dowork() {
  echo "Starting i=$1, j=$2"
  sleep 5
  echo "Done i=$1, j=$2"
}
export -f dowork

parallel dowork ::: "${list[@]}" ::: "${other[@]}"


Answer by VasiliNovikov

Solution to run multi-line commands in parallel:


for ...your_loop...; do
  test "$(jobs | wc -l)" -ge 8 && wait -n || true  # wait if needed

  {
    any bash commands here
  } &
done
wait

In your case:


for i in "${list[@]}"
do
    for j in "${other[@]}"
    do
        test "$(jobs | wc -l)" -ge 8 && wait -n || true
        {
            your
            multi-line
            commands
            here
        } &
    done
done
wait

If there are 8 bash jobs already running, wait will wait for at least one job to complete. If/when there are fewer jobs, it starts new ones asynchronously.
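A runnable miniature of the idiom (the 8-job cap shrunk to 2, with sleep standing in for real work) shows that the loop never holds more than two background jobs at once; note that wait -n requires bash 4.3 or newer:

```shell
#!/bin/bash
# Cap background jobs at 2: once the cap is reached, wait -n blocks
# until any single job exits, then the loop starts the next one.
for i in 1 2 3 4; do
    test "$(jobs | wc -l)" -ge 2 && wait -n
    {
        sleep 0.2
        echo "job $i done"
    } &
done
wait   # let the last jobs finish
```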

Benefits of this approach:


  1. It's very easy for multi-line commands. All your variables are automatically "captured" in scope; no need to pass them around as arguments.
  2. It's relatively fast. Compare this, for example, to parallel (I'm quoting the official man page):

    parallel is slow at starting up - around 250 ms the first time and 150 ms after that.

  3. Only needs bash to work.
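Benefit 1 can be seen directly: a brace group forked with & runs as a subshell of the same script, so it sees surrounding variables without any export (a minimal sketch):

```shell
#!/bin/bash
prefix="result"        # no export needed: the forked brace group inherits it
{
    echo "$prefix: 42"
} &
wait
```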

Downsides:


  1. There is a possibility that there were 8 jobs when we counted them, but fewer when we started waiting. (It happens if a job finishes in those milliseconds between the two commands.) This can make us wait with fewer jobs than required. However, it will resume when at least one job completes, or immediately if there are 0 jobs running (wait -n exits immediately in this case).
  2. If you already have some commands running asynchronously (&) within the same bash script, you'll have fewer worker processes in the loop.
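The last point about wait -n not blocking can be checked directly: when the shell has no background children, bash's wait -n returns at once with a non-zero (error) status instead of hanging:

```shell
#!/bin/bash
# No jobs have been started, so wait -n returns immediately
# with a non-zero status rather than blocking.
wait -n 2>/dev/null
echo "wait -n status: $?"
```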