How to write a process pool in bash shell
Disclaimer: this page is a translation of a popular Stack Overflow question and answer thread, provided under the CC BY-SA 4.0 license. If you reuse it, you must likewise follow CC BY-SA, link the original, and attribute it to the original authors (not me): Stack Overflow.
Original question: http://stackoverflow.com/questions/6441509/
Asked by Sili
I have more than 10 tasks to execute, and the system restricts that at most 4 tasks can run at the same time.
My task can be started like: myprog taskname
How can I write a bash shell script to run these tasks? The most important thing is that when one task finishes, the script can start another immediately, keeping the running task count at 4 at all times.
Accepted answer by thelazyenginerd
I chanced upon this thread while looking into writing my own process pool and particularly liked Brandon Horsley's solution, though I couldn't get the signals working right, so I took inspiration from Apache and decided to try a pre-fork model with a fifo as my job queue.
The following is the function that the worker processes run when forked.
# \brief the worker function that is called when we fork off worker processes
# \param[in] id the worker ID
# \param[in] job_queue the fifo to read jobs from
# \param[in] result_log the temporary log file to write exit codes to
function _job_pool_worker()
{
    local id=$1
    local job_queue=$2
    local result_log=$3
    local line=
    exec 7<> ${job_queue}
    while [[ "${line}" != "${job_pool_end_of_jobs}" && -e "${job_queue}" ]]; do
        # workers block on the exclusive lock to read the job queue
        flock --exclusive 7
        read line <${job_queue}
        flock --unlock 7
        # the worker should exit if it sees the end-of-job marker or run the
        # job otherwise and save its exit code to the result log.
        if [[ "${line}" == "${job_pool_end_of_jobs}" ]]; then
            # write it one more time for the next sibling so that everyone
            # will know we are exiting.
            echo "${line}" >&7
        else
            _job_pool_echo "### _job_pool_worker-${id}: ${line}"
            # run the job
            { ${line} ; }
            # now check the exit code and prepend "ERROR" to the result log entry
            # which we will use to count errors and then strip out later.
            local result=$?
            local status=
            if [[ "${result}" != "0" ]]; then
                status=ERROR
            fi
            # now write the error to the log, making sure multiple processes
            # don't trample over each other.
            exec 8<> ${result_log}
            flock --exclusive 8
            echo "${status}job_pool: exited ${result}: ${line}" >> ${result_log}
            flock --unlock 8
            exec 8>&-
            _job_pool_echo "### _job_pool_worker-${id}: exited ${result}: ${line}"
        fi
    done
    exec 7>&-
}
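The producing side of this model is symmetric: the parent writes one command line per job into the same fifo, and an idle worker holding the lock reads and runs it. A rough illustration of that idea only (this is not the actual job_pool_run from job_pool.sh; the function name is made up here):
# Illustrative sketch only -- not the real job_pool.sh code.
# The parent appends one command line per job to the fifo; an idle
# worker will read it under the flock and execute it.
function _example_enqueue_job()
{
    local job_queue=$1
    shift
    echo "$@" >> ${job_queue}
}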
You can get a copy of my solution at Github. Here's a sample program using my implementation.
#!/bin/bash
. job_pool.sh
function foobar()
{
    # do something
    true
}
# initialize the job pool to allow 3 parallel jobs and echo commands
job_pool_init 3 0
# run jobs
job_pool_run sleep 1
job_pool_run sleep 2
job_pool_run sleep 3
job_pool_run foobar
job_pool_run foobar
job_pool_run /bin/false
# wait until all jobs complete before continuing
job_pool_wait
# more jobs
job_pool_run /bin/false
job_pool_run sleep 1
job_pool_run sleep 2
job_pool_run foobar
# don't forget to shut down the job pool
job_pool_shutdown
# check the $job_pool_nerrors for the number of jobs that exited non-zero
echo "job_pool_nerrors: ${job_pool_nerrors}"
Hope this helps!
Answered by Parag Sarda
Answered by Ole Tange
Using GNU Parallel you can do:
cat tasks | parallel -j4 myprog
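Here the tasks file is assumed to hold one task name per line, which parallel appends to myprog as an argument, running at most 4 jobs at a time; for example:
taskname1
taskname2
taskname3
taskname4
taskname5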
If you have 4 cores, you can even just do:
cat tasks | parallel myprog
From http://git.savannah.gnu.org/cgit/parallel.git/tree/README:
Full installation
Full installation of GNU Parallel is as simple as:
./configure && make && make install
Personal installation
If you are not root you can add ~/bin to your path and install in ~/bin and ~/share:
./configure --prefix=$HOME && make && make install
Or if your system lacks 'make' you can simply copy src/parallel src/sem src/niceload src/sql to a dir in your path.
Minimal installation
If you just need parallel and do not have 'make' installed (maybe the system is old or Microsoft Windows):
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
mv parallel sem dir-in-your-$PATH/bin/
Test the installation
After this you should be able to do:
parallel -j0 ping -nc 3 ::: foss.org.my gnu.org freenetproject.org
This will send 3 ping packets to 3 different hosts in parallel and print the output when they complete.
Watch the intro video for a quick introduction: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Answered by Zhehao Mao
I would suggest writing four scripts, each of which executes a certain number of tasks in series. Then write another script that starts the four scripts in parallel. For instance, if you have scripts script1.sh, script2.sh, script3.sh, and script4.sh, you could have a script called headscript.sh like so.
#!/bin/sh
./script1.sh &
./script2.sh &
./script3.sh &
./script4.sh &
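Each of the four scripts simply runs its share of the tasks one after another; for instance, script1.sh might look like this (the task names are placeholders):
#!/bin/sh
# script1.sh - run this script's share of the tasks sequentially
myprog taskname1
myprog taskname2
myprog taskname3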
Answered by zurfyx
Following @Parag Sardas' answer and the documentation linked, here's a quick script you might want to add to your .bash_aliases.
Relinking the doc link because it's worth a read.
#!/bin/bash
# https://stackoverflow.com/a/19618159
# https://stackoverflow.com/a/51861820
#
# Example file contents:
# touch /tmp/a.txt
# touch /tmp/b.txt
if [ "$#" -eq 0 ]; then
    echo "$0 <file> [max-procs=0]"
    exit 1
fi
FILE=$1
MAX_PROCS=${2:-0}
cat $FILE | while read line; do printf "%q\n" "$line"; done | xargs --max-procs=$MAX_PROCS -I CMD bash -c CMD
I.e.
./xargs-parallel.sh jobs.txt 4
maximum of 4 processes read from jobs.txt
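For the original question, jobs.txt would simply contain one command per line; for example (myprog and the task names here are placeholders):
myprog taskname1
myprog taskname2
myprog taskname3
myprog taskname4
myprog taskname5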
Answered by Brandon Horsley
You could probably do something clever with signals.
Note this is only to illustrate the concept, and thus not thoroughly tested.
#!/usr/local/bin/bash
this_pid="$$"
jobs_running=0
sleep_pid=
# Catch alarm signals to adjust the number of running jobs
trap 'decrement_jobs' SIGALRM
# When a job finishes, decrement the total and kill the sleep process
decrement_jobs()
{
    jobs_running=$(($jobs_running - 1))
    if [ -n "${sleep_pid}" ]
    then
        kill -s SIGKILL "${sleep_pid}"
        sleep_pid=
    fi
}
# Check to see if the max jobs are running, if so sleep until woken
launch_task()
{
    if [ ${jobs_running} -gt 3 ]
    then
        (
            while true
            do
                sleep 999
            done
        ) &
        sleep_pid=$!
        wait ${sleep_pid}
    fi
    # Launch the requested task, signalling the parent upon completion
    (
        "$@"
        kill -s SIGALRM "${this_pid}"
    ) &
    jobs_running=$((${jobs_running} + 1))
}
# Launch all of the tasks, this can be in a loop, etc.
launch_task task1
launch_task task2
...
launch_task task99
Answered by Seth Robertson
This tested script runs 5 jobs at a time and will start a new job as soon as one finishes (due to the kill of the sleep 10.9 when we get a SIGCHLD). A simpler version of this could use direct polling (change the sleep 10.9 to sleep 1 and get rid of the trap).
#!/usr/bin/bash
set -o monitor
trap "pkill -P $$ -f 'sleep 10\.9' >&/dev/null" SIGCHLD
totaljobs=15
numjobs=5
worktime=10
curjobs=0
declare -A pidlist
dojob()
{
    slot=$1
    time=$(echo "$RANDOM * 10 / 32768" | bc -l)
    echo Starting job $slot with args $time
    sleep $time &
    pidlist[$slot]=`jobs -p %%`
    curjobs=$(($curjobs + 1))
    totaljobs=$(($totaljobs - 1))
}
# start
while [ $curjobs -lt $numjobs -a $totaljobs -gt 0 ]
do
    dojob $curjobs
done
# Poll for jobs to die, restarting while we have them
while [ $totaljobs -gt 0 ]
do
    for ((i=0;$i < $curjobs;i++))
    do
        if ! kill -0 ${pidlist[$i]} >&/dev/null
        then
            dojob $i
            break
        fi
    done
    sleep 10.9 >&/dev/null
done
wait
Answered by Alex Gitelman
The other answer about 4 shell scripts does not fully satisfy me, as it assumes that all tasks take approximately the same time and requires manual setup. But here is how I would improve it.
The main script will create symbolic links to the executables following a certain naming convention. For example,
ln -s executable1 ./01-task.01
The first prefix is for sorting and the suffix identifies the batch (01-04). Now we spawn 4 shell scripts that take the batch number as input and do something like this:
for t in $(ls ./*-task.$batch | sort); do
    $t
    rm $t
done
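A rough sketch of the main script implied here, creating the sortable links and then spawning the four batch runners in parallel (the directory layout and the run-batch.sh helper are hypothetical):
#!/bin/sh
# Hypothetical main script: spread the executables round-robin over 4 batches
# by creating sortable symlinks, then run one batch worker per batch.
i=0
for exe in task-executables/*; do
    batch=$(printf "%02d" $(( (i % 4) + 1 )))
    num=$(printf "%02d" "$i")
    ln -s "$PWD/$exe" "./${num}-task.${batch}"
    i=$((i + 1))
done
# run-batch.sh contains the for-loop shown above and takes the batch number
for b in 01 02 03 04; do
    ./run-batch.sh "$b" &
done
wait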
Answered by Michael Spector
Look at my implementation of a job pool in bash: https://github.com/spektom/shell-utils/blob/master/jp.sh
For example, to run at most 3 processes of cURL when downloading from a lot of URLs, you can wrap your cURL commands as follows:
./jp.sh "My Download Pool" 3 curl http://site1/...
./jp.sh "My Download Pool" 3 curl http://site2/...
./jp.sh "My Download Pool" 3 curl http://site3/...
...
Answered by Wenhao Ji
Here is my solution. The idea is quite simple. I create a fifo as a semaphore, where each line stands for an available resource. When reading the queue, the main process blocks if there is nothing left. And we return the resource after the task is done by simply echoing anything to the queue.
function task() {
    local task_no="$1"
    # doing the actual task...
    echo "Executing Task ${task_no}"
    # which takes a long time
    sleep 1
}
function execute_concurrently() {
    local tasks="$1"
    local ps_pool_size="$2"
    # create an anonymous fifo as a Semaphore
    local sema_fifo
    sema_fifo="$(mktemp -u)"
    mkfifo "${sema_fifo}"
    exec 3<>"${sema_fifo}"
    rm -f "${sema_fifo}"
    # every 'x' stands for an available resource
    for i in $(seq 1 "${ps_pool_size}"); do
        echo 'x' >&3
    done
    for task_no in $(seq 1 "${tasks}"); do
        read dummy <&3 # blocks until a resource is available
        (
            trap 'echo x >&3' EXIT # returns the resource on exit
            task "${task_no}"
        )&
    done
    wait # wait until all forked tasks have finished
}
execute_concurrently 10 4
The script above will run 10 tasks, 4 at a time concurrently. You can change the $(seq 1 "${tasks}") sequence to the actual task queue you want to run.
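As one way of doing that, here is a sketch of the loop reading commands from a file instead of the numeric sequence; it assumes it runs inside execute_concurrently, where fd 3 is already set up, and tasks.txt is a hypothetical file with one command per line:
# Hypothetical adaptation: pull one command per line from tasks.txt
while IFS= read -r cmd; do
    read dummy <&3              # block until a pool slot is free
    (
        trap 'echo x >&3' EXIT  # give the slot back when done
        ${cmd}
    )&
done < tasks.txt
wait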