Bash: limit the number of concurrent jobs?

Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow.
Original URL: http://stackoverflow.com/questions/1537956/
Asked by static_rtti
Is there an easy way to limit the number of concurrent jobs in bash? By that I mean making & block when there are more than n concurrent jobs already running in the background.
I know I can implement this with ps | grep-style tricks, but is there an easier way?
Accepted answer by Ole Tange
If you have GNU Parallel (http://www.gnu.org/software/parallel/) installed you can do this:
parallel gzip ::: *.log
which will run one gzip per CPU core until all logfiles are gzipped.
If it is part of a larger loop you can use sem instead:
for i in *.log ; do
    echo $i
    # Do more stuff here
    sem -j+0 gzip $i ";" echo done
done
sem --wait
It will do the same, but give you a chance to do more stuff for each file.
If GNU Parallel is not packaged for your distribution you can install GNU Parallel simply by:
$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 3374ec53bacb199b245af2dda86df6c9
12345678 3374ec53 bacb199b 245af2dd a86df6c9
$ md5sum install.sh | grep 029a9ac06e8b5bc6052eac57b2c3c9ca
029a9ac0 6e8b5bc6 052eac57 b2c3c9ca
$ sha512sum install.sh | grep f517006d9897747bed8a4694b1acba1b
40f53af6 9e20dae5 713ba06c f517006d 9897747b ed8a4694 b1acba1b 1464beb4
60055629 3f2356f3 3e9c4e3c 76e3f3af a9db4b32 bd33322b 975696fc e6b23cfb
$ bash install.sh
It will download, check signature, and do a personal installation if it cannot install globally.
Watch the intro videos for GNU Parallel to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Answered by tangens
A small bash script could help you:
# content of script exec-async.sh
joblist=($(jobs -p))
while (( ${#joblist[*]} >= 3 ))
do
    sleep 1
    joblist=($(jobs -p))
done
$* &
If you call:
. exec-async.sh sleep 10
...four times, the first three calls will return immediately, and the fourth call will block until fewer than three jobs are running.
You need to start this script inside the current session by prefixing it with ., because jobs lists only the jobs of the current session.
The sleep inside is ugly, but I didn't find a way to wait for the first job that terminates.
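On bash 4.3 and newer there is a cleaner option than polling: wait -n returns as soon as any background job exits. A minimal sketch of the same limiting pattern (assuming bash >= 4.3; the sleep commands are placeholder workloads and the limit of 3 is arbitrary):

```shell
#!/usr/bin/env bash
# Keep at most 3 jobs running; wait -n blocks until any one job exits.
max=3
for i in 1 2 3 4 5; do
    while (( $(jobs -pr | wc -l) >= max )); do
        wait -n          # wakes as soon as a slot frees, no polling
    done
    sleep 0.2 &          # placeholder workload
done
wait                     # wait for the remaining jobs
echo "all done"
```

Unlike the sleep 1 loop, this wakes up exactly when a slot frees, so fast jobs are not delayed by up to a second.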
Answered by paxdiablo
The following script shows a way to do this with functions. You can either put the bgxupdate and bgxlimit functions in your script or have them in a separate file which is sourced from your script with:
. /path/to/bgx.sh
It has the advantage that you can maintain multiple groups of processes independently (you can run, for example, one group with a limit of 10 and another totally separate group with a limit of 3).
It uses the bash built-in jobs to get a list of sub-processes, but maintains them in individual variables. In the loop at the bottom, you can see how to call the bgxlimit function:
- Set up an empty group variable.
- Transfer that to bgxgrp.
- Call bgxlimit with the limit and command you want to run.
- Transfer the new group back to your group variable.
Of course, if you only have one group, just use bgxgrp directly rather than transferring in and out.
#!/bin/bash

# bgxupdate - update active processes in a group.
#   Works by transferring each process to a new group
#   if it is still active.
#   in:  bgxgrp   - current group of processes.
#   out: bgxgrp   - new group of processes.
#   out: bgxcount - number of processes in new group.

bgxupdate() {
    bgxoldgrp=${bgxgrp}
    bgxgrp=""
    ((bgxcount = 0))
    bgxjobs=" $(jobs -pr | tr '\n' ' ')"
    for bgxpid in ${bgxoldgrp} ; do
        echo "${bgxjobs}" | grep " ${bgxpid} " >/dev/null 2>&1
        if [[ $? -eq 0 ]] ; then
            bgxgrp="${bgxgrp} ${bgxpid}"
            ((bgxcount = bgxcount + 1))
        fi
    done
}

# bgxlimit - start a sub-process with a limit.
#   Loops, calling bgxupdate until there is a free
#   slot to run another sub-process. Then runs it
#   and updates the process group.
#   in:  $1       - the limit on processes.
#   in:  $2+      - the command to run for new process.
#   in:  bgxgrp   - the current group of processes.
#   out: bgxgrp   - new group of processes.

bgxlimit() {
    bgxmax=$1 ; shift
    bgxupdate
    while [[ ${bgxcount} -ge ${bgxmax} ]] ; do
        sleep 1
        bgxupdate
    done
    if [[ "$1" != "-" ]] ; then
        $* &
        bgxgrp="${bgxgrp} $!"
    fi
}

# Test program, create group and run 6 sleeps with
# a limit of 3.

group1=""
echo 0 $(date | awk '{print $4}') '[' ${group1} ']'
echo
for i in 1 2 3 4 5 6 ; do
    bgxgrp=${group1} ; bgxlimit 3 sleep ${i}0 ; group1=${bgxgrp}
    echo ${i} $(date | awk '{print $4}') '[' ${group1} ']'
done

# Wait until all others are finished.
echo
bgxgrp=${group1} ; bgxupdate ; group1=${bgxgrp}
while [[ ${bgxcount} -ne 0 ]] ; do
    oldcount=${bgxcount}
    while [[ ${oldcount} -eq ${bgxcount} ]] ; do
        sleep 1
        bgxgrp=${group1} ; bgxupdate ; group1=${bgxgrp}
    done
    echo 9 $(date | awk '{print $4}') '[' ${group1} ']'
done
Here's a sample run:
0 12:38:00 [ ]
1 12:38:00 [ 3368 ]
2 12:38:00 [ 3368 5880 ]
3 12:38:00 [ 3368 5880 2524 ]
4 12:38:10 [ 5880 2524 1560 ]
5 12:38:20 [ 2524 1560 5032 ]
6 12:38:30 [ 1560 5032 5212 ]
9 12:38:50 [ 5032 5212 ]
9 12:39:10 [ 5212 ]
9 12:39:30 [ ]
- The whole thing starts at 12:38:00 and, as you can see, the first three processes run immediately.
- Each process sleeps for n*10 seconds, so the fourth process doesn't start until the first exits (at time t=10, or 12:38:10). You can see that process 3368 has disappeared from the list before 1560 is added.
- Similarly, the fifth process (5032) starts when the second (5880) exits at time t=20.
- And finally, the sixth process (5212) starts when the third (2524) exits at time t=30.
- Then the rundown begins: the fourth process exits at t=50 (started at 10, duration of 40), the fifth at t=70 (started at 20, duration of 50) and the sixth at t=90 (started at 30, duration of 60).
Or, in time-line form:
Process: 1 2 3 4 5 6
-------- - - - - - -
12:38:00 ^ ^ ^
12:38:10 v | | ^
12:38:20 v | | ^
12:38:30 v | | ^
12:38:40 | | |
12:38:50 v | |
12:39:00 | |
12:39:10 v |
12:39:20 |
12:39:30 v
Answered by Scarabeetle
Here's the shortest way:
waitforjobs() {
    while test $(jobs -p | wc -w) -ge "$1"; do wait -n; done
}
Call this function before forking off any new job:
waitforjobs 10
run_another_job &
To have as many background jobs as cores on the machine, use $(nproc) instead of a fixed number like 10.
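A quick self-contained check of the pattern (a sketch assuming bash 4.3+ for wait -n; the limit of 2, four jobs, and 0.3-second sleeps are arbitrary):

```shell
#!/usr/bin/env bash
# Run 4 short jobs through waitforjobs with a limit of 2 and
# record the highest number of jobs seen running at once.
waitforjobs() {
    while test "$(jobs -pr | wc -w)" -ge "$1"; do wait -n; done
}

peak=0
for i in 1 2 3 4; do
    waitforjobs 2
    sleep 0.3 &                  # placeholder workload
    n=$(jobs -pr | wc -w)
    (( n > peak )) && peak=$n
done
wait
echo "peak concurrent jobs: $peak"
```

Because each fork is gated by waitforjobs 2, the running count never exceeds the limit at launch time.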
Answered by Aaron McDaid
Assuming you'd like to write code like this:
for x in $(seq 1 100); do # 100 things we want to put into the background.
    max_bg_procs 5 # Define the limit. See below.
    your_intensive_job &
done
Where max_bg_procs should be put in your .bashrc:
function max_bg_procs {
    if [[ $# -eq 0 ]] ; then
        echo "Usage: max_bg_procs NUM_PROCS. Will wait until the number of background (&)"
        echo "       bash processes (as determined by 'jobs -pr') falls below NUM_PROCS"
        return
    fi
    local max_number=$((0 + ${1:-0}))
    while true; do
        local current_number=$(jobs -pr | wc -l)
        if [[ $current_number -lt $max_number ]]; then
            break
        fi
        sleep 1
    done
}
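As a self-contained sanity run (a sketch; the limit of 3, six tasks, and 1-second sleeps are arbitrary stand-ins for your_intensive_job), the function keeps the loop from ever starting a new task while 3 or more are already running:

```shell
#!/usr/bin/env bash
# Launch 6 background sleeps, blocking before each launch until
# fewer than 3 jobs are running.
function max_bg_procs {
    local max_number=$((0 + ${1:-0}))
    while true; do
        local current_number=$(jobs -pr | wc -l)
        if [[ $current_number -lt $max_number ]]; then
            break
        fi
        sleep 1
    done
}

for x in 1 2 3 4 5 6; do
    max_bg_procs 3
    sleep 1 &            # placeholder for your_intensive_job
done
wait
echo "all jobs finished"
```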
Answered by user3769065
The following function (developed from tangens' answer above; either copy it into your script or source it from a file):
job_limit () {
    # Test for single positive integer input
    if (( $# == 1 )) && [[ $1 =~ ^[1-9][0-9]*$ ]]
    then
        # Check number of running jobs
        joblist=($(jobs -rp))
        while (( ${#joblist[*]} >= $1 ))
        do
            # Wait for any job to finish
            command='wait '${joblist[0]}
            for job in ${joblist[@]:1}
            do
                command+=' || wait '$job
            done
            eval $command
            joblist=($(jobs -rp))
        done
    fi
}
1) Only requires inserting a single line to limit an existing loop
while :
do
    task &
    job_limit `nproc`
done
2) Waits on completion of existing background tasks rather than polling, increasing efficiency for fast tasks
Answered by Mark Rushakoff
If you're willing to do this outside of pure bash, you should look into a job queuing system.
For instance, there's GNU queue or PBS. And for PBS, you might want to look into Maui for configuration.
Both systems will require some configuration, but it's entirely possible to allow a specific number of jobs to run at once, only starting newly queued jobs when a running job finishes. Typically, these job queuing systems would be used on supercomputing clusters, where you would want to allocate a specific amount of memory or computing time to any given batch job; however, there's no reason you can't use one of these on a single desktop computer without regard for compute time or memory limits.
Answered by cat
This might be good enough for most purposes, but is not optimal.
#!/bin/bash

n=0
maxjobs=10

for i in *.m4a ; do
    # ( DO SOMETHING ) &

    # limit jobs
    if (( $(($((++n)) % $maxjobs)) == 0 )) ; then
        wait # wait until all have finished (not optimal, but most times good enough)
        echo $n wait
    fi
done
Answered by Tomas M
It is hard to do without wait -n (for example, the shell in busybox does not support it). So here is a workaround; it is not optimal because it calls the 'jobs' and 'wc' commands 10 times per second. You can reduce the calls to once per second, for example, if you don't mind waiting a bit longer for each job to complete.
# $1 = maximum concurrent jobs
limit_jobs()
{
    while true; do
        if [ "$(jobs -p | wc -l)" -lt "$1" ]; then break; fi
        usleep 100000
    done
}
# and now start some tasks:
task &
limit_jobs 2
task &
limit_jobs 2
task &
limit_jobs 2
task &
limit_jobs 2
wait
Answered by Tuttle
On Linux I use this to limit the number of bash jobs to the number of available CPUs (possibly overridden by setting CPU_NUMBER).
[ "$CPU_NUMBER" ] || CPU_NUMBER="`nproc 2>/dev/null || echo 1`"

while [ "$1" ]; do
    {
        do something
        with $1
        in parallel

        echo "[$# items left] $1 done"
    } &
    while true; do
        # load the PIDs of all child processes to the array
        joblist=(`jobs -p`)
        if [ ${#joblist[*]} -ge "$CPU_NUMBER" ]; then
            # when the job limit is reached, wait for *single* job to finish
            wait -n
        else
            # stop checking when we're below the limit
            break
        fi
    done
    # it's great we executed zero external commands to check!
    shift
done

# wait for all currently active child processes
wait