How to write a process pool in bash shell
Disclaimer: this page is a translation of a popular Stack Overflow question and answer thread, provided under the CC BY-SA 4.0 license. If you reuse it, you must likewise follow CC BY-SA, link the original, and attribute it to the original authors (not me): Stack Overflow.
Original question: http://stackoverflow.com/questions/6441509/
Asked by Sili
I have more than 10 tasks to execute, and the system restricts that at most 4 tasks can run at the same time.
My task can be started like: myprog taskname
How can I write a bash shell script to run these tasks? The most important thing is that when one task finishes, the script can start another immediately, keeping the running task count at 4 at all times.
Accepted answer by thelazyenginerd
I chanced upon this thread while looking into writing my own process pool and particularly liked Brandon Horsley's solution, though I couldn't get the signals working right, so I took inspiration from Apache and decided to try a pre-fork model with a fifo as my job queue.
The following is the function that the worker processes run when forked.
# \brief the worker function that is called when we fork off worker processes
# \param[in] id the worker ID
# \param[in] job_queue the fifo to read jobs from
# \param[in] result_log the temporary log file to write exit codes to
function _job_pool_worker()
{
    local id=$1
    local job_queue=$2
    local result_log=$3
    local line=
    exec 7<> ${job_queue}
    while [[ "${line}" != "${job_pool_end_of_jobs}" && -e "${job_queue}" ]]; do
        # workers block on the exclusive lock to read the job queue
        flock --exclusive 7
        read line <${job_queue}
        flock --unlock 7
        # the worker should exit if it sees the end-of-job marker or run the
        # job otherwise and save its exit code to the result log.
        if [[ "${line}" == "${job_pool_end_of_jobs}" ]]; then
            # write it one more time for the next sibling so that everyone
            # will know we are exiting.
            echo "${line}" >&7
        else
            _job_pool_echo "### _job_pool_worker-${id}: ${line}"
            # run the job
            { ${line} ; }
            # now check the exit code and prepend "ERROR" to the result log entry
            # which we will use to count errors and then strip out later.
            local result=$?
            local status=
            if [[ "${result}" != "0" ]]; then
                status=ERROR
            fi
            # now write the error to the log, making sure multiple processes
            # don't trample over each other.
            exec 8<> ${result_log}
            flock --exclusive 8
            echo "${status}job_pool: exited ${result}: ${line}" >> ${result_log}
            flock --unlock 8
            exec 8>&-
            _job_pool_echo "### _job_pool_worker-${id}: exited ${result}: ${line}"
        fi
    done
    exec 7>&-
}
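The producing side of this model is symmetric: the parent writes one command line per job into the same fifo, and an idle worker holding the lock reads and runs it. A rough illustration of that idea only (this is not the actual job_pool_run from job_pool.sh; the function name is made up here):
# Illustrative sketch only -- not the real job_pool.sh code.
# The parent appends one command line per job to the fifo; an idle
# worker will read it under the flock and execute it.
function _example_enqueue_job()
{
    local job_queue=$1
    shift
    echo "$@" >> ${job_queue}
}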
You can get a copy of my solution at Github. Here's a sample program using my implementation.
#!/bin/bash
. job_pool.sh
function foobar()
{
    # do something
    true
}
# initialize the job pool to allow 3 parallel jobs and echo commands
job_pool_init 3 0
# run jobs
job_pool_run sleep 1
job_pool_run sleep 2
job_pool_run sleep 3
job_pool_run foobar
job_pool_run foobar
job_pool_run /bin/false
# wait until all jobs complete before continuing
job_pool_wait
# more jobs
job_pool_run /bin/false
job_pool_run sleep 1
job_pool_run sleep 2
job_pool_run foobar
# don't forget to shut down the job pool
job_pool_shutdown
# check the $job_pool_nerrors for the number of jobs that exited non-zero
echo "job_pool_nerrors: ${job_pool_nerrors}"
Hope this helps!
Answered by Parag Sarda
Answered by Ole Tange
Using GNU Parallel you can do:
cat tasks | parallel -j4 myprog
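Here the tasks file is assumed to hold one task name per line, which parallel appends to myprog as an argument, running at most 4 jobs at a time; for example:
taskname1
taskname2
taskname3
taskname4
taskname5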
If you have 4 cores, you can even just do:
cat tasks | parallel myprog
From http://git.savannah.gnu.org/cgit/parallel.git/tree/README:
Full installation
Full installation of GNU Parallel is as simple as:
./configure && make && make install
Personal installation
If you are not root you can add ~/bin to your path and install in ~/bin and ~/share:
./configure --prefix=$HOME && make && make install
Or if your system lacks 'make' you can simply copy src/parallel src/sem src/niceload src/sql to a dir in your path.
Minimal installation
If you just need parallel and do not have 'make' installed (maybe the system is old or Microsoft Windows):
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
mv parallel sem dir-in-your-$PATH/bin/
Test the installation
After this you should be able to do:
parallel -j0 ping -nc 3 ::: foss.org.my gnu.org freenetproject.org
This will send 3 ping packets to 3 different hosts in parallel and print the output when they complete.
Watch the intro video for a quick introduction: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Answered by Zhehao Mao
I would suggest writing four scripts, each of which executes a certain number of tasks in series. Then write another script that starts the four scripts in parallel. For instance, if you have scripts script1.sh, script2.sh, script3.sh, and script4.sh, you could have a script called headscript.sh like so.
#!/bin/sh
./script1.sh &
./script2.sh &
./script3.sh &
./script4.sh &
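Each of the four scripts simply runs its share of the tasks one after another; for instance, script1.sh might look like this (the task names are placeholders):
#!/bin/sh
# script1.sh - run this script's share of the tasks sequentially
myprog taskname1
myprog taskname2
myprog taskname3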
Answered by zurfyx
Following @Parag Sardas' answer and the documentation linked, here's a quick script you might want to add to your .bash_aliases.
Relinking the doc link because it's worth a read.
#!/bin/bash
# https://stackoverflow.com/a/19618159
# https://stackoverflow.com/a/51861820
#
# Example file contents:
# touch /tmp/a.txt
# touch /tmp/b.txt
if [ "$#" -eq 0 ]; then
    echo "$0 <file> [max-procs=0]"
    exit 1
fi
FILE=$1
MAX_PROCS=${2:-0}
cat $FILE | while read line; do printf "%q\n" "$line"; done | xargs --max-procs=$MAX_PROCS -I CMD bash -c CMD
I.e.
./xargs-parallel.sh jobs.txt 4
maximum of 4 processes read from jobs.txt
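For the original question, jobs.txt would simply contain one command per line; for example (myprog and the task names here are placeholders):
myprog taskname1
myprog taskname2
myprog taskname3
myprog taskname4
myprog taskname5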
Answered by Brandon Horsley
You could probably do something clever with signals.
Note this is only to illustrate the concept, and thus not thoroughly tested.
#!/usr/local/bin/bash
this_pid="$$"
jobs_running=0
sleep_pid=
# Catch alarm signals to adjust the number of running jobs
trap 'decrement_jobs' SIGALRM
# When a job finishes, decrement the total and kill the sleep process
decrement_jobs()
{
    jobs_running=$(($jobs_running - 1))
    if [ -n "${sleep_pid}" ]
    then
        kill -s SIGKILL "${sleep_pid}"
        sleep_pid=
    fi
}
# Check to see if the max jobs are running, if so sleep until woken
launch_task()
{
    if [ ${jobs_running} -gt 3 ]
    then
        (
            while true
            do
                sleep 999
            done
        ) &
        sleep_pid=$!
        wait ${sleep_pid}
    fi
    # Launch the requested task, signalling the parent upon completion
    (
        "$@"
        kill -s SIGALRM "${this_pid}"
    ) &
    jobs_running=$((${jobs_running} + 1))
}
# Launch all of the tasks, this can be in a loop, etc.
launch_task task1
launch_task task2
...
launch_task task99
Answered by Seth Robertson
This tested script runs 5 jobs at a time and will start a new job as soon as one finishes (due to the kill of the sleep 10.9 when we get a SIGCHLD). A simpler version of this could use direct polling (change the sleep 10.9 to sleep 1 and get rid of the trap).
#!/usr/bin/bash
set -o monitor
trap "pkill -P $$ -f 'sleep 10\.9' >&/dev/null" SIGCHLD
totaljobs=15
numjobs=5
worktime=10
curjobs=0
declare -A pidlist
dojob()
{
    slot=$1
    time=$(echo "$RANDOM * 10 / 32768" | bc -l)
    echo Starting job $slot with args $time
    sleep $time &
    pidlist[$slot]=`jobs -p %%`
    curjobs=$(($curjobs + 1))
    totaljobs=$(($totaljobs - 1))
}
# start
while [ $curjobs -lt $numjobs -a $totaljobs -gt 0 ]
do
    dojob $curjobs
done
# Poll for jobs to die, restarting while we have them
while [ $totaljobs -gt 0 ]
do
    for ((i=0;$i < $curjobs;i++))
    do
        if ! kill -0 ${pidlist[$i]} >&/dev/null
        then
            dojob $i
            break
        fi
    done
    sleep 10.9 >&/dev/null
done
wait
Answered by Alex Gitelman
The other answer about 4 shell scripts does not fully satisfy me, as it assumes that all tasks take approximately the same time and requires manual setup. But here is how I would improve it.
The main script will create symbolic links to the executables following a certain naming convention. For example,
ln -s executable1 ./01-task.01
The first prefix is for sorting and the suffix identifies the batch (01-04). Now we spawn 4 shell scripts that take the batch number as input and do something like this:
for t in $(ls ./*-task.$batch | sort); do
    $t
    rm $t
done
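A rough sketch of the main script implied here, creating the sortable links and then spawning the four batch runners in parallel (the directory layout and the run-batch.sh helper are hypothetical):
#!/bin/sh
# Hypothetical main script: spread the executables round-robin over 4 batches
# by creating sortable symlinks, then run one batch worker per batch.
i=0
for exe in task-executables/*; do
    batch=$(printf "%02d" $(( (i % 4) + 1 )))
    num=$(printf "%02d" "$i")
    ln -s "$PWD/$exe" "./${num}-task.${batch}"
    i=$((i + 1))
done
# run-batch.sh contains the for-loop shown above and takes the batch number
for b in 01 02 03 04; do
    ./run-batch.sh "$b" &
done
wait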
Answered by Michael Spector
Look at my implementation of a job pool in bash: https://github.com/spektom/shell-utils/blob/master/jp.sh
For example, to run at most 3 processes of cURL when downloading from a lot of URLs, you can wrap your cURL commands as follows:
./jp.sh "My Download Pool" 3 curl http://site1/...
./jp.sh "My Download Pool" 3 curl http://site2/...
./jp.sh "My Download Pool" 3 curl http://site3/...
...
Answered by Wenhao Ji
Here is my solution. The idea is quite simple. I create a fifo as a semaphore, where each line stands for an available resource. When reading the queue, the main process blocks if there is nothing left. And we return the resource after the task is done by simply echoing anything to the queue.
function task() {
    local task_no="$1"
    # doing the actual task...
    echo "Executing Task ${task_no}"
    # which takes a long time
    sleep 1
}
function execute_concurrently() {
    local tasks="$1"
    local ps_pool_size="$2"
    # create an anonymous fifo as a Semaphore
    local sema_fifo
    sema_fifo="$(mktemp -u)"
    mkfifo "${sema_fifo}"
    exec 3<>"${sema_fifo}"
    rm -f "${sema_fifo}"
    # every 'x' stands for an available resource
    for i in $(seq 1 "${ps_pool_size}"); do
        echo 'x' >&3
    done
    for task_no in $(seq 1 "${tasks}"); do
        read dummy <&3 # blocks until a resource is available
        (
            trap 'echo x >&3' EXIT # returns the resource on exit
            task "${task_no}"
        )&
    done
    wait # wait until all forked tasks have finished
}
execute_concurrently 10 4
The script above will run 10 tasks, 4 at a time concurrently. You can change the $(seq 1 "${tasks}") sequence to the actual task queue you want to run.
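As one way of doing that, here is a sketch of the loop reading commands from a file instead of the numeric sequence; it assumes it runs inside execute_concurrently, where fd 3 is already set up, and tasks.txt is a hypothetical file with one command per line:
# Hypothetical adaptation: pull one command per line from tasks.txt
while IFS= read -r cmd; do
    read dummy <&3              # block until a pool slot is free
    (
        trap 'echo x >&3' EXIT  # give the slot back when done
        ${cmd}
    )&
done < tasks.txt
wait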