Original source: http://stackoverflow.com/questions/16426845/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to feed a large array of commands to GNU Parallel?
Asked by sv.
I'm evaluating whether GNU Parallel can be used to search files stored on a system in parallel. There can be only one file for each day of year (doy) on the system (so a maximum of 366 files per year). Let's say there are 3660 files on the system (about 10 years' worth of data). The system could be a multi-CPU multi-core Linux or a multi-CPU Solaris.
I'm storing the search commands to run on the files in an array (one command per file). This is what I'm doing right now (using bash), but I have no control over how many searches start in parallel (I definitely don't want to start all 3660 searches at once):
#!/usr/bin/env bash
declare -a cmds
declare -i cmd_ctr=0
while [[ <condition> ]]; do
    if [[ -s $cur_archive_path/log.${doy_ctr} ]]; then
        cmds[$cmd_ctr]="<cmd_to_run>"
        let cmd_ctr++
    fi
done
declare -i arr_len=${#cmds[@]}
for (( i=0; i<${arr_len}; i++ ));
do
    # Get the command and run it in background
    eval ${cmds[$i]} &
done
wait
If I were to use parallel (which will automatically figure out the max. CPUs/cores and start only that many searches in parallel), how can I reuse the array cmds with parallel and rewrite the above code? The other alternative is to write all commands to a file and then do cat cmd_file | parallel
Accepted answer by Ole Tange
https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Using-shell-variables says:
parallel echo ::: "${V[@]}"
You do not want the echo, so:
parallel ::: "${cmds[@]}"
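As a minimal sketch of the line above, the following builds a small cmds array and hands it to GNU Parallel. The echo commands and the day-of-year values are hypothetical stand-ins for the real searches, and -j 2 is only an illustrative job limit (when -j is omitted, GNU Parallel defaults to one job per CPU core). The version check is there because some systems ship a different, incompatible parallel binary (from moreutils).

```shell
#!/usr/bin/env bash
# Sketch: run an array of command strings through GNU Parallel.
# The echo commands are hypothetical stand-ins for the real searches.
declare -a cmds=()
for doy in 001 002 003 004; do
    cmds+=("echo searching log.${doy}")
done

# Only proceed if the installed `parallel` really is GNU Parallel.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # With no command before :::, each array element is run as its own job.
    # -j 2 caps concurrency at two jobs; -k prints output in input order.
    results=$(parallel -k -j 2 ::: "${cmds[@]}")
    printf '%s\n' "$results"

    # Same jobs fed on stdin, one command per line
    # (the in-memory equivalent of `cat cmd_file | parallel`).
    results_stdin=$(printf '%s\n' "${cmds[@]}" | parallel -k -j 2)
fi
```

Quoting "${cmds[@]}" matters here: it keeps each command string, spaces included, as a single argument and therefore a single job.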
If you do not need $cmds for anything else, then use 'sem' (which is an alias for parallel --semaphore): https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Working-as-mutex-and-counting-semaphore
while [[ <condition> ]]; do
    if [[ -s $cur_archive_path/log.${doy_ctr} ]]; then
        sem -j+0 <cmd_to_run>
    fi
done
sem --wait
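The sem loop above can be tried out with a self-contained sketch like the one below. It assumes a temporary directory, a fixed list of three day-of-year values, and touch as a placeholder for the real search command; the explicit --id name is also an assumption, added so the semaphore does not depend on the controlling terminal. -j 2 is used instead of -j+0 only to make the limit obvious.

```shell
#!/usr/bin/env bash
# Sketch: throttle background jobs with sem (GNU Parallel's semaphore mode).
# outdir, sem_id and the touch command are hypothetical stand-ins.
outdir=$(mktemp -d)
sem_id="demo_search_$$"

if parallel --version 2>/dev/null | grep -q 'GNU parallel' && command -v sem >/dev/null; then
    for doy in 001 002 003; do
        # sem starts the job in the background as soon as a slot is free;
        # -j 2 allows at most two jobs to run at the same time.
        sem -j 2 --id "$sem_id" touch "$outdir/log.${doy}.done"
    done
    # Block until every job started under this semaphore id has finished.
    sem --wait --id "$sem_id"
    ls "$outdir"
fi
```

Unlike the array approach, nothing is collected up front: each job is queued from inside the loop, and the loop itself pauses whenever all slots are busy.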
You have not described what <condition> might be. If you are simply doing something like a for-loop, you could replace the whole script with:
parallel 'if [ -s {} ] ; then cmd_to_run {}; fi' ::: $cur_archive_path/log.{1..3660}
(based on https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Composed-commands).
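To see the composed-command form in action, here is a small sketch under stated assumptions: a temporary directory stands in for $cur_archive_path, four hypothetical log files are created (two of them empty), and an echo replaces cmd_to_run. {} expands to the full path of each input and {/} to its basename.

```shell
#!/usr/bin/env bash
# Sketch: hypothetical archive directory with two non-empty and two empty logs.
cur_archive_path=$(mktemp -d)
echo "data" > "$cur_archive_path/log.1"
: > "$cur_archive_path/log.2"
echo "data" > "$cur_archive_path/log.3"
: > "$cur_archive_path/log.4"

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # The -s test runs inside each job, so empty files are skipped there.
    # echo is a placeholder for the real search; -k keeps input order.
    hits=$(parallel -k 'if [ -s {} ]; then echo "searched {/}"; fi' \
        ::: "$cur_archive_path"/log.{1..4})
    printf '%s\n' "$hits"
fi
rm -rf "$cur_archive_path"
```

Only log.1 and log.3 are reported, because the empty files fail the -s test inside their own jobs.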

