How to automatically run a bash script when my qsub jobs are finished on a server?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me) at StackOverflow.

Original question: http://stackoverflow.com/questions/3886168/
Asked by David LeBauer
I would like to run a script when all of the jobs that I have sent to a server are done.
For example, I send:
ssh server "for i in config*; do qsub ./run 1 $i; done"
And I get back a list of the jobs that were started. I would like to automatically start another script on the server to process the output from these jobs once all are completed.
I would appreciate any advice that would help me avoid the following inelegant solution:
If I save each of the 1000 job IDs from the above call in a separate file, I could check the contents of each file against the current list of running jobs, i.e. the output from a call to:
ssh server qstat
I would only need to check every half hour, but I would imagine that there is a better way.
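For reference, that polling approach would look roughly like the following sketch (the file name and post-processing script are assumptions for illustration):

# Assumes the saved job IDs are collected, one per line, in myjobids.txt.
# While any saved ID still appears in qstat's output, keep waiting.
while ssh server qstat | grep -q -f myjobids.txt; do
    sleep 1800    # check every half hour
done
ssh server ./process_output.sh    # hypothetical post-processing step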
Answered by Jonathan Dursi
It depends a bit on which job scheduler and version you are using, but there is another approach you can take if your results processing can also run on the same queue as the jobs.
One very handy way of managing lots of related jobs in more recent versions of Torque (and with Grid Engine, and others) is to launch the individual jobs as a job array (cf. http://docs.adaptivecomputing.com/torque/4-1-4/Content/topics/commands/qsub.htm#-t). This requires mapping the individual runs to numbers somehow, which may or may not be convenient; but if you can do it for your jobs, it greatly simplifies managing them: you can qsub them all in one line, and you can qdel or qhold them all at once (while still having the capability to deal with jobs individually).
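As a rough illustration of the submission (the wrapper script name and the 1-1000 index range are assumptions, not from the original answer):

qsub -t 1-1000 run-array.sh    # one command launches all 1000 tasks

where run-array.sh maps the array index to one of your runs:

#!/bin/bash
# run-array.sh -- hypothetical wrapper; Torque exports $PBS_ARRAYID as
# this task's index within the array, which here selects a config file.
./run 1 "config${PBS_ARRAYID}"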
If you do this, then you can submit an analysis job that has a dependency on the array of jobs, so that it only runs once all of the jobs in the array are complete (cf. http://docs.adaptivecomputing.com/torque/4-1-4/Content/topics/commands/qsub.htm#dependencyExamples). Submitting the job would look like:
qsub -W depend=afterokarray:427[] analyze.sh
where analyze.sh holds the script that does the analysis, and 427 would be the job ID of the array of jobs you launched. (The [] means it runs only after all jobs in the array are completed.) The syntax differs for other schedulers (e.g., SGE/OGE), but the ideas are the same.
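A minimal sketch tying the two submissions together (the script names are assumptions, and the exact format of the returned job ID varies by Torque version):

# qsub prints the new job's ID on stdout, e.g. "427[].server.example.com"
ARRAY_ID=$(qsub -t 1-1000 run-array.sh)
# Strip the hostname suffix, leaving "427[]", and make the analysis job
# depend on the whole array finishing successfully.
qsub -W depend=afterokarray:${ARRAY_ID%%.*} analyze.sh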
Getting this right can take some doing, and certainly Tristan's approach has the advantage of being simple and of working with any scheduler; but if you'll be doing a lot of this, learning to use job arrays in this situation may be worth your time.
Answered by Tristan
Something you might consider is having each job script just touch a marker file in a dedicated folder, named something like $i.jobdone; in your master script, you could then simply use ls *.jobdone | wc -l to test for the right number of jobs done.
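A minimal sketch of this idea (the marker directory, the count of 1000, and the polling interval are assumptions):

# At the end of each job script, leave a marker file:
touch "$HOME/jobdone/$i.jobdone"

# In the master script, poll until all 1000 markers exist:
while [ "$(ls "$HOME"/jobdone/*.jobdone 2>/dev/null | wc -l)" -lt 1000 ]; do
    sleep 60
done
./process_output.sh    # hypothetical post-processing step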
Answered by slezica
You can use wait to stop execution until all your jobs are done. You can even collect all the exit statuses and other running statistics (time taken, count of jobs done at the time, whatever) if you cycle around waiting for specific IDs.
I'd write a small C program to do the waiting and collecting (if you have permissions to upload and run executables), but you can easily use the bash wait built-in for roughly the same purpose, albeit with less flexibility.
Edit: small example.
#!/bin/bash
...
waitfor=''
for i in tasks; do            # "tasks" stands in for your list of runs
    task &                    # launch each task in the background
    waitfor="$waitfor $!"     # remember the PID of the task just launched
done
wait $waitfor                 # block until every remembered PID has exited
...
If you run this script in the background, it won't bother you, and whatever comes after the wait line will run when your jobs are over.
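To collect exit statuses as the answer suggests, you could wait on each PID individually instead; a hedged variant of the loop above:

for pid in $waitfor; do
    wait "$pid"                                # wait on one PID at a time...
    echo "task $pid exited with status $?"     # ...so its status is visible
done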

