Notice: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me), citing the original URL: http://stackoverflow.com/questions/30019295/

Date: 2020-09-18 12:55:21  Source: igfitidea

slurm script gives "command not found"

Tags: linux, bash, shell, cluster-computing, slurm

Asked by Madeleine P. Vincent

I am trying to submit a script to slurm that runs m4 on an input file. m4 is installed on our cluster, and if I run the script by itself, everything works as expected. But when I submit a run to slurm via a slurm script, I get an error.

Here is the script I would like to run (named m4it.sh).
[Note that I'm printing PATH and SHELL in an attempt to debug.]

#!/usr/bin/env bash

echo "Beginning m4it.sh"
echo "PATH=$PATH"
echo "SHELL=$SHELL"
echo

m4 file.m4 > fileout.txt

and here is my slurm script:

#!/usr/bin/env bash
#
#SBATCH --job-name=m4it

### Account name (req'd)
#SBATCH --account=MyAccount

### Redirect .o and .e files to the logs dir
#SBATCH -o m4it.out
#SBATCH -e m4it.err
#
#SBATCH --ntasks=1
#SBATCH --time=00:01:00
#SBATCH --mem-per-cpu=125

echo "PATH=$PATH"
echo "SHELL=$SHELL"
echo 
echo "running m4it.sh"
echo
./m4it.sh

which submits successfully to slurm via

sbatch m4it.slurm

When it executes, I get the following error in my m4it.err logfile:

./m4it.sh: line 8: m4: command not found

The PATH and SHELL variables (printed to m4it.out by both the m4it.slurm and m4it.sh scripts) are identical. PATH matches the PATH I get when I log in, and SHELL is /bin/bash, as expected.

Even if I include a symlink to the m4 executable from a directory in my PATH, I still get this error. Also, it is not just m4 that is the problem. The script will report the command "apropos" as an unknown command, even though it runs fine on the command line. The script can "cd" and "ls" just fine though.

I've checked read/write/execute permissions.

ls -ld / /usr /usr/bin /usr/bin/m4 

yields the following:

dr-xr-xr-x. 30 root root   4096 Apr  8 11:11 /
drwxr-xr-x. 14 root root   4096 Feb 17 20:24 /usr
dr-xr-xr-x.  2 root root  36864 Apr 29 11:14 /usr/bin
-rwxr-xr-x   1 root root 212440 Jun  3  2010 /usr/bin/m4

It seems that the node the m4it.sh script executes on is different from the front node, and that somehow information (environment variables or paths) is not coming across. I have also tried to export all my settings with the argument --export=ALL as follows:

sbatch m4it.slurm --export=ALL

but this didn't work either (same result). Can anyone help here?
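
A side note on the command above: sbatch reads its own options only up to the script name and passes anything after it to the script as arguments, so --export=ALL in that position never reaches sbatch. ALL is also sbatch's default export mode, so the reordered form would not be expected to change the outcome here. The following is a sketch of the option ordering only; the wrapper name submit_with_env is made up for illustration, and the guard simply reports when sbatch is absent.

```shell
#!/usr/bin/env bash
# Submit a batch script with the submitting shell's environment exported.
# Options must come BEFORE the script name; anything after it goes to the script.
submit_with_env() {
    if ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch: not found on $(uname -n)" >&2
        return 127
    fi
    sbatch --export=ALL "$1"
}

# Example (hypothetical wrapper, real script name from the question):
#   submit_with_env m4it.slurm
```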

Accepted answer by Madeleine P. Vincent

I was able to log in to the compute node in an interactive session. Indeed that node's /usr/bin is significantly different than the front node's, and m4 is not installed.
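
A check along the following lines, run from inside the batch job, can confirm the same thing without an interactive login. It is a minimal sketch: the function name check_cmds is made up for illustration, and the command list should be whatever the job actually needs.

```shell
#!/usr/bin/env bash
# Report, for each command name given, whether it resolves on the host
# that is actually executing this script (the compute node, under Slurm).
check_cmds() {
    local cmd where missing=0
    for cmd in "$@"; do
        if where=$(command -v "$cmd"); then
            echo "$cmd -> $where"
        else
            echo "$cmd: not found on $(uname -n)" >&2
            missing=1
        fi
    done
    return "$missing"
}

echo "host=$(uname -n)"
check_cmds m4 apropos ls || echo "at least one command is missing on this host"
```

Placed near the top of m4it.sh, this writes the resolved paths to m4it.out and the missing commands to m4it.err, alongside the name of the node that ran the job.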

This also explains why the symlink from a directory in my PATH no longer worked. It was pointing to /usr/bin/m4, but as soon as the job was executed on that compute node, /usr/bin/m4 no longer existed, and thus the symlink was invalid.

If I want to use m4, the solution is either to ask the admins to install m4 on the compute nodes or, alternatively, to copy the executable into a directory under my home directory that is listed in my PATH variable.
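
The second option might look like the sketch below, run once on the front node. It assumes the home directory is shared with the compute nodes and that the shared libraries the binary links against are also present there (ldd can list them); neither is stated above. The function name install_local_copy and the $HOME/bin destination are illustrative choices.

```shell
#!/usr/bin/env bash
# Copy an executable into a personal bin directory (hypothetical helper).
# $1: absolute path of the executable on the front node
# $2: destination directory, e.g. "$HOME/bin"
install_local_copy() {
    local src="$1" dest_dir="$2"
    if [ ! -f "$src" ] || [ ! -x "$src" ]; then
        echo "$src: not an executable file on $(uname -n)" >&2
        return 1
    fi
    mkdir -p "$dest_dir" && cp "$src" "$dest_dir/"
}

# Example, to be run on the front node where /usr/bin/m4 exists:
#   install_local_copy /usr/bin/m4 "$HOME/bin"
# Then, in m4it.sh, put that directory first on the PATH:
#   export PATH="$HOME/bin:$PATH"
```

Copying the file itself, rather than symlinking to /usr/bin/m4, avoids the dangling link described above.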
