Linux: What can cause exec to fail? What happens next?

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/3703013/

Date: 2020-08-04 23:09:45  Source: igfitidea

What can cause exec to fail? What happens next?

Tags: c, linux, unix, exec

Asked by pythonic metaphor

What are the reasons that an exec (execl, execlp, etc.) can fail? If you make a call to exec and it returns, are there any best practices other than just panicking and calling exit?

Accepted answer by Carl Norum

From the exec(3) man page:

The execl(), execle(), execlp(), execvp(), and execvP() functions may fail and set errno for any of the errors specified for the library functions execve(2) and malloc(3).

The execv() function may fail and set errno for any of the errors specified for the library function execve(2).

And then from the execve(2) man page:

ERRORS

execve() will fail and return to the calling process if:

  • [E2BIG] - The number of bytes in the new process's argument list is larger than the system-imposed limit. This limit is specified by the sysctl(3) MIB variable KERN_ARGMAX.
  • [EACCES] - Search permission is denied for a component of the path prefix.
  • [EACCES] - The new process file is not an ordinary file.
  • [EACCES] - The new process file mode denies execute permission.
  • [EACCES] - The new process file is on a filesystem mounted with execution disabled (MNT_NOEXEC in <sys/mount.h>).
  • [EFAULT] - The new process file is not as long as indicated by the size values in its header.
  • [EFAULT] - Path, argv, or envp point to an illegal address.
  • [EIO] - An I/O error occurred while reading from the file system.
  • [ELOOP] - Too many symbolic links were encountered in translating the pathname. This is taken to be indicative of a looping symbolic link.
  • [ENAMETOOLONG] - A component of a pathname exceeded {NAME_MAX} characters, or an entire path name exceeded {PATH_MAX} characters.
  • [ENOENT] - The new process file does not exist.
  • [ENOEXEC] - The new process file has the appropriate access permission, but has an unrecognized format (e.g., an invalid magic number in its header).
  • [ENOMEM] - The new process requires more virtual memory than is allowed by the imposed maximum (getrlimit(2)).
  • [ENOTDIR] - A component of the path prefix is not a directory.
  • [ETXTBSY] - The new process file is a pure procedure (shared text) file that is currently open for writing or reading by some process.
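
To see a few of the errors listed above in practice, here is a minimal sketch in C. The helper name exec_errno is made up for this illustration and is not part of any library; it simply attempts execv() and hands back errno if the call returns.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: try to exec `path` with no extra arguments.
 * If execv() succeeds, the process image is replaced and this function
 * never returns. If it fails, execv() returns -1 and the errno value
 * it set is returned to the caller. */
int exec_errno(const char *path)
{
    char *const argv[] = { (char *)path, NULL };

    execv(path, argv);   /* only returns on failure */
    return errno;
}
```

Calling exec_errno() on a path that does not exist should yield ENOENT. Pointing it at a directory such as "/" would typically give EACCES, since a directory is not an ordinary file, though the exact code for each case can vary between systems.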

malloc() is a lot less complicated, and uses only ENOMEM. From the malloc(3) man page:

If successful, calloc(), malloc(), realloc(), reallocf(), and valloc() functions return a pointer to allocated memory. If there is an error, they return a NULL pointer and set errno to ENOMEM.

Answer by user446568

Rather than just panicking, you could make a decision based on errno's value.
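
As a rough sketch of what deciding on errno could look like in C: the enum, the function name, and above all the grouping of error codes below are assumptions made for illustration, not an established policy. A real program would pick its own grouping.

```c
#include <errno.h>

/* Illustrative categories; the names are hypothetical. */
enum exec_action {
    EXEC_RETRY_LATER,   /* condition may be transient */
    EXEC_REPORT_PATH,   /* path or permission problem: tell the user */
    EXEC_GIVE_UP        /* anything else: report and exit nonzero */
};

/* Map the errno left behind by a failed exec call to a coarse action.
 * The grouping is an assumption for this sketch. */
enum exec_action classify_exec_errno(int err)
{
    switch (err) {
    case ETXTBSY:       /* file is busy, e.g. still being written */
    case EAGAIN:        /* temporary resource shortage on some systems */
        return EXEC_RETRY_LATER;
    case ENOENT:
    case EACCES:
    case EPERM:
    case ENOTDIR:
    case ELOOP:
    case ENAMETOOLONG:
    case ENOEXEC:
        return EXEC_REPORT_PATH;
    default:            /* ENOMEM, EIO, EFAULT, E2BIG, ... */
        return EXEC_GIVE_UP;
    }
}
```

The caller would invoke classify_exec_errno(errno) immediately after the exec call returns, before any other library call has a chance to overwrite errno.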

Answer by Jonathan Leffler

What you do after the exec() call returns depends on the context - what the program is supposed to do, what the error is, and what you might be able to do to work around the problem.

One source of trouble could be that you specified a simple program name instead of a pathname; maybe you could retry with execvp(), or convert the command into an invocation of sh -c 'what you originally specified'. Whether any of these is reasonable depends on the application. If there are major security issues involved, probably you don't try again.
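
A minimal sketch of the execvp() retry described above, in C. The function name exec_with_path_fallback is hypothetical, and the sketch assumes the caller has already decided that retrying is acceptable from a security point of view.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try argv[0] exactly as given; if that fails with
 * ENOENT and argv[0] contains no '/', retry with a PATH search.
 * Does not return if either exec succeeds; otherwise returns the errno
 * value of the last attempt. */
int exec_with_path_fallback(char *const argv[])
{
    execv(argv[0], argv);                 /* literal pathname first */

    if (errno == ENOENT && strchr(argv[0], '/') == NULL)
        execvp(argv[0], argv);            /* simple name: search PATH */

    return errno;
}
```

Note that execv() with a bare name looks in the current directory, which is one reason a program with security concerns might skip the first attempt or the retry altogether.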

If you specified a pathname and there is a problem with that (ENOTDIR, ENOENT, EPERM), then you may not have any sensible fallback, but you can report the error meaningfully.

In the old days (10+ years ago), some systems did not support the '#!' shebang notation, and if you were not sure whether you were executing an executable or a shell script, you tried it as an executable and then retried it as a shell script. That might or might not work if you were running a Perl script, but in those days, you wrote your Perl scripts to detect that they were being run by a shell and to re-exec themselves with Perl. Fortunately, those days are mostly over.

To the extent possible, it is important to ensure that the process reports the problem so that it can be traced - writing its message to a log file or just to stderr (or maybe even syslog()), so that those who have to work out what went wrong have more information to help them other than the hapless end user's report "I tried X and it didn't work". It is crucial that if nothing works, then the exit status is not 0 as that indicates success. Even that might be ignored - but you did what you could.

Answer by R.. GitHub STOP HELPING ICE

The problem with handling exec failure is that exec is usually performed in a child process, and you want to do the error handling in the parent process. But you can't just exit(errno) because (1) you don't know if error codes fit in an exit code, and (2) you can't distinguish between failure to exec and failure exit codes from the new program you exec.

The best solution I know is using pipes to communicate the success or failure of exec:

  1. Before forking, open a pipe in the parent process.
  2. After forking, the parent closes the writing end of the pipe and reads from the reading end.
  3. The child closes the reading end and sets the close-on-exec flag for the writing end.
  4. The child calls exec.
  5. If exec fails, the child writes the error code back to the parent using the pipe, then exits.
  6. The parent reads EOF (a zero-length read) if the child successfully performed exec, since close-on-exec made the successful exec close the writing end of the pipe. Or, if exec failed, the parent reads the error code and can proceed accordingly. Either way, the parent blocks until the child calls exec.
  7. The parent closes the reading end of the pipe.
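
The seven steps above can be sketched in C roughly as follows. The function name spawn_checked and its return convention are assumptions for this example, and error handling is kept deliberately small.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper. Returns 0 if the child exec'd successfully (its pid
 * is stored in *pid_out and the caller must waitpid() for it later), or an
 * errno value if pipe(), fork(), or the child's exec failed. */
int spawn_checked(const char *path, char *const argv[], pid_t *pid_out)
{
    int fds[2];
    if (pipe(fds) == -1)                       /* step 1 */
        return errno;

    pid_t pid = fork();
    if (pid == -1) {
        int err = errno;
        close(fds[0]);
        close(fds[1]);
        return err;
    }

    if (pid == 0) {                            /* child */
        close(fds[0]);                         /* step 3 */
        if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) != -1)
            execv(path, argv);                 /* step 4 */
        int err = errno;                       /* step 5: exec failed */
        ssize_t ignored = write(fds[1], &err, sizeof err);
        (void)ignored;
        _exit(127);
    }

    close(fds[1]);                             /* step 2 (parent) */
    int child_err = 0;
    ssize_t n;
    do {
        n = read(fds[0], &child_err, sizeof child_err);
    } while (n == -1 && errno == EINTR);
    int read_err = errno;                      /* meaningful only if n == -1 */
    close(fds[0]);                             /* step 7 */

    if (n == 0) {                              /* step 6: EOF, exec worked */
        *pid_out = pid;
        return 0;
    }

    /* exec failed (or the read itself did): reap the child, report why */
    while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
        ;
    if (n == (ssize_t)sizeof child_err)
        return child_err;
    return (n == -1) ? read_err : EIO;
}
```

With this shape the parent can tell "exec never happened" (a nonzero return carrying the child's errno) apart from "the new program ran and exited with a failure status" (a return of 0 followed by a nonzero status from waitpid()).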

Answer by Sam Watkins

Exec should always succeed. (except for shells, i.e. if the user entered a bogus command)

If exec does fail, it indicates:

  • a "fault" with the program (missing or bad component, wrong pathname, bad memory, ...), or
  • a serious system error (out of memory, too many processes, disk fault, ...)

For any serious error, the normal approach is to write the error message on stderr, then exit with a failure code. Almost all of the standard tools do this. For exec:

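execl("bork", "bork", (char *)NULL);  /* returns only if the exec failed */
perror("failed: exec");               /* report the reason held in errno */
exit(127);                            /* nonzero status signals failure */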

The shell does that, too (more or less).
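
For reference, POSIX shells conventionally exit with 127 when a command is not found and 126 when it is found but cannot be executed. A small sketch of that mapping in C follows; the function name is hypothetical, and treating only ENOENT as "not found" is a simplifying assumption.

```c
#include <errno.h>

/* Hypothetical helper: pick a shell-style exit status for a failed exec.
 * 127: command not found; 126: found but could not be executed.
 * Treating only ENOENT as "not found" is an assumption of this sketch. */
int exit_status_for_exec_errno(int err)
{
    return (err == ENOENT) ? 127 : 126;
}
```

In a child process this could be used as `_exit(exit_status_for_exec_errno(errno))` right after reporting the error, so that a parent or calling shell sees a familiar status.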

Normally if a child process fails, the parent has failed too and should exit. It does not matter whether the child failed in exec, or while running the program. If exec failed, it does not matter why exec failed. If the child process failed for any reason, the calling process is in trouble and needs to stop.

Don't waste lots of time trying to anticipate all possible error conditions. Don't write code that tries to handle each error code in the best possible way. You'll just bloat the code, and introduce many new bugs. If your program is broken, or it's being abused, it should simply fail. If you force it to continue, worse trouble will come of that.

For example, if the system is out of memory and thrashing swap, we don't want to cycle over and over trying to run a process; it would just make the situation worse. If we get a filesystem error, we don't want to continue running on that filesystem; it might make the corruption worse. If the program was installed wrongly, or has a bug, or has memory corruption, we want to stop as soon as possible, before that broken program does some real damage (such as sending a corrupted report to a client, trashing a database, ...).

One possible alternative: a failing process might call for help, pause itself (SIGSTOP), then retry the operation if told to continue. This could help when the system is out of memory, or disks are full, or perhaps even if there is a fault in the program. Few operations are so expensive and important that this would be worthwhile.

If you're making an interactive GUI program, try to do it as a thin wrapper over reusable command-line tools (which exit if something goes wrong). Every function in your program should be accessible through the GUI, through the command-line, and as a function call. Write your functions. Write a few tools to make command-line and GUI wrappers for any function. Use sub-processes too.

If you are making a truly critical system, such as a controller for a nuclear power station, or a program to predict tsunamis, then what are you doing reading my dumb advice? Critical systems should not depend entirely on computers or software. There needs to be a 'manual override', with someone to drive it. Especially, do not attempt to build a critical system on MS Windows, that is like building sand castles underwater.
