Linux Python subprocess.Popen "OSError: [Errno 12] Cannot allocate memory"
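Disclaimer: This page is a translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you need to use it, you must likewise follow the CC BY-SA license, credit the original URL and author information, and attribute it to the original authors (not me): StackOverflow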
Original URL: http://stackoverflow.com/questions/1367373/
Question by DavidM
Note: This question was originally asked here, but the bounty time expired even though an acceptable answer was not actually found. I am re-asking this question, including all details provided in the original question.
A Python script is running a set of class functions every 60 seconds using the sched module:
# sc is a sched.scheduler instance
sc.enter(60, 1, self.doChecks, (sc, False))
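The fragment above relies on doChecks re-scheduling itself every time it runs. As a minimal, self-contained sketch of that pattern (Python 3; the Checker class, do_checks and max_runs are hypothetical names, not the agent's actual code):

```python
import sched
import time


class Checker:
    """Hypothetical stand-in for the agent's checks class."""

    def __init__(self, interval, max_runs=None):
        self.interval = interval
        self.max_runs = max_runs  # None means run forever, like the agent
        self.runs = 0

    def do_checks(self, sc, first_run):
        self.runs += 1
        # ... the real agent would call getProcesses() and friends here ...
        if self.max_runs is None or self.runs < self.max_runs:
            # Re-arm: each run schedules the next one, giving a fixed-interval loop.
            sc.enter(self.interval, 1, self.do_checks, (sc, False))


def run(interval=60, max_runs=None):
    sc = sched.scheduler(time.time, time.sleep)
    checker = Checker(interval, max_runs)
    sc.enter(0, 1, checker.do_checks, (sc, True))
    sc.run()  # blocks until the event queue is empty
    return checker.runs
```

With the defaults, run() loops every 60 seconds forever; max_runs only exists so the loop can be stopped, for example in a test.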
The script is running as a daemonised process using the code here.
A number of class methods that are called as part of doChecks use the subprocess module to call system functions in order to get system statistics:
ps = subprocess.Popen(['ps', 'aux'], stdout=subprocess.PIPE).communicate()[0]
This runs fine for a period of time before the entire script crashes with the following error:
File "/home/admin/sd-agent/checks.py", line 436, in getProcesses
File "/usr/lib/python2.4/subprocess.py", line 533, in __init__
File "/usr/lib/python2.4/subprocess.py", line 835, in _get_handles
OSError: [Errno 12] Cannot allocate memory
The output of free -m on the server once the script has crashed is:
$ free -m
             total       used       free     shared    buffers     cached
Mem:           894        345        549          0          0          0
-/+ buffers/cache:        345        549
Swap:            0          0          0
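free reads its numbers from /proc/meminfo, so the same figures (including SwapTotal, which is 0 above) can be logged from inside the agent without spawning a process. A minimal Python 3 sketch; the function names are illustrative:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; most values are kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def to_mb(kb):
    return kb // 1024


def free_like_summary(info):
    """Approximate the columns `free -m` prints, in MB."""
    return {
        "total": to_mb(info["MemTotal"]),
        "free": to_mb(info["MemFree"]),
        "buffers": to_mb(info.get("Buffers", 0)),
        "cached": to_mb(info.get("Cached", 0)),
        "swap_total": to_mb(info.get("SwapTotal", 0)),
        "swap_free": to_mb(info.get("SwapFree", 0)),
    }


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(free_like_summary(parse_meminfo(f.read())))
```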
The server is running CentOS 5.3. I am unable to reproduce the problem on my own CentOS boxes, and no other user has reported it.
I have tried a number of things to debug this as suggested in the original question:
- Logging the output of free -m before and after the Popen call. There is no significant change in memory usage, i.e. memory is not gradually being used up as the script runs.
- I added close_fds=True to the Popen call, but this made no difference: the script still crashed with the same error. (Suggested here and here.)
- I checked the rlimits, which showed (-1, -1) on both RLIMIT_DATA and RLIMIT_AS, as suggested here.
- An article suggested that having no swap space might be the cause, but swap is actually available on demand (according to the web host), and this was also dismissed as a bogus cause here.
- The processes are being closed, because that is the behaviour of using .communicate(), as backed up by the Python source code and comments here.
The entire checks code can be found on GitHub here, with the getProcesses function defined from line 442. This is called by doChecks() starting at line 520.
The script was run under strace, with the following output before the crash:
recv(4, "Total Accesses: 516662\nTotal kBy"..., 234, 0) = 234
gettimeofday({1250893252, 887805}, NULL) = 0
write(3, "2009-08-21 17:20:52,887 - checks"..., 91) = 91
gettimeofday({1250893252, 888362}, NULL) = 0
write(3, "2009-08-21 17:20:52,888 - checks"..., 74) = 74
gettimeofday({1250893252, 888897}, NULL) = 0
write(3, "2009-08-21 17:20:52,888 - checks"..., 67) = 67
gettimeofday({1250893252, 889184}, NULL) = 0
write(3, "2009-08-21 17:20:52,889 - checks"..., 81) = 81
close(4) = 0
gettimeofday({1250893252, 889591}, NULL) = 0
write(3, "2009-08-21 17:20:52,889 - checks"..., 63) = 63
pipe([4, 5]) = 0
pipe([6, 7]) = 0
fcntl64(7, F_GETFD) = 0
fcntl64(7, F_SETFD, FD_CLOEXEC) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7f12708) = -1 ENOMEM (Cannot allocate memory)
write(2, "Traceback (most recent call last"..., 35) = 35
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, " File \"/usr/bin/sd-agent/agent."..., 52) = 52
open("/home/admin/sd-agent/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, " File \"/home/admin/sd-agent/dae"..., 60) = 60
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, " File \"/usr/bin/sd-agent/agent."..., 54) = 54
open("/usr/lib/python2.4/sched.py", O_RDONLY|O_LARGEFILE) = 8
write(2, " File \"/usr/lib/python2.4/sched"..., 55) = 55
fstat64(8, {st_mode=S_IFREG|0644, st_size=4054, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, "\"\"\"A generally useful event sche"..., 4096) = 4054
write(2, " ", 4) = 4
write(2, "void = action(*argument)\n", 25) = 25
close(8) = 0
munmap(0xb7d28000, 4096) = 0
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, " File \"/usr/bin/sd-agent/checks"..., 60) = 60
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, " File \"/usr/bin/sd-agent/checks"..., 64) = 64
open("/usr/lib/python2.4/subprocess.py", O_RDONLY|O_LARGEFILE) = 8
write(2, " File \"/usr/lib/python2.4/subpr"..., 65) = 65
fstat64(8, {st_mode=S_IFREG|0644, st_size=39931, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, "# subprocess - Subprocesses with"..., 4096) = 4096
read(8, "lso, the newlines attribute of t"..., 4096) = 4096
read(8, "code < 0:\n print >>sys.st"..., 4096) = 4096
read(8, "alse does not exist on 2.2.0\ntry"..., 4096) = 4096
read(8, " p2cread\n # c2pread <-"..., 4096) = 4096
write(2, " ", 4) = 4
write(2, "errread, errwrite)\n", 19) = 19
close(8) = 0
munmap(0xb7d28000, 4096) = 0
open("/usr/lib/python2.4/subprocess.py", O_RDONLY|O_LARGEFILE) = 8
write(2, " File \"/usr/lib/python2.4/subpr"..., 71) = 71
fstat64(8, {st_mode=S_IFREG|0644, st_size=39931, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, "# subprocess - Subprocesses with"..., 4096) = 4096
read(8, "lso, the newlines attribute of t"..., 4096) = 4096
read(8, "code < 0:\n print >>sys.st"..., 4096) = 4096
read(8, "alse does not exist on 2.2.0\ntry"..., 4096) = 4096
read(8, " p2cread\n # c2pread <-"..., 4096) = 4096
read(8, "table(self, handle):\n "..., 4096) = 4096
read(8, "rrno using _sys_errlist (or siml"..., 4096) = 4096
read(8, " p2cwrite = None, None\n "..., 4096) = 4096
write(2, " ", 4) = 4
write(2, "self.pid = os.fork()\n", 21) = 21
close(8) = 0
munmap(0xb7d28000, 4096) = 0
write(2, "OSError", 7) = 7
write(2, ": ", 2) = 2
write(2, "[Errno 12] Cannot allocate memor"..., 33) = 33
write(2, "\n", 1) = 1
unlink("/var/run/sd-agent.pid") = 0
close(3) = 0
munmap(0xb7e0d000, 4096) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x589978}, {0xb89a60, [], SA_RESTORER, 0x589978}, 8) = 0
brk(0xa022000) = 0xa022000
exit_group(1) = ?
Answer by pilcrow
Swap may not be the red herring previously suggested. How big is the Python process in question just before the ENOMEM?
Under kernel 2.6, /proc/sys/vm/swappiness controls how aggressively the kernel will turn to swap, and the overcommit* files control how much and how precisely the kernel may apportion memory with a wink and a nod. Like your Facebook relationship status, it's complicated.
...but swap is actually available on demand (according to the web host)...
but not according to the output of your free(1) command, which shows no swap space recognized by your server instance. Now, your web host may certainly know much more than I do about this topic, but the virtual RHEL/CentOS systems I've used have reported swap available to the guest OS.
Adapting Red Hat KB Article 15252:
A Red Hat Enterprise Linux 5 system will run just fine with no swap space at all as long as the sum of anonymous memory and System V shared memory is less than about 3/4 the amount of RAM. .... Systems with 4GB of RAM or less [are recommended to have] a minimum of 2GB of swap space.
Compare your /proc/sys/vm settings to a plain CentOS 5.3 installation. Add a swap file. Ratchet down swappiness and see if you live any longer.
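One way to make that comparison is to dump the relevant /proc/sys/vm files on both machines and diff them. A minimal Python 3 sketch; the key list is an assumption and can be extended with any other file under /proc/sys/vm:

```python
import os

# Files mentioned in the answer: swappiness plus the overcommit_* knobs.
VM_KEYS = ("swappiness", "overcommit_memory", "overcommit_ratio")


def read_vm_settings(root="/proc/sys/vm", keys=VM_KEYS):
    """Return {key: value string, or None if missing/unreadable}."""
    settings = {}
    for key in keys:
        try:
            with open(os.path.join(root, key)) as f:
                settings[key] = f.read().strip()
        except OSError:
            settings[key] = None  # e.g. not exposed inside a container
    return settings


def diff_settings(mine, baseline):
    """Return {key: (mine, baseline)} for every key whose values differ."""
    return {k: (mine.get(k), baseline.get(k))
            for k in set(mine) | set(baseline)
            if mine.get(k) != baseline.get(k)}
```

Run read_vm_settings() on the failing server and on a stock CentOS 5.3 box, then pass both dictionaries to diff_settings.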
Answer by codeDr
munmap(0xb7d28000, 4096) = 0
write(2, "OSError", 7) = 7
I've seen sloppy code that looks like this:
serrno = errno;
some_Syscall(...)
if (serrno != errno)
/* sound alarm: CATASTROPHIC ERROR !!! */
You should check to see if this is what is happening in the Python code. errno is only valid if the preceding system call failed.
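Python sidesteps that particular pitfall: an OSError is raised only when the underlying call has actually failed, so its errno attribute is meaningful. A minimal Python 3 sketch that tells apart the failure modes listed in fork(2); run_capture is a hypothetical helper name:

```python
import errno
import subprocess


def run_capture(argv):
    """Run argv and return its stdout bytes, classifying spawn failures by errno.

    OSError is only raised when the spawn really failed, so e.errno is
    trustworthy here, unlike reading C's errno after a successful call.
    """
    try:
        return subprocess.run(argv, stdout=subprocess.PIPE, check=True).stdout
    except OSError as e:
        if e.errno == errno.ENOMEM:
            raise RuntimeError("fork/clone refused: kernel reported ENOMEM") from e
        if e.errno == errno.EAGAIN:
            raise RuntimeError("fork/clone refused: EAGAIN (e.g. RLIMIT_NPROC)") from e
        raise  # anything else (ENOENT, EACCES, ...) propagates unchanged
```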
Edited to add:
You don't say how long this process lives. Possible consumers of memory:
- forked processes
- unused data structures
- shared libraries
- memory mapped files
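For the last two items, /proc/<pid>/maps lists every mapping of a process, so summing sizes per backing file shows how much address space shared libraries and mapped files account for compared with anonymous memory. A minimal Python 3 sketch; summarize_maps is an illustrative name, and the sizes are virtual address space, not resident memory:

```python
from collections import defaultdict


def summarize_maps(lines):
    """Sum mapping sizes in bytes per backing path from /proc/<pid>/maps lines.

    Unnamed anonymous mappings are grouped under "[anon]"; pseudo-paths such
    as [heap] and [stack], shared libraries and mapped files keep their names.
    """
    totals = defaultdict(int)
    for line in lines:
        fields = line.split(None, 5)  # address perms offset dev inode [path]
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) == 6 else "[anon]"
        totals[path] += end - start
    return dict(totals)


if __name__ == "__main__":
    with open("/proc/self/maps") as f:
        biggest = sorted(summarize_maps(f).items(), key=lambda kv: -kv[1])[:10]
    for path, size in biggest:
        print(f"{size // 1024:>10} kB  {path}")
```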
Answer by Jim Dennis
I continue to suspect that your customer/user has some kernel module or driver loaded which is interfering with the clone() system call (perhaps some obscure security enhancement, something like LIDS but more obscure?) or is somehow filling up some of the kernel data structures that are necessary for fork()/clone() to operate (process table, page tables, file descriptor tables, etc.).
Here's the relevant portion of the fork(2) man page:
ERRORS
    EAGAIN  fork() cannot allocate sufficient memory to copy the parent's page tables and allocate a task structure for the child.
    EAGAIN  It was not possible to create a new process because the caller's RLIMIT_NPROC resource limit was encountered. To exceed this limit, the process must have either the CAP_SYS_ADMIN or the CAP_SYS_RESOURCE capability.
    ENOMEM  fork() failed to allocate the necessary kernel structures because memory is tight.
I suggest having the user try this after booting into a stock, generic kernel and with only a minimal set of modules and drivers loaded (minimum necessary to run your application/script). From there, assuming it works in that configuration, they can perform a binary search between that and the configuration which exhibits the issue. This is standard sysadmin troubleshooting 101.
The relevant line in your strace is:
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7f12708) = -1 ENOMEM (Cannot allocate memory)
I know others have talked about swap and memory availability, and I would recommend that you set up at least a small swap partition, ironically even if it's on a RAM disk: the code paths through the Linux kernel when it has even a tiny bit of swap available have been exercised far more extensively than those (exception-handling paths) in which there is zero swap available.
However I suspect that this is still a red herring.
The fact that free is reporting 0 (ZERO) memory in use by the cache and buffers is very disturbing. I suspect that the free output, and possibly your application issue here, are caused by some proprietary kernel module which is interfering with memory allocation in some way.
According to the man pages for fork()/clone(), the fork() system call should return EAGAIN if your call would cause a resource limit violation (RLIMIT_NPROC); however, they don't say whether EAGAIN is returned for other RLIMIT* violations. In any event, if your target/host has some sort of weird Vormetric or other security settings (or even if your process is running under some weird SELinux policy), then that might be causing this -ENOMEM failure.
It's pretty unlikely to be a normal run-of-the-mill Linux/UNIX issue. You've got something non-standard going on there.
Answer by totaam
Have you tried using:
(status,output) = commands.getstatusoutput("ps aux")
I thought this had fixed the exact same problem for me. But then my process ended up getting killed instead of failing to spawn, which is even worse.
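The commands module no longer exists in Python 3; subprocess.getstatusoutput is its replacement. Because it is built on top of subprocess and runs the command through the shell, it still forks, so on its own it should not be expected to avoid a fork-time ENOMEM:

```python
import subprocess

# Python 3 counterpart of commands.getstatusoutput("ps aux"): returns
# (exit status, combined stdout/stderr with the trailing newline removed).
status, output = subprocess.getstatusoutput("ps aux")
print(status, len(output.splitlines()), "lines")
```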
After some testing I found that this only occurred on older versions of Python: it happens with 2.6.5 but not with 2.7.2.
My search led me to python-close_fds-issue, but unsetting close_fds did not solve the issue. It is still well worth a read.
I found that python was leaking file descriptors by just keeping an eye on it:
watch "ls /proc/$PYTHONPID/fd | wc -l"
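The same check can be logged from inside the script on every cycle; a count that keeps rising between cycles points to a descriptor leak. A minimal Python 3 sketch; count_open_fds is an illustrative name, and reading another user's /proc/<pid>/fd needs matching privileges:

```python
import os


def count_open_fds(pid="self", proc_root="/proc"):
    """Count entries in /proc/<pid>/fd, like `ls /proc/$PID/fd | wc -l`.

    For pid="self" the count usually includes the directory descriptor
    that os.listdir opens while reading the listing.
    """
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))
```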
Like you, I do want to capture the command's output, and I do want to avoid OOM errors... but it looks like the only way is for people to use a less buggy version of Python. Not ideal...
Answer by vladr
As a general rule (i.e. in vanilla kernels), fork/clone failures with ENOMEM occur specifically because of either an honest-to-God out-of-memory condition (dup_mm, dup_task_struct, alloc_pid, mpol_dup, mm_init etc. croak), or because security_vm_enough_memory_mm failed you while enforcing the overcommit policy.
Start by checking the vmsize of the process that failed to fork, at the time of the fork attempt, and then compare it to the amount of free memory (physical and swap) as it relates to the overcommit policy (plug the numbers in).
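To plug the numbers in: VmSize comes from /proc/<pid>/status, and /proc/meminfo exposes CommitLimit and Committed_AS, which the kernel compares under strict overcommit (overcommit_memory=2). A rough Python 3 sketch, assuming the parent's whole VmSize as an upper bound on what a fork must commit; the kernel really charges only private writable mappings, so this overestimates, and the other overcommit modes use different rules:

```python
def read_kb_fields(text, wanted):
    """Extract 'Name:   1234 kB' style fields (values in kB) from procfs text."""
    out = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in wanted and rest.split():
            out[key] = int(rest.split()[0])
    return out


def fork_commit_check(status_text, meminfo_text):
    """Rough strict-overcommit check: could a fork of this process be refused?"""
    vm = read_kb_fields(status_text, {"VmSize"})["VmSize"]
    mem = read_kb_fields(meminfo_text, {"CommitLimit", "Committed_AS"})
    headroom = mem["CommitLimit"] - mem["Committed_AS"]
    return {"vmsize_kb": vm, "headroom_kb": headroom, "may_fail": vm > headroom}


if __name__ == "__main__":
    with open("/proc/self/status") as s, open("/proc/meminfo") as m:
        print(fork_commit_check(s.read(), m.read()))
```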
In your particular case, note that Virtuozzo has additional checks in overcommit enforcement. Moreover, I'm not sure how much control you truly have, from within your container, over swap and overcommit configuration (in order to influence the outcome of the enforcement).
Now, in order to actually move forward I'd say you're left with two options:
- switch to a larger instance, or
- put some coding effort into more effectively controlling your script's memory footprint
NOTE that the coding effort may be all for naught if it turns out that it's not you but some other guy, co-located in a different instance on the same server as you, running amok.
Memory-wise, we already know that subprocess.Popen uses fork/clone under the hood, meaning that every time you call it you're requesting once more as much memory as Python is already eating up, i.e. in the hundreds of additional MB, all in order to then exec a puny 10kB executable such as free or ps. In the case of an unfavourable overcommit policy, you'll soon see ENOMEM.
Alternatives to fork that do not have this parent page tables etc. copy problem are vfork and posix_spawn. But if you do not feel like rewriting chunks of subprocess.Popen in terms of vfork/posix_spawn, consider using subprocess.Popen only once, at the beginning of your script (when Python's memory footprint is minimal), to spawn a shell script that then runs free/ps/sleep and whatever else in a loop parallel to your script; poll the script's output or read it synchronously, possibly from a separate thread if you have other stuff to take care of asynchronously. Do your data crunching in Python, but leave the forking to the subordinate process.
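A minimal Python 3 sketch of that long-lived helper, assuming a POSIX sh; start_sampler and the @@END@@ separator are hypothetical names. Python spawns a child only once, while it is still small, and the shell does all later forking:

```python
import queue
import subprocess
import threading

DELIM = "@@END@@"  # hypothetical marker the shell prints after each sample


def start_sampler(command, interval):
    """Start one long-lived sh that runs `command` every `interval` seconds.

    Each sample's output is put on the returned queue as one string.
    `command` is handed to sh verbatim, so it must come from trusted code.
    """
    script = f"while :; do {command}; echo {DELIM}; sleep {interval}; done"
    proc = subprocess.Popen(["sh", "-c", script],
                            stdout=subprocess.PIPE, text=True)
    samples = queue.Queue()

    def reader():
        chunk = []
        for line in proc.stdout:  # loop ends when the shell exits (EOF)
            if line.rstrip("\n") == DELIM:
                samples.put("".join(chunk))
                chunk = []
            else:
                chunk.append(line)

    threading.Thread(target=reader, daemon=True).start()
    return proc, samples
```

Usage would look like proc, samples = start_sampler("ps aux", 60) at startup, samples.get() in each check cycle, and proc.terminate() on shutdown.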
HOWEVER, in your particular case you can skip invoking ps and free altogether; that information is readily available to you in Python directly from procfs, whether you choose to access it yourself or via existing libraries and/or packages. If ps and free were the only utilities you were running, then you can do away with subprocess.Popen completely.
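As an illustration of reading procfs directly, this Python 3 sketch lists processes without running ps. It parses /proc/<pid>/stat rather than /proc/<pid>/comm, since proc(5) documents comm only from Linux 2.6.33 and CentOS 5 ships an older kernel; list_processes and stat_name are illustrative names:

```python
import os


def stat_name(stat_text):
    """Command name from /proc/<pid>/stat: text between the first '(' and the
    last ')', since the name itself may contain parentheses."""
    return stat_text[stat_text.index("(") + 1 : stat_text.rindex(")")]


def list_processes(proc_root="/proc"):
    """Return sorted (pid, name) pairs by scanning procfs instead of running ps."""
    procs = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip 'self', 'meminfo', 'sys', ...
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                procs.append((int(entry), stat_name(f.read())))
        except (OSError, ValueError):
            continue  # process exited meanwhile, or entry is unreadable/odd
    return sorted(procs)
```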
Finally, whatever you do as far as subprocess.Popen is concerned, if your script leaks memory you will still hit the wall eventually. Keep an eye on it, and check for memory leaks.
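On Python 3, the standard tracemalloc module is one way to check for leaks: take a snapshot before and after a few check cycles and see which source lines grew. A minimal sketch; allocation_growth is an illustrative name, the callable you pass would be one run of the checks, and it assumes tracemalloc is not already in use elsewhere:

```python
import tracemalloc


def allocation_growth(func, runs=3, limit=5):
    """Run func() `runs` times; return the `limit` source lines whose
    allocations grew the most, as tracemalloc StatisticDiff objects."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        for _ in range(runs):
            func()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Sorted by absolute size difference, largest first.
    return after.compare_to(before, "lineno")[:limit]
```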
Answer by Nima
Looking at the output of free -m, it seems to me that you actually do not have swap memory available. I am not sure whether in Linux swap will always be available automatically on demand, but I was having the same problem and none of the answers here really helped me. Adding some swap memory, however, fixed the problem in my case, so since this might help other people facing the same problem, here is how to add a 1GB swap (on Ubuntu 12.04, but it should work similarly for other distributions).
You can first check if there is any swap memory enabled.
$ sudo swapon -s
If it is empty, it means you don't have any swap enabled. To add a 1GB swap:
$ sudo dd if=/dev/zero of=/swapfile bs=1024 count=1024k
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
Add the following line to /etc/fstab to make the swap permanent:
$ sudo vim /etc/fstab
/swapfile none swap sw 0 0
Source and more information can be found here.
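swapon -s prints the contents of /proc/swaps, so the agent could also check for active swap itself at startup and log a warning when there is none. A minimal Python 3 sketch; parse_swaps is an illustrative name:

```python
def parse_swaps(text):
    """Parse /proc/swaps (what `swapon -s` prints) into a list of dicts.

    Size and Used are in KiB. An empty list means no swap is active.
    """
    swaps = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) >= 5:
            swaps.append({
                "filename": fields[0],
                "type": fields[1],
                "size_kb": int(fields[2]),
                "used_kb": int(fields[3]),
                "priority": int(fields[4]),
            })
    return swaps


if __name__ == "__main__":
    with open("/proc/swaps") as f:
        if not parse_swaps(f.read()):
            print("warning: no swap is active")
```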
Answer by serv-inc
For an easy fix, you could
echo 1 > /proc/sys/vm/overcommit_memory
if you're sure that your system has enough memory. See Linux over commit heuristic.
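Before changing it, it helps to know which mode the box is currently in. The descriptions below follow the kernel's overcommit-accounting documentation; note that mode 1 trades the fork-time ENOMEM for the risk of the OOM killer terminating processes later when memory really runs out. A minimal Python 3 sketch with illustrative names:

```python
OVERCOMMIT_MODES = {
    0: "heuristic (default): obvious overcommits of address space are refused",
    1: "always overcommit: no overcommit accounting checks",
    2: "strict: total commit may not exceed swap plus a configurable share of RAM",
}


def describe_overcommit(value_text):
    """Map the contents of overcommit_memory to (mode, description)."""
    mode = int(value_text.strip())
    return mode, OVERCOMMIT_MODES.get(mode, "unknown mode")


def current_overcommit(path="/proc/sys/vm/overcommit_memory"):
    with open(path) as f:
        return describe_overcommit(f.read())
```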