Python 如何避免使用 subprocess 模块导致的 [Errno 12] Cannot allocate memory 错误
声明:本页面是 Stack Overflow 热门问题的中英对照翻译,遵循 CC BY-SA 4.0 协议。如果您需要使用它,必须同样遵循 CC BY-SA 许可,注明原文地址和作者信息,并将其归于原作者(不是我):Stack Overflow
原文地址: http://stackoverflow.com/questions/20111242/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): Stack Overflow
How to avoid [Errno 12] Cannot allocate memory errors caused by using subprocess module
提问by Paul
Complete Working Test Case
完整的工作测试用例
Of course depending on your memory on the local and remote machines your array sizes will be different.
当然,根据您本地和远程机器上的内存大小,数组的大小会有所不同。
import numpy
import subprocess

z1 = numpy.random.rand(300000000, 2)
for i in range(1000):
    print('*******************************************\n')
    direct_output = subprocess.check_output('ssh blah@blah "ls /"', shell=True)
    direct_output = 'a' * 1200000
    a2 = direct_output * 10
    print(len(direct_output))
Current Use Case
当前用例
In case it helps my use case is as follows:
如果有帮助的话,我的用例如下:
I issue db queries then store the resulting tables on the remote machine. I then want to transfer them across a network and do analysis. Thus far I have been doing something like the following in python:
我发出数据库查询,然后将结果表存储在远程机器上。然后我想通过网络传输它们并进行分析。到目前为止,我一直在用 python 做类似下面的事情:
#run a bunch of queries beforehand, with the results in remote files
....
counter = 0
mergedDataFrame = None
while NotDone:
    output = subprocess.check_output('ssh blah@blah cat /data/file%08d' % counter, shell=True)
    data = pandas.read_csv(...)
    #do lots of analysis, append, merge, numpy stuff etc...
    mergedDataFrame = pandas.merge(...)
    counter += 1
At some point I receive the following error at the check_output command: [Errno 12] Cannot allocate memory
在某些时候,我在 check_output 命令中收到以下错误:[Errno 12] 无法分配内存
Background
背景
Thanks to the below questions I think I have an idea of what is wrong. There are a number of solutions posted, and I am trying to determine which of the solutions will avoid the [Errno 12] Cannot allocate memory error associated with the subprocess implementation using fork/clone.
由于下面这些问题,我想我已经大致知道问题出在哪里。已有许多解决方案,我正在尝试确定哪些方案可以避免与 subprocess 基于 fork/clone 的实现相关的 [Errno 12] 无法分配内存错误。
Python subprocess.Popen "OSError: [Errno 12] Cannot allocate memory". This gives the underlying diagnosis and suggests some workarounds, like spawning a separate script, etc.
Python subprocess.Popen "OSError: [Errno 12] Cannot allocate memory"。这个问题给出了底层诊断,并建议了一些解决方法,例如生成单独的脚本等。
Understanding Python fork and memory allocation errors. Suggests using rfoo to circumvent the fork/clone limitation of subprocess (spawning a child process and copying memory, etc.). This seems to imply a client-server model.
了解 Python fork 和内存分配错误。建议使用 rfoo 来规避 subprocess 基于 fork/clone(生成子进程并复制内存等)的限制。这似乎意味着一个客户端/服务器模型。
What is the simplest way to SSH using Python? But I have the additional constraint that I cannot use subprocess, due to the memory limitations of its fork/clone implementation. The answers suggest using paramiko or something built on top of it; others suggest subprocess (which I have found will not work in my case).
使用 Python 进行 SSH 的最简单方法是什么?但我还有额外的限制:由于其 fork/clone 实现带来的内存限制,我不能使用 subprocess。那里的回答建议使用 paramiko 或构建在它之上的东西,也有人建议使用 subprocess(我发现这在我的情况下行不通)。
There were other similar questions, but the answers often pointed to file descriptors being the culprit (in this case they are not), adding more RAM to the system (I cannot do this), or upgrading to x64 (I already am on x64). Some hint at the problem being ENOMEM. A few answers mention trying to determine whether subprocess.Popen (in my case check_output) is not properly cleaning up its processes, but it looks like S. Lott and others agree that the subprocess code itself is properly cleaning up.
还有其他类似的问题,但答案往往归咎于文件描述符(在本例中不是)、向系统添加更多 RAM(我做不到)、或升级到 x64(我已经在用 x64)。有些答案暗示问题出在 ENOMEM。还有一些答案提到要确认 subprocess.Popen(在我的例子中是 check_output)是否没有正确清理子进程,但看起来 S. Lott 和其他人一致认为 subprocess 代码本身的清理是正确的。
- Python memory allocation error using subprocess.Popen
- Python IOError cannot allocate memory although there is plenty
- Cannot allocate memory on Popen commands
- Python subprocess.Popen erroring with OSError: [Errno 12] Cannot allocate memory after period of time
- 使用 subprocess.Popen 的 Python 内存分配错误
- Python IOError 无法分配内存,尽管内存充足
- 无法在 Popen 命令上分配内存
- Python subprocess.Popen 在运行一段时间后报 OSError: [Errno 12] 无法分配内存
I have searched through the source code on GitHub https://github.com/paramiko/paramiko/search?q=Popen&type=Code and it appears to use subprocess in the proxy.py file.
我搜索了 GitHub 上的源代码 https://github.com/paramiko/paramiko/search?q=Popen&type=Code ,它似乎在 proxy.py 文件中使用了 subprocess。
Actual Questions
实际问题
Does this mean that ultimately paramiko is using the Popen solution described above that will have problems when the python memory footprint grows and repeated Popen calls are made due to the clone/fork implementation?
这是否意味着 paramiko 最终使用了上面描述的 Popen 方案,从而在 Python 内存占用增长后,由于 clone/fork 实现,重复调用 Popen 时会出现问题?
If paramiko will not work is there another way to do what I am looking for with a client side only solution? Or will a client/server/socket solution be needed? If so will any of rfoo, tornado, or zeromq, http transfers work here?
如果 paramiko 不起作用,是否有另一种方法可以使用仅客户端的解决方案来完成我正在寻找的工作?或者是否需要客户端/服务器/套接字解决方案?如果是这样,rfoo、tornado 或 zeromq、http 传输中的任何一个都可以在这里工作吗?
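For the client-side route, paramiko's `SSHClient.exec_command` returns file-like channel objects, so the transfer can be streamed in chunks with no fork at all. Below is a sketch of such a chunked-copy helper; the host name, credentials, and remote file name in the commented paramiko usage are placeholders:

```python
import io

def stream_chunks(readable, sink, chunk_size=32768):
    """Copy any file-like source into sink in fixed-size chunks,
    so only one chunk is held in memory at a time."""
    copied = 0
    while True:
        chunk = readable.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)
        copied += len(chunk)
    return copied

# Local demonstration with an in-memory source:
demo_src = io.BytesIO(b'x' * 100000)
demo_dst = io.BytesIO()
copied = stream_chunks(demo_src, demo_dst, chunk_size=4096)

# With paramiko, the same helper would stream a remote file over SSH:
#
#   client = paramiko.SSHClient()
#   client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
#   client.connect('blah', username='blah')
#   _, stdout, _ = client.exec_command('cat /data/file00000000')
#   with open('local_copy', 'wb') as f:
#       stream_chunks(stdout, f)
#   client.close()
```

Since paramiko speaks SSH in-process over a socket, the parent's memory footprint never matters the way it does for fork-based subprocess calls.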
Notes: I am running 64-bit Linux with 8GB of main memory. I do not want to pursue the option of buying more RAM.
备注:我运行的是 64 位 Linux,有 8GB 主内存。我不想考虑购买更多 RAM。
回答by dstromberg
This should do it:
这样应该可以解决:
http://docs.python.org/3.3/library/subprocess.html#replacing-os-popen-os-popen2-os-popen3
That way, you can read lines or blocks, instead of the entire thing at once.
这样,您可以读取行或块,而不是一次读取整个内容。
回答by dstromberg
If you're running out of memory, it's probably because subprocess is trying to read too much into memory at the same time. The solution, other than using a redirect to a local file, is probably to use popen-like functionality with an stdin/stdout pair that can be read from a little at a time.
如果内存不足,可能是因为子进程试图同时将太多内容读入内存。除了使用重定向到本地文件之外,解决方案可能是使用类似 popen 的功能以及可以一次读取一点的 stdin/stdout 对。
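The answer's suggestion can be sketched as follows: use `Popen` with a stdout pipe and consume the child's output incrementally, instead of letting `check_output` buffer everything at once. The child command here is just a stand-in that prints two lines:

```python
import subprocess
import sys

# Launch a child whose output we read a little at a time.
proc = subprocess.Popen(
    [sys.executable, '-c', 'print("line1"); print("line2")'],
    stdout=subprocess.PIPE,
)
lines = []
for raw in proc.stdout:            # iterates one line at a time
    lines.append(raw.decode().rstrip())
proc.stdout.close()
proc.wait()
```

For block-oriented data, `proc.stdout.read(chunk_size)` in a loop serves the same purpose. Note this bounds how much of the child's output is resident at once, though the fork itself still copies the parent's address space.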
回答by Nima
If you are running out of memory, you may want to increase your swap memory. Or you might have no swap enabled at all. In Ubuntu (it should work for other distributions as well) you can check your swap by:
如果内存不足,您可能需要增加交换内存。或者您可能根本没有启用交换。在 Ubuntu(它也适用于其他发行版)中,您可以通过以下方式检查您的交换:
$ sudo swapon -s
if it is empty it means you don't have any swap enabled. To add a 1GB swap:
如果它为空,则表示您没有启用任何交换。要添加 1GB 交换:
$ sudo dd if=/dev/zero of=/swapfile bs=1024 count=1024k
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
Add the following line to the fstab to make the swap permanent.
将以下行添加到fstab以使交换永久化。
$ sudo vim /etc/fstab
/swapfile none swap sw 0 0
Source and more information can be found here.
可以在此处找到来源和更多信息。

