
Note: This page is a translation mirror of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must likewise attribute the original authors (not the translator). Original: http://stackoverflow.com/questions/4441947/

Why Python `Memory Error` with list `append()` lots of RAM left

Tags: python, list, memory

Asked by Pete

I am building a large data dictionary from a set of text files. As I read in the lines and process them, I `append(dataline)` to a list.

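A minimal sketch of the pattern described, with a hypothetical input file and a stand-in `processline()` (the real processing isn't shown in the question):

    def processline(line):
        # stand-in for the real per-line processing
        return line.strip().split()

    data = []
    with open("input.txt") as f:            # hypothetical input file
        for line in f:
            data.append(processline(line))  # each append grows the list until memory runs out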

At some point the `append()` raises a `MemoryError` exception. However, watching the program run in the Windows Task Manager, at the point of the crash I see 4.3 GB available and 1.1 GB free.

Thus, I do not understand the reason for the exception.

The Python version is 2.6.6. I guess the only explanation is that it is not able to use more of the available RAM. If so, is it possible to increase the allocation?

Accepted answer by NPE

If you're using a 32-bit build of Python, you might want to try a 64-bit version.

It is possible for a process to address at most 4GB of RAM using 32-bit addresses, but typically (depending on the OS), one gets much less. It sounds like your Python process may be hitting this limit. 64-bit addressing removes this limitation.

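A quick sketch to check which build you are running, using only the standard library:

    import struct
    import sys

    # Pointer size in bytes: 4 on a 32-bit build, 8 on a 64-bit build.
    print("%d-bit Python" % (8 * struct.calcsize("P")))

    # Equivalent check: sys.maxsize fits in 32 bits only on 32-bit builds.
    print("64-bit" if sys.maxsize > 2**32 else "32-bit")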

Edit: Since you're asking about Windows, the following page is relevant: Memory Limits for Windows Releases. As you can see, the limit per 32-bit process is 2, 3 or 4GB depending on the OS version and configuration.

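You can also ask Windows directly how much address space the current process has. A sketch using the Win32 `GlobalMemoryStatusEx` call via `ctypes` (Windows-only):

    import ctypes

    class MEMORYSTATUSEX(ctypes.Structure):
        # Mirror of the Win32 MEMORYSTATUSEX struct.
        _fields_ = [
            ("dwLength", ctypes.c_ulong),
            ("dwMemoryLoad", ctypes.c_ulong),
            ("ullTotalPhys", ctypes.c_ulonglong),
            ("ullAvailPhys", ctypes.c_ulonglong),
            ("ullTotalPageFile", ctypes.c_ulonglong),
            ("ullAvailPageFile", ctypes.c_ulonglong),
            ("ullTotalVirtual", ctypes.c_ulonglong),
            ("ullAvailVirtual", ctypes.c_ulonglong),
            ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
        ]

    status = MEMORYSTATUSEX()
    status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status))

    # ullTotalVirtual is the user-mode address space of *this* process:
    # typically 2 GB for a 32-bit process, however much RAM is installed.
    print("Address space: %.1f GB" % (status.ullTotalVirtual / 1024.0 ** 3))
    print("Available:     %.1f GB" % (status.ullAvailVirtual / 1024.0 ** 3))

A 32-bit process reporting a ~2 GB address space explains a `MemoryError` even while Task Manager shows gigabytes of free physical RAM.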

Answer by nmichaels

If you're open to restructuring the code instead of throwing more memory at it, you might be able to get by with this:

    data = (processraw(raw) for raw in lines)

where `lines` is either a list of lines, or `file.xreadlines()`, or similar.

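For example, a sketch of consuming the generator so that only one processed item is alive at a time (`processraw` and the file name are stand-ins):

    def processraw(raw):
        # stand-in for the real per-line processing
        return raw.strip().split()

    with open("input.txt") as f:
        data = (processraw(raw) for raw in f)  # generator expression: no list is built
        for item in data:
            pass  # handle each processed item here instead of storing it

Note this only helps if the rest of the code can consume the items one at a time; if it genuinely needs everything in memory at once, a generator just moves the problem.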

Answer by Stu

I had a similar problem using a 32-bit version of Python in a 64-bit Windows environment. I tried the 64-bit Windows version of Python and very quickly ran into trouble with the SciPy libraries compiled for 64-bit Windows.

The totally free solution that I implemented was:

1) Install VirtualBox
2) Install CentOS 5.6 on the VM
3) Get the Enthought Python Distribution (free 64-bit Linux version).

Now all of my NumPy-, SciPy-, and Matplotlib-dependent Python code can use as much memory as I have RAM and available Linux swap.

Answer by amaatouq

As has already been mentioned, you'll need a 64-bit Python (on a 64-bit version of Windows).

Be aware that you'll probably face a lot of conflicts and problems with some of the basic packages you might want to work with. To avoid this problem I'd recommend Anaconda from Continuum Analytics. I'd advise you to look into it :)

Answer by drevicko

I had a similar problem when evaluating an expression containing large numpy arrays (actually, one was sparse). I was doing this on a machine with 64GB of memory, of which only about 8GB was in use, so I was surprised to get the `MemoryError`.

It turned out that my problem was array shape broadcasting: I had inadvertently duplicated a large dimension.

It went something like this (a sketch reproducing the blow-up follows the list):

  • I had passed an array with shape `(286577, 1)` where I was expecting `(286577,)`.
  • This was subtracted from an array with shape `(286577, 130)`.
  • Because I was expecting `(286577,)`, I applied `[:, newaxis]` in the expression to bring it to `(286577, 1)` so it would be broadcast to `(286577, 130)`.
  • When I passed shape `(286577, 1)`, however, `[:, newaxis]` produced shape `(286577, 1, 1)` and the two arrays were both broadcast to shape `(286577, 286577, 130)`... of doubles. With two such arrays, that comes to about 80GB!
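
A minimal sketch reproducing the shape blow-up, with small stand-in dimensions (3 and 4 instead of 286577 and 130) so it runs harmlessly:

    import numpy as np

    n, m = 3, 4                 # stand-ins for 286577 and 130
    a = np.zeros((n, 1))        # what was actually passed: shape (n, 1)
    b = np.zeros((n, m))        # the array it was subtracted from

    # With the expected (n,) input, adding an axis gives the intended result:
    expected = np.zeros(n)
    print((expected[:, np.newaxis] - b).shape)  # (n, m)

    # But with the (n, 1) input, [:, np.newaxis] yields (n, 1, 1),
    # which broadcasts against (n, m) to (n, n, m):
    print((a[:, np.newaxis] - b).shape)         # (n, n, m)

With n = 286577, the middle dimension gets duplicated 286577 times, which is where the enormous allocation came from.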