Python NumPy memory error creating a huge matrix

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/19085012/

Date: 2020-08-19 12:49:18  Source: igfitidea

Numpy memory error creating huge matrix

Tags: python, memory, numpy

Asked by Salvador Dali

I am using numpy and trying to create a huge matrix. While doing this, I receive a memory error

Because the matrix itself is not important, I will just show how to easily reproduce the error.

import numpy as np

a = 10000000000
data = np.array([float('nan')] * a)

Not surprisingly, this throws a MemoryError.

There are two things I would like to point out:

  1. I really need to create and use a big matrix
  2. I think I have enough RAM to handle this matrix (I have 24 GB of RAM)

Is there an easy way to handle big matrices in numpy?

Just to be on the safe side, I previously read these posts (which sound similar):

Very large matrices using Python and NumPy

Python/Numpy MemoryError

Processing a very very big data set in python - memory error

P.S. Apparently I have some problems with multiplication and division of numbers, which made me think that I have enough memory. So I think it is time for me to go to sleep, review math, and maybe buy some memory.

Maybe during this time some genius will come up with an idea of how to actually create this matrix using only 24 GB of RAM.

Why I need this big matrix: I am not going to do any manipulations with this matrix. All I need to do with it is to save it into pytables.

Accepted answer by Eric Urban

Assuming each floating point number is 4 bytes, you'd have

(10000000000 * 4) / (2**30.0) = 37.25290298461914

Or about 37.25 gigabytes that you need to store in memory. So I don't think 24 GB of RAM is enough.

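For reference, here is the same arithmetic done in Python. Note that np.array([float('nan')] * a) actually produces NumPy's default float64 (8 bytes per value), so the real requirement is roughly twice as large:

n = 10000000000  # number of elements

for name, itemsize in [("float32", 4), ("float64", 8)]:
    print(name, n * itemsize / 2**30, "GiB")

# float32 37.25290298461914 GiB
# float64 74.50580596923828 GiB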

Answered by Tigran Saluev

If you can't afford to create such a matrix but still wish to do some computations, try sparse matrices.

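For instance, a minimal sketch with scipy.sparse (this assumes most entries of your matrix are zero, which is the case sparse formats are designed for):

from scipy import sparse

# A 100000 x 100000 matrix: only the explicitly stored entries take
# memory, not all 10**10 cells.
m = sparse.lil_matrix((100000, 100000), dtype='float64')
m[0, 0] = 1.0
m[12345, 67890] = 3.14
print(m.nnz)        # 2 stored values
csr = m.tocsr()     # CSR format is usually better for arithmetic and slicing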

If you wish to pass it to another Python package that uses duck typing, you may create your own class with __getitem__ implementing dummy access.

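A minimal sketch of that idea (the class name and the fill value are only illustrative, not an existing API):

class DummyMatrix:
    """Pretends to be a huge 2-D array but never allocates one."""

    def __init__(self, shape, fill=float('nan')):
        self.shape = shape
        self.fill = fill

    def __getitem__(self, index):
        # Dummy access: every lookup returns the same fill value
        # instead of reading real storage.
        return self.fill

huge = DummyMatrix((100000, 100000))
print(huge[42, 7])   # nan, without allocating tens of gigabytes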

Answered by Ottoman Empire

If you use the PyCharm editor for Python, you can change its memory settings in

C:\Program Files\JetBrains\PyCharm 2018.2.4\bin\pycharm64.exe.vmoptions

You can reduce the memory reserved for PyCharm itself in this file (which will make PyCharm slower) so that your program can allocate more megabytes. You must edit these lines:

-Xms1024m
-Xmx2048m
-XX:ReservedCodeCacheSize=960m

So you can change them to -Xms512m and -Xmx1024m; your program will then work, but it will affect debugging performance in PyCharm.

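With that change the file would read, for example (assuming the ReservedCodeCacheSize line is left as it was):

-Xms512m
-Xmx1024m
-XX:ReservedCodeCacheSize=960m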