MemoryError when running Numpy Meshgrid in Python

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/2460627/

Date: 2020-11-04 00:42:37 · Source: igfitidea

MemoryError when running Numpy Meshgrid

Tags: python, arrays, numpy

Asked by greye

I have 8823 data points with x,y coordinates. I'm trying to follow the answer on how to get a scatter dataset represented as a heatmap, but when I go through the

X, Y = np.meshgrid(x, y)

instruction with my data arrays, I get a MemoryError. I am new to numpy and matplotlib and am essentially trying to run this by adapting the examples I can find.

Here's how I built my arrays from a file that has them stored:

from numpy import array

XY_File = open('XY_Output.txt', 'r')
XY = XY_File.readlines()
XY_File.close()

Xf = []
Yf = []
for line in XY:
    # each line holds a tab-separated x value and y value
    Xf.append(float(line.split('\t')[0]))
    Yf.append(float(line.split('\t')[1]))
x = array(Xf)
y = array(Yf)

Is there a problem with my arrays? This same code worked when put into this example, but I'm not too sure.

Why am I getting this MemoryError and how can I fix this?

Answered by Andrew Jaffe

Your call to meshgrid requires a lot of memory: it produces two 8823*8823 floating-point arrays. Each of them is about 0.6 GB.
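
As a quick sanity check (my addition, not part of the original answer), the 0.6 GB figure follows directly from the array size and the 8-byte width of float64:

```python
# Back-of-the-envelope check of the memory claim above: each full
# 8823 x 8823 float64 meshgrid output costs 8 bytes per element.
n = 8823
bytes_per_array = n * n * 8              # one float64 array
gb_per_array = bytes_per_array / 1e9
print(gb_per_array)                      # ~0.62 GB per array, ~1.25 GB for both
```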

But your screen can't show (and your eye can't really process) that much information anyway, so you should probably think of a way to smooth your data to something more reasonable like 1024*1024 before you do this step.
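
One concrete way to do that reduction (a sketch of my own, not from the original answer; the bin count and random data are placeholders for the asker's arrays) is np.histogram2d, which bins the scatter points onto a modest grid without ever materializing the full N*N meshgrids:

```python
import numpy as np

# Stand-in for the ~8823 scatter points from the question.
rng = np.random.default_rng(0)
x = rng.normal(size=8823)
y = rng.normal(size=8823)

# Bin the points onto a 256x256 grid; heatmap[i, j] counts the points
# falling in each cell, which is exactly what a heatmap plot needs.
heatmap, xedges, yedges = np.histogram2d(x, y, bins=256)
print(heatmap.shape)        # (256, 256)
print(int(heatmap.sum()))   # 8823 -- every point lands in exactly one bin
```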

Answered by jtaylor

In numpy 1.7.0 and newer, meshgrid has the sparse keyword argument. A sparse meshgrid is set up so that it broadcasts to a full meshgrid when used. This can save large amounts of memory, e.g. when using the meshgrid to index arrays.

In [2]: np.meshgrid(np.arange(10), np.arange(10), sparse=True)
Out[2]: 
[array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]), array([[0],
    [1],
    [2],
    [3],
    [4],
    [5],
    [6],
    [7],
    [8],
    [9]])]
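
To illustrate the broadcasting the answer describes (my own sketch): the two sparse grids have shapes (1, N) and (N, 1), so any elementwise operation between them produces the full (N, N) result on demand, without storing two dense grids up front:

```python
import numpy as np

# Sparse grids: a row vector of x-values and a column vector of y-values.
xs, ys = np.meshgrid(np.arange(10), np.arange(10), sparse=True)
print(xs.shape, ys.shape)   # (1, 10) (10, 1)

# Combining them broadcasts to the full grid only when needed.
z = xs + ys
print(z.shape)              # (10, 10)
```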

Another option is to use smaller integers that are still able to represent the range:

np.meshgrid(np.arange(10).astype(np.int8), np.arange(10).astype(np.int8),
            sparse=True, copy=False)
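
The saving from the smaller dtype can be checked with nbytes (my addition; note that numpy's default integer size is platform dependent, typically 8 bytes on 64-bit Linux):

```python
import numpy as np

# int8 sparse grids vs. default-dtype sparse grids: 1 byte per element
# instead of the platform default (usually 8 on 64-bit Linux).
a8, b8 = np.meshgrid(np.arange(10).astype(np.int8),
                     np.arange(10).astype(np.int8),
                     sparse=True, copy=False)
a, b = np.meshgrid(np.arange(10), np.arange(10), sparse=True)
print(a8.nbytes)   # 10 bytes for the int8 grid
print(a.nbytes)    # e.g. 80 bytes with a default int64
```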

Though as of numpy 1.9, using these smaller integers for indexing will be slower, as they will internally be converted back to larger integers in small (np.setbufsize-sized) chunks.

Answered by Charlie Lee

When you call np.meshgrid for a scatter figure, you may need to normalize your data if it is too large to process. Try this module:

# Feature Scaling
from sklearn.preprocessing import StandardScaler
st = StandardScaler()
X = st.fit_transform(X)