Scatter plot with a huge amount of data

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow

Original question: http://stackoverflow.com/questions/4082298/

Asked by Nicola Vianello
I would like to use Matplotlib to generate a scatter plot with a huge amount of data (about 3 million points). I have 3 vectors of the same dimension, and I plot them in the following way.
import matplotlib.pyplot as plt
import matplotlib.cm as cm

# delta, vf, dS are the three equal-length data vectors
fig = plt.figure()
fig.subplots_adjust(bottom=0.2)
ax = fig.add_subplot(111)
plt.scatter(delta, vf, c=dS, alpha=0.7, cmap=cm.Paired)
Nothing special, really, but it takes too long to generate (I'm working on a MacBook Pro with 4 GB RAM, Python 2.7, and Matplotlib 1.0). Is there any way to improve the speed?
Accepted answer by Paul

Answer by unutbu
Unless your graphic is huge, many of those 3 million points are going to overlap. (A 400x600 image only has 240K pixels...)
So the easiest thing to do would be to take a sample of, say, 1000 points from your data:
import random

delta_sample = random.sample(delta, 1000)
and just plot that.
For example:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
import random

fig = plt.figure()
fig.subplots_adjust(bottom=0.2)
ax = fig.add_subplot(111)

N = 3 * 10**6
delta = np.random.normal(size=N)
vf = np.random.normal(size=N)
dS = np.random.normal(size=N)

# Draw 1000 random indices and plot only those points
idx = random.sample(range(N), 1000)
plt.scatter(delta[idx], vf[idx], c=dS[idx], alpha=0.7, cmap=cm.Paired)
plt.show()


Or, if you need to pay more attention to outliers, you could bin your data using np.histogram and then compose a delta_sample which has representatives from each bin.
Unfortunately, when using np.histogram I don't think there is any easy way to associate bins with individual data points. A simple but approximate solution is to use the location of a point on the bin edge itself as a proxy for the points in it:
xedges = np.linspace(-10, 10, 100)
yedges = np.linspace(-10, 10, 100)
zedges = np.linspace(-10, 10, 10)
hist, edges = np.histogramdd((delta, vf, dS), (xedges, yedges, zedges))

# Plot one marker per non-empty bin, placed at the bin's lower edge
xidx, yidx, zidx = np.where(hist > 0)
plt.scatter(xedges[xidx], yedges[yidx], c=zedges[zidx], alpha=0.7, cmap=cm.Paired)
plt.show()
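If you want real data points rather than bin-edge proxies, one way to sketch the "representatives from each bin" idea (my own illustration, not from the original answer) is to label each point with np.digitize and keep the first point seen in each occupied 2-D bin:

```python
import numpy as np

rng = np.random.default_rng(0)
delta = rng.normal(size=3 * 10**6)
vf = rng.normal(size=delta.size)

xedges = np.linspace(-10, 10, 100)
yedges = np.linspace(-10, 10, 100)

# Assign every point to a bin along each axis, then flatten the
# (xbin, ybin) pair into a single integer bin id per point.
xbin = np.digitize(delta, xedges)
ybin = np.digitize(vf, yedges)
flat = xbin * (yedges.size + 1) + ybin

# np.unique returns the index of the first point in each distinct bin,
# giving one actual representative data point per occupied bin.
_, first_idx = np.unique(flat, return_index=True)
delta_sample = delta[first_idx]
vf_sample = vf[first_idx]
```

This keeps outliers, since any bin containing even a single point contributes one marker to the plot.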


Answer by conjectures

What about trying pyplot.hexbin? It generates a sort of heatmap based on point density in a set number of bins.
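A minimal sketch of this approach (with randomly generated data standing in for delta and vf), where each hexagonal cell is colored by how many points fall in it, so nothing is drawn per-point:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

N = 3 * 10**6
x = np.random.normal(size=N)
y = np.random.normal(size=N)

fig, ax = plt.subplots()
# gridsize controls the number of hexagonal bins across the x-axis;
# the color of each cell encodes the count of points inside it.
hb = ax.hexbin(x, y, gridsize=50, cmap='Paired')
fig.colorbar(hb, ax=ax, label='counts')
fig.savefig('hexbin.png')
```

Because the rendering cost depends on the number of bins rather than the number of points, this stays fast even for millions of points.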

