Python: Creating a 2D histogram from a numpy matrix
Original question: http://stackoverflow.com/questions/27156381/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by Kestrel
I'm new to Python.
I have a numpy matrix of dimensions 42x42, with values in the range 0-996. I want to create a 2D histogram using this data. I've been looking at tutorials, but they all seem to show how to create 2D histograms from random data and not from a numpy matrix.
So far, I have imported:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors
I'm not sure if these are the correct imports; I'm just trying to pick up what I can from the tutorials I see.
I have the numpy matrix M with all of the values in it (as described above). In the end, I want it to look something like this:


Obviously, my data will be different, so my plot should look different. Can anyone give me a hand?
Edit: For my purposes, Hooked's example below, using matshow, is exactly what I'm looking for.
Accepted answer by Hooked
If you have the raw data from the counts, you could use plt.hexbin to create the plots for you (IMHO this is better than a square lattice). Adapted from the hexbin example:
import numpy as np
import matplotlib.pyplot as plt
n = 100000
x = np.random.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)
plt.hexbin(x,y)
plt.show()
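For comparison, here is a minimal sketch of the square-lattice version of the same plot. It assumes np.histogram2d for the binning and pcolormesh for the drawing; the 42 bins per axis and the fixed seed are illustrative choices, not part of the original answer.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n = 100000
x = rng.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * rng.standard_normal(n)

# Bin the raw points on a 42x42 square lattice spanning the data range.
counts, xedges, yedges = np.histogram2d(x, y, bins=42)

fig, ax = plt.subplots()
# histogram2d puts x along axis 0, so transpose: pcolormesh expects rows = y.
mesh = ax.pcolormesh(xedges, yedges, counts.T)
fig.colorbar(mesh, ax=ax, label="count")
# plt.show()  # uncomment to display the figure
```

Because the bin range here is taken from the data itself, every point falls in some bin and the counts sum to n.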


If you already have the Z-values in a matrix as you mention, just use plt.imshow or plt.matshow:
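import numpy as np
import matplotlib.pyplot as plt
# Build a 20x20 grid of Z-values to stand in for an existing matrix.
XB = np.linspace(-1, 1, 20)
YB = np.linspace(-1, 1, 20)
X, Y = np.meshgrid(XB, YB)
Z = np.exp(-(X**2 + Y**2))
# Draw the matrix as an image, one cell per entry, without smoothing.
plt.imshow(Z, interpolation='none')
plt.show()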
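Applied to the matrix described in the question, this might look like the sketch below. The matrix M is a random stand-in (42x42 integers in the range 0-996), since the asker's actual data is not given; the colormap and colorbar label are likewise assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for the asker's matrix: 42x42 integers in the range 0-996.
rng = np.random.default_rng(0)
M = rng.integers(0, 997, size=(42, 42))  # upper bound is exclusive

fig, ax = plt.subplots()
# matshow draws one colored cell per matrix entry, with row 0 at the top.
im = ax.matshow(M, cmap="viridis")
fig.colorbar(im, ax=ax, label="value")
# plt.show()  # uncomment to display the figure
```

With real data, replace the random M with your own array; nothing else needs to change.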


Answer by unutbu
If you have not only the 2D histogram matrix but also the underlying (x, y) data, then you could make a scatter plot of the (x, y) points and color each point according to its binned count value in the 2D histogram matrix:
import numpy as np
import matplotlib.pyplot as plt
n = 10000
x = np.random.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)
xedges, yedges = np.linspace(-4, 4, 42), np.linspace(-25, 25, 42)
hist, xedges, yedges = np.histogram2d(x, y, (xedges, yedges))
xidx = np.clip(np.digitize(x, xedges), 0, hist.shape[0]-1)
yidx = np.clip(np.digitize(y, yedges), 0, hist.shape[1]-1)
c = hist[xidx, yidx]
plt.scatter(x, y, c=c)
plt.show()


Answer by TheoryX
@unutbu's answer contains a mistake: xidx and yidx are calculated the wrong way (at least on my data sample). The correct way should be:
xidx = np.clip(np.digitize(x, xedges) - 1, 0, hist.shape[0] - 1)
yidx = np.clip(np.digitize(y, yedges) - 1, 0, hist.shape[1] - 1)
This is because the values returned by np.digitize that we are interested in lie between 1 and len(xedges) - 1, whereas c = hist[xidx, yidx] needs indices between 0 and hist.shape[0] - 1 (and hist.shape[1] - 1 for yidx).
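A small sketch of the off-by-one, using made-up edges and values purely for illustration:

```python
import numpy as np

edges = np.array([0.0, 1.0, 2.0, 3.0])         # 4 edges define 3 bins
values = np.array([-0.5, 0.2, 1.5, 2.9, 3.5])  # first and last lie outside the edges

# np.digitize returns 0 for values below edges[0], len(edges) for values
# at or above edges[-1], and 1..len(edges)-1 for values inside the range.
raw = np.digitize(values, edges)

# Shift to 0-based bin indices, then clip out-of-range points into the edge bins.
idx = np.clip(raw - 1, 0, len(edges) - 2)

print(raw)  # [0 1 2 3 4]
print(idx)  # [0 0 1 2 2]
```

Note that the clipping assigns out-of-range points to the nearest edge bin, so they take that bin's count as their color.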
Below is a comparison of the results. As you can see, the two results are similar but not identical.
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax1 = fig.add_subplot(211)
ax2 = fig.add_subplot(212)
n = 10000
x = np.random.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)
xedges, yedges = np.linspace(-4, 4, 42), np.linspace(-25, 25, 42)
hist, xedges, yedges = np.histogram2d(x, y, (xedges, yedges))
xidx = np.clip(np.digitize(x, xedges), 0, hist.shape[0] - 1)
yidx = np.clip(np.digitize(y, yedges), 0, hist.shape[1] - 1)
c = hist[xidx, yidx]
old = ax1.scatter(x, y, c=c, cmap='jet')
xidx = np.clip(np.digitize(x, xedges) - 1, 0, hist.shape[0] - 1)
yidx = np.clip(np.digitize(y, yedges) - 1, 0, hist.shape[1] - 1)
c = hist[xidx, yidx]
new = ax2.scatter(x, y, c=c, cmap='jet')
plt.show()
Answer by farenorth
I'm a big fan of the 'scatter histogram', but I don't think the other solutions fully do them justice. Here is a function that implements them. The major advantage of this function compared to the other solutions is that it sorts the points by the hist data (see the mode argument). This means that the result looks more like a traditional histogram (i.e., you don't get the chaotic overlap of markers in different bins).

MCVE for this figure (using my function):
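import numpy as np
import matplotlib.pyplot as plt
# hist_scatter is the answer author's own module, not part of matplotlib.
from hist_scatter import scatter_hist2d
# randgen and npoint are not defined in the original snippet;
# the two values below are assumptions added so the example can run.
randgen = np.random.RandomState(0)
npoint = 10000
fig = plt.figure(figsize=[5, 4])
ax = plt.gca()
x = randgen.randn(npoint)
y = 2 + 3 * x + 4 * randgen.randn(npoint)
scat = scatter_hist2d(x, y,
                      bins=[np.linspace(-4, 4, 42),
                            np.linspace(-25, 25, 42)],
                      s=5,
                      cmap=plt.get_cmap('viridis'))
# Dashed reference lines through the origin.
ax.axhline(0, color='k', linestyle='--', zorder=3, linewidth=0.5)
ax.axvline(0, color='k', linestyle='--', zorder=3, linewidth=0.5)
plt.colorbar(scat)
plt.show()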
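The source of scatter_hist2d is not reproduced on this page. The sketch below shows one way the sorting idea could be implemented, assuming points are simply drawn in order of increasing bin count; the function name scatter_by_density is hypothetical and its behavior may differ from the author's function and its mode argument.

```python
import numpy as np
import matplotlib.pyplot as plt


def scatter_by_density(ax, x, y, bins, **kwargs):
    """Scatter (x, y) colored by 2D-histogram count, densest points drawn last.

    Hypothetical helper for illustration; not the author's scatter_hist2d.
    """
    hist, xedges, yedges = np.histogram2d(x, y, bins=bins)
    # 0-based bin index of every point; out-of-range points go to the edge bins.
    xidx = np.clip(np.digitize(x, xedges) - 1, 0, hist.shape[0] - 1)
    yidx = np.clip(np.digitize(y, yedges) - 1, 0, hist.shape[1] - 1)
    c = hist[xidx, yidx]
    # Sort ascending by count so markers in dense bins are plotted on top.
    order = np.argsort(c)
    return ax.scatter(x[order], y[order], c=c[order], **kwargs)


rng = np.random.default_rng(0)
x = rng.standard_normal(10000)
y = 2 + 3 * x + 4 * rng.standard_normal(10000)

fig, ax = plt.subplots(figsize=(5, 4))
scat = scatter_by_density(
    ax, x, y,
    bins=[np.linspace(-4, 4, 42), np.linspace(-25, 25, 42)],
    s=5, cmap="viridis",
)
fig.colorbar(scat, ax=ax)
# plt.show()  # uncomment to display the figure
```

Drawing in ascending order of count is what keeps low-density markers from covering high-density ones where bins meet.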
Room for improvement?
The primary drawback of this approach is that the points in the densest areas overlap the points in lower density areas, leading to somewhat of a misrepresentation of the areas of each bin. I spent quite a bit of time exploring two approaches for resolving this:
1) using smaller markers for higher density bins
2) applying a 'clipping' mask to each bin
The first one gives results that are way too crazy. The second one looks nice, especially if you only clip bins that have more than about 20 points, but it is extremely slow (this figure took about a minute).
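As a rough illustration of the first idea, the sketch below shrinks the marker area as the bin count grows. The scaling rule (area proportional to 1/count, clipped between 1 and 20 square points) is an assumption chosen for the example; the rule actually tried in the answer is not stated.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.standard_normal(10000)
y = 2 + 3 * x + 4 * rng.standard_normal(10000)

bins = [np.linspace(-4, 4, 42), np.linspace(-25, 25, 42)]
hist, xedges, yedges = np.histogram2d(x, y, bins=bins)
xidx = np.clip(np.digitize(x, xedges) - 1, 0, hist.shape[0] - 1)
yidx = np.clip(np.digitize(y, yedges) - 1, 0, hist.shape[1] - 1)
c = hist[xidx, yidx]

# Assumed rule: marker area falls off as 1/count, kept within [1, 20] pt^2.
# np.maximum guards against out-of-range points clipped into empty edge bins.
sizes = np.clip(20.0 / np.maximum(c, 1.0), 1.0, 20.0)

order = np.argsort(c)  # draw dense bins last
fig, ax = plt.subplots()
scat = ax.scatter(x[order], y[order], c=c[order], s=sizes[order], cmap="viridis")
fig.colorbar(scat, ax=ax, label="count")
# plt.show()  # uncomment to display the figure
```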
So, ultimately I've decided that by carefully selecting the marker size and bin size (s and bins), you can get results that are visually pleasing and not too bad in terms of misrepresenting the data. After all, these 2D histograms are usually intended to be visual aids to the underlying data, not strictly quantitative representations of it. Therefore, I think this approach is far superior to 'traditional 2D histograms' (e.g., plt.hist2d or plt.hexbin), and I presume that if you've found this page you're also not a fan of traditional (single color) scatter plots.
If I were king of science, I'd make sure all 2D histograms did something like this for the rest of forever.

