Python: Create a 2D histogram from a numpy matrix

Question

Asked by Kestrel

I'm new to Python.

I have a numpy matrix, of dimensions 42x42, with values in the range 0-996. I want to create a 2D histogram using this data. I've been looking at tutorials, but they all seem to show how to create 2D histograms from random data and not a numpy matrix.

So far, I have imported:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors

I'm not sure if these are correct imports, I'm just trying to pick up what I can from tutorials I see.

I have the numpy matrix M with all of the values in it (as described above). In the end, I want it to look something like this:

[Image: example of the desired 2D histogram]

Obviously, my data will be different, so my plot should look different. Can anyone give me a hand?

Edit: For my purposes, Hooked's example below, using matshow, is exactly what I'm looking for.

Answer 1

Accepted answer by Hooked

If you have the raw data behind the counts, you could use plt.hexbin to create the plots for you (IMHO this is better than a square lattice). Adapted from the hexbin example:

import numpy as np
import matplotlib.pyplot as plt

n = 100000
x = np.random.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)
plt.hexbin(x,y)

plt.show()
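
If you would rather end up with a square lattice (closer to the picture in the question), a minimal sketch is to bin the raw samples yourself with np.histogram2d and draw the resulting count matrix with plt.pcolormesh. This assumes simulated data like the example above; the seed and the choice of 42 bins per axis are arbitrary. np.histogram2d returns the counts indexed as [x_bin, y_bin], so the matrix is transposed before plotting to put x on the horizontal axis.

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated raw samples, as in the hexbin example (seed chosen arbitrarily).
rng = np.random.default_rng(0)
n = 100000
x = rng.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * rng.standard_normal(n)

# Bin the samples; 42 bins per axis is an arbitrary choice for illustration.
counts, xedges, yedges = np.histogram2d(x, y, bins=42)

# counts[i, j] is the number of points with x in bin i and y in bin j,
# so transpose it so that x runs along the horizontal axis.
fig, ax = plt.subplots()
mesh = ax.pcolormesh(xedges, yedges, counts.T)
fig.colorbar(mesh, ax=ax, label="count")
plt.show()
```

Passing the bin edges to pcolormesh (rather than using imshow) keeps the axes in data units, which matters when the bins are not square.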

If you already have the Z-values in a matrix, as you mention, just use plt.imshow or plt.matshow:

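import numpy as np
import matplotlib.pyplot as plt

XB = np.linspace(-1, 1, 20)
YB = np.linspace(-1, 1, 20)
X, Y = np.meshgrid(XB, YB)
Z = np.exp(-(X**2 + Y**2))  # 20x20 matrix of Z-values
plt.imshow(Z, interpolation='none')  # one square per matrix entry, no smoothing
plt.show()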
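
Applied to the matrix from the question, a sketch might look like the following. The real M is not shown in the question, so a random 42x42 integer matrix with values 0-996 stands in for it here (hypothetical data). vmin and vmax pin the color scale to that range, and the colorbar shows which color corresponds to which value.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in for the question's 42x42 matrix M (values 0-996).
rng = np.random.default_rng(0)
M = rng.integers(0, 997, size=(42, 42))

fig, ax = plt.subplots()
# matshow draws one square per matrix entry, with row 0 at the top.
im = ax.matshow(M, cmap="viridis", vmin=0, vmax=996)
fig.colorbar(im, ax=ax, label="value")
plt.show()
```

With the real data, replace the two lines that build M with your own matrix; everything else stays the same.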

Answer 2

Answered by unutbu

If you have not only the 2D histogram matrix but also the underlying (x, y) data, then you could make a scatter plot of the (x, y) points and color each point according to its binned count value in the 2D histogram matrix:

import numpy as np
import matplotlib.pyplot as plt

n = 10000
x = np.random.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)
xedges, yedges = np.linspace(-4, 4, 42), np.linspace(-25, 25, 42)
hist, xedges, yedges = np.histogram2d(x, y, (xedges, yedges))
xidx = np.clip(np.digitize(x, xedges), 0, hist.shape[0]-1)
yidx = np.clip(np.digitize(y, yedges), 0, hist.shape[1]-1)
c = hist[xidx, yidx]
plt.scatter(x, y, c=c)

plt.show()

[Image: example scatter plot of a 2D histogram]

Answer 3

Answered by TheoryX

@unutbu's answer contains a mistake: xidx and yidx are calculated the wrong way (at least on my data sample). The correct way should be:

xidx = np.clip(np.digitize(x, xedges) - 1, 0, hist.shape[0] - 1)
yidx = np.clip(np.digitize(y, yedges) - 1, 0, hist.shape[1] - 1)

This is because, for values inside the bin range, np.digitize returns indices between 1 and len(xedges) - 1, whereas c = hist[xidx, yidx] needs indices between 0 and hist.shape[0] - 1 (and likewise for y).
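
A small, self-contained check of the off-by-one, using made-up edges and values: for increasing edges, np.digitize returns i such that edges[i-1] <= v < edges[i], so in-range values map to 1 through len(edges) - 1, values below the first edge map to 0, and values at or above the last edge map to len(edges). Subtracting 1 and clipping turns these into valid indices of hist.

```python
import numpy as np

edges = np.linspace(-4, 4, 5)  # bin edges [-4, -2, 0, 2, 4] -> 4 bins
values = np.array([-5.0, -4.0, 0.0, 3.9, 4.0, 5.0])

raw = np.digitize(values, edges)
print(raw)  # [0 1 3 4 5 5]

# Shift to 0-based bin indices and clip out-of-range values into the edge bins.
nbins = len(edges) - 1
idx = np.clip(raw - 1, 0, nbins - 1)
print(idx)  # [0 0 2 3 3 3]
```

Note that a value exactly on the last edge (4.0 here) is counted in the last bin by np.histogram2d, and the clip maps it there too. Values outside the edges entirely (-5.0 and 5.0) are not counted by np.histogram2d at all, but the clip still assigns them to the nearest edge bin, so they get that bin's color.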

Below is a comparison of the results. As you can see, the two plots look similar but are not identical.

import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = fig.add_subplot(211)
ax2 = fig.add_subplot(212)

n = 10000
x = np.random.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)
xedges, yedges = np.linspace(-4, 4, 42), np.linspace(-25, 25, 42)
hist, xedges, yedges = np.histogram2d(x, y, (xedges, yedges))

xidx = np.clip(np.digitize(x, xedges), 0, hist.shape[0] - 1)
yidx = np.clip(np.digitize(y, yedges), 0, hist.shape[1] - 1)
c = hist[xidx, yidx]
old = ax1.scatter(x, y, c=c, cmap='jet')

xidx = np.clip(np.digitize(x, xedges) - 1, 0, hist.shape[0] - 1)
yidx = np.clip(np.digitize(y, yedges) - 1, 0, hist.shape[1] - 1)

c = hist[xidx, yidx]
new = ax2.scatter(x, y, c=c, cmap='jet')


plt.show()
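
One way to confirm which indexing is right, sketched here with a fixed seed and the same edges as above (bin_counts is a hypothetical helper introduced only for this check): with the corrected indices, every point that lies inside the edges must land in a bin whose count is at least 1 (the point itself), whereas the original indices can pick a neighboring, possibly empty, bin.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10000
x = rng.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * rng.standard_normal(n)
xedges, yedges = np.linspace(-4, 4, 42), np.linspace(-25, 25, 42)
hist, xedges, yedges = np.histogram2d(x, y, (xedges, yedges))

# Points that np.histogram2d actually counted (inside both ranges).
inside = (x >= xedges[0]) & (x <= xedges[-1]) & (y >= yedges[0]) & (y <= yedges[-1])

def bin_counts(offset):
    """Hypothetical helper: look up each point's bin count, shifting np.digitize by `offset`."""
    xidx = np.clip(np.digitize(x, xedges) - offset, 0, hist.shape[0] - 1)
    yidx = np.clip(np.digitize(y, yedges) - offset, 0, hist.shape[1] - 1)
    return hist[xidx, yidx]

old, new = bin_counts(0), bin_counts(1)
print("points whose color differs:", np.count_nonzero(old != new))
print("inside points mapped to an empty bin (old):", np.count_nonzero(old[inside] == 0))
print("inside points mapped to an empty bin (new):", np.count_nonzero(new[inside] == 0))
```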

Answer 4

Answered by farenorth

I'm a big fan of the 'scatter histogram', but I don't think the other solutions fully do them justice. Here is a function that implements them. The major advantage of this function compared to the other solutions is that it sorts the points by the hist data (see the mode argument). This means that the result looks more like a traditional histogram (i.e., you don't get the chaotic overlap of markers in different bins).

MCVE for this figure (using my function):

import numpy as np
import matplotlib.pyplot as plt
from hist_scatter import scatter_hist2d  # farenorth's module (the function linked above)

fig = plt.figure(figsize=[5, 4])
ax = plt.gca()

# npoint and randgen are not defined in the original snippet; these values are assumptions.
npoint = 10000
randgen = np.random.RandomState(0)

x = randgen.randn(npoint)
y = 2 + 3 * x + 4 * randgen.randn(npoint)

scat = scatter_hist2d(x, y,
                      bins=[np.linspace(-4, 4, 42),
                            np.linspace(-25, 25, 42)],
                      s=5,
                      cmap=plt.get_cmap('viridis'))
ax.axhline(0, color='k', linestyle='--', zorder=3, linewidth=0.5)
ax.axvline(0, color='k', linestyle='--', zorder=3, linewidth=0.5)
plt.colorbar(scat)
plt.show()
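
The hist_scatter module is not reproduced here, so as a rough stand-in (not farenorth's actual function; the name scatter_hist2d_sketch and its parameters are hypothetical), the core idea of coloring each point by its bin count and drawing the points in order of that count can be sketched as follows. This sketch hard-codes one ordering (densest bins drawn last, i.e. on top); the original exposes a mode argument related to this sorting.

```python
import numpy as np
import matplotlib.pyplot as plt

def scatter_hist2d_sketch(x, y, bins, ax=None, **scatter_kwargs):
    """Hypothetical minimal 'scatter histogram': color each point by its 2D bin count.

    Points are drawn in ascending order of count, so points from denser bins
    end up on top of points from sparser bins.
    """
    if ax is None:
        ax = plt.gca()
    hist, xedges, yedges = np.histogram2d(x, y, bins=bins)
    # 0-based bin index of every point (clipped into the edge bins).
    xidx = np.clip(np.digitize(x, xedges) - 1, 0, hist.shape[0] - 1)
    yidx = np.clip(np.digitize(y, yedges) - 1, 0, hist.shape[1] - 1)
    counts = hist[xidx, yidx]
    order = np.argsort(counts, kind="stable")
    return ax.scatter(x[order], y[order], c=counts[order], **scatter_kwargs)

rng = np.random.default_rng(0)
x = rng.standard_normal(10000)
y = 2 + 3 * x + 4 * rng.standard_normal(10000)

fig, ax = plt.subplots(figsize=[5, 4])
scat = scatter_hist2d_sketch(x, y,
                             bins=[np.linspace(-4, 4, 42), np.linspace(-25, 25, 42)],
                             ax=ax, s=5, cmap="viridis")
fig.colorbar(scat, ax=ax)
plt.show()
```

Sorting by count is what keeps markers from sparse bins from being painted over the dense core at random; reversing the order would instead let outliers sit on top.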

Room for improvement?

The primary drawback of this approach is that the points in the densest areas overlap the points in lower density areas, leading to somewhat of a misrepresentation of the areas of each bin. I spent quite a bit of time exploring two approaches for resolving this:

1) using smaller markers for higher density bins

2) applying a 'clipping' mask to each bin

The first one gives results that are way too crazy. The second one looks nice, especially if you only clip bins that have more than ~20 points, but it is extremely slow (this figure took about a minute).

So, ultimately I've decided that by carefully selecting the marker size and bin size (s and bins), you can get results that are visually pleasing and not too bad in terms of misrepresenting the data. After all, these 2D histograms are usually intended to be visual aids to the underlying data, not strictly quantitative representations of it. Therefore, I think this approach is far superior to 'traditional 2D histograms' (e.g., plt.hist2d or plt.hexbin), and I presume that if you've found this page you're also not a fan of traditional (single color) scatter plots.

If I were king of science, I'd make sure all 2D histograms did something like this for the rest of forever.