How can I make a scatter plot colored by density in matplotlib?

Note: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/20105364/

Time: 2020-08-18 19:32:13  Source: igfitidea

python, matplotlib

Asked by 2964502

I'd like to make a scatter plot where each point is colored by the spatial density of nearby points.

I've come across a very similar question, which shows an example of this using R:

R Scatter Plot: symbol color represents number of overlapping points

What's the best way to accomplish something similar in Python using matplotlib?

Accepted answer by Joe Kington

In addition to hist2d or hexbin, as @askewchan suggested, you can use the same method as the accepted answer in the question you linked to.

If you want to do that:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Generate fake data
x = np.random.normal(size=1000)
y = x * 3 + np.random.normal(size=1000)

# Calculate the point density
xy = np.vstack([x,y])
z = gaussian_kde(xy)(xy)

fig, ax = plt.subplots()
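# edgecolor='none' turns off marker outlines (an empty string is not a valid color in recent matplotlib)
ax.scatter(x, y, c=z, s=100, edgecolor='none')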
plt.show()

If you'd like the points to be plotted in order of density so that the densest points are always on top (similar to the linked example), just sort them by the z-values. I'm also going to use a smaller marker size here as it looks a bit better:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Generate fake data
x = np.random.normal(size=1000)
y = x * 3 + np.random.normal(size=1000)

# Calculate the point density
xy = np.vstack([x,y])
z = gaussian_kde(xy)(xy)

# Sort the points by density, so that the densest points are plotted last
idx = z.argsort()
x, y, z = x[idx], y[idx], z[idx]

fig, ax = plt.subplots()
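# edgecolor='none' turns off marker outlines (an empty string is not a valid color in recent matplotlib)
ax.scatter(x, y, c=z, s=50, edgecolor='none')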
plt.show()
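
The color values in the plot above are only meaningful with a scale next to them. As a minimal sketch (not part of the original answer), the return value of ax.scatter can be passed to fig.colorbar so the colorbar uses the same color mapping as the points. The seeded random generator and the 'Density' label are assumptions made here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Fake data, generated like the answer's data (seeded here for repeatability)
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x * 3 + rng.normal(size=1000)

# Point density from a Gaussian KDE, evaluated at the points themselves
xy = np.vstack([x, y])
z = gaussian_kde(xy)(xy)

# Sort by density so the densest points are drawn last and end up on top
idx = z.argsort()
x, y, z = x[idx], y[idx], z[idx]

fig, ax = plt.subplots()
points = ax.scatter(x, y, c=z, s=50)

# Passing the scatter's return value ties the colorbar to the same color mapping
cbar = fig.colorbar(points, ax=ax)
cbar.set_label('Density')

# plt.show()  # uncomment to display the figure
```

The values on the colorbar are the KDE's estimated probability density at each point, so they depend on the scale of x and y.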

Answer by askewchan

You could make a histogram:

import numpy as np
import matplotlib.pyplot as plt

# fake data:
a = np.random.normal(size=1000)
b = a*3 + np.random.normal(size=1000)

plt.hist2d(a, b, (50, 50), cmap=plt.cm.jet)
plt.colorbar()
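
The accepted answer also mentions hexbin as an alternative to hist2d. A rough sketch of what that might look like follows; the gridsize, colormap, and mincnt values are arbitrary choices made here, not taken from either answer.

```python
import numpy as np
import matplotlib.pyplot as plt

# Fake data, generated like the histogram example (seeded for repeatability)
rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = a * 3 + rng.normal(size=1000)

fig, ax = plt.subplots()

# gridsize sets the number of hexagons across the x-direction;
# mincnt=1 leaves hexagons that contain no points undrawn
hb = ax.hexbin(a, b, gridsize=50, cmap='viridis', mincnt=1)

cbar = fig.colorbar(hb, ax=ax)
cbar.set_label('Points per hexagon')

# plt.show()  # uncomment to display the figure
```

Like hist2d, this colors bins rather than individual points, so it shows the density but not the exact location of each sample.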

Answer by Guillaume

Also, if the number of points makes the KDE calculation too slow, the color can be interpolated from np.histogram2d [Update in response to comments: if you wish to show the colorbar, use plt.scatter() instead of ax.scatter(), followed by plt.colorbar()]:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.colors import Normalize 
from scipy.interpolate import interpn

def density_scatter( x , y, ax = None, sort = True, bins = 20, **kwargs )   :
    """
    Scatter plot colored by 2d histogram
    """
    if ax is None :
        fig , ax = plt.subplots()
    data , x_e, y_e = np.histogram2d( x, y, bins = bins, density = True )
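    # Interpolate the histogram, defined at the bin centers, at each data point
    x_c = 0.5 * (x_e[1:] + x_e[:-1])
    y_c = 0.5 * (y_e[1:] + y_e[:-1])
    z = interpn( ( x_c , y_c ) , data , np.vstack([x,y]).T ,
                 method = "splinef2d", bounds_error = False )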

    #To be sure to plot all data
    z[np.where(np.isnan(z))] = 0.0

    # Sort the points by density, so that the densest points are plotted last
    if sort :
        idx = z.argsort()
        x, y, z = x[idx], y[idx], z[idx]

    points = ax.scatter( x, y, c=z, **kwargs )

    # Build the colorbar from the scatter itself and take the figure from the
    # axes, so this also works when the caller passes in an existing ax
    cbar = ax.figure.colorbar(points, ax=ax)
    cbar.ax.set_ylabel('Density')

    return ax


if "__main__" == __name__ :

    x = np.random.normal(size=100000)
    y = x * 3 + np.random.normal(size=100000)
    density_scatter( x, y, bins = [30,30] )
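
The update note in this answer suggests plt.scatter() followed by plt.colorbar(). As a simplified sketch of that pattern, the version below skips the spline interpolation and colors each point by the value of the histogram bin it falls in. The helper name binned_density and the marker size are hypothetical choices made here, not part of the answer.

```python
import numpy as np
import matplotlib.pyplot as plt


def binned_density(x, y, bins=30):
    """Return, for each point, the density of the 2D histogram bin it falls in."""
    data, x_e, y_e = np.histogram2d(x, y, bins=bins, density=True)
    # Index of the bin containing each point; clip so that points lying on
    # the last edge are assigned to the last bin
    ix = np.clip(np.searchsorted(x_e, x, side='right') - 1, 0, len(x_e) - 2)
    iy = np.clip(np.searchsorted(y_e, y, side='right') - 1, 0, len(y_e) - 2)
    return data[ix, iy]


# Fake data, generated like the answer's data (seeded for repeatability)
rng = np.random.default_rng(0)
x = rng.normal(size=100000)
y = x * 3 + rng.normal(size=100000)
z = binned_density(x, y, bins=30)

# Draw the densest points last, then let pyplot attach the colorbar
# to the most recent scatter
idx = z.argsort()
plt.scatter(x[idx], y[idx], c=z[idx], s=5)
plt.colorbar(label='Density')

# plt.show()  # uncomment to display the figure
```

Without interpolation the colors change in visible steps at bin boundaries; increasing bins reduces this at the cost of noisier estimates in sparse regions.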