How to make a scatter plot for clustering in Python
Source: http://stackoverflow.com/questions/31137077/ (Stack Overflow, licensed under CC BY-SA 4.0; if you reuse this content you must keep the same license and attribute it to the original authors)
Asked by Zelong
I am carrying out clustering and trying to plot the result. A dummy data set is:
data
import numpy as np
X = np.random.randn(10)
Y = np.random.randn(10)
Cluster = np.array([0, 1, 1, 1, 3, 2, 2, 3, 0, 2]) # Labels of cluster 0 to 3
cluster centers
centers = np.random.randn(4, 2) # 4 centers, each center is a 2D point
Question
I want to make a scatter plot that shows the points in data and colors them according to their cluster labels.
Then I want to superimpose the center points on the same scatter plot, in another shape (e.g. 'X') and a fifth color (as there are 4 clusters).
Comment
- I turned to seaborn 0.6.0 but found no API to accomplish the task.
- ggplot by yhat could make the scatter plot look nice, but the second plot would replace the first one.
- I got confused by the color and cmap arguments in matplotlib, so I wonder if I could use seaborn or ggplot to do it.
Accepted answer by ThePredator
The first part of your question can be done by passing the Cluster array as the colours (c=Cluster) and adding a colorbar. I only vaguely understood the second part of your question, but I believe this is what you are looking for.
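import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(10)
y = np.random.randn(10)
Cluster = np.array([0, 1, 1, 1, 3, 2, 2, 3, 0, 2])  # Labels of cluster 0 to 3
centers = np.random.randn(4, 2)  # 4 centers, each center is a 2D point

fig = plt.figure()
ax = fig.add_subplot(111)
# Colour each point by its cluster label using the default colormap
scatter = ax.scatter(x, y, c=Cluster, s=50)
# Overlay each center as a red '+' marker
for i, j in centers:
    ax.scatter(i, j, s=50, c='red', marker='+')
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.colorbar(scatter)
plt.show()  # blocks until the window is closed, unlike fig.show() in a script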
This produces a scatter plot in which your "centres" are shown with the + marker. You can give them any colours you want, in the same way as was done for x and y.
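As a variation on the answer above, the sketch below draws all four centers in one scatter call with the 'X' marker and a separate colour, as the question asked, and uses a discrete four-colour map so the colorbar shows one band per cluster. The colour names, the marker sizes, the fixed random seed, and the variable names points and center_marks are illustrative assumptions, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap

rng = np.random.default_rng(0)  # fixed seed only to make the sketch repeatable
x = rng.standard_normal(10)
y = rng.standard_normal(10)
cluster = np.array([0, 1, 1, 1, 3, 2, 2, 3, 0, 2])  # labels of cluster 0 to 3
centers = rng.standard_normal((4, 2))               # 4 centers, each a 2D point

# One colour per cluster; these four names are an arbitrary choice.
cmap = ListedColormap(["tab:blue", "tab:orange", "tab:green", "tab:purple"])

fig, ax = plt.subplots()
# vmin/vmax are set so that each integer label sits in the middle of its colour band.
points = ax.scatter(x, y, c=cluster, cmap=cmap, vmin=-0.5, vmax=3.5, s=50)
# A single call draws every center: column 0 holds x values, column 1 holds y values.
center_marks = ax.scatter(centers[:, 0], centers[:, 1],
                          c="red", marker="X", s=120, label="centers")
fig.colorbar(points, ax=ax, ticks=[0, 1, 2, 3], label="cluster")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
```

To view the figure interactively, drop the matplotlib.use("Agg") line and call plt.show(); with the Agg backend, fig.savefig("clusters.png") writes it to a file instead (the file name is just an example).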
Answer by jotrocken
Part of this has been answered here. The outline is:
plt.scatter(x, y, c=color)
Quoting the documentation of matplotlib:
c : color or sequence of color, optional, default [...] Note that c should not be a single numeric RGB or RGBA sequence because that is indistinguishable from an array of values to be colormapped. c can be a 2-D array in which the rows are RGB or RGBA, however.
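The distinction in the quoted note can be illustrated with a small sketch: a 1-D array with one number per point is run through the colormap, while a 2-D array with one RGB row per point is used as literal colours. The variable names and the choice of the "viridis" colormap are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10.0)
y = np.zeros(10)
labels = np.array([0, 1, 1, 1, 3, 2, 2, 3, 0, 2])

fig, (ax_mapped, ax_rgb) = plt.subplots(1, 2)

# One number per point: the values are mapped to colours through cmap.
mapped = ax_mapped.scatter(x, y, c=labels, cmap="viridis")

# One RGB row per point (shape (10, 3)): the rows are used directly as colours.
rgb_rows = np.tile([1.0, 0.0, 0.0], (10, 1))
explicit = ax_rgb.scatter(x, y, c=rgb_rows)
```

In the first call cmap decides the colours; in the second call cmap plays no role because the colours are given explicitly.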
So in your case, you need one color for each cluster, and then you fill the color array according to the cluster assignment of each point.
red = [1, 0, 0]
green = [0, 1, 0]
blue = [0, 0, 1]
colors = [red, red, green, blue, green]
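One way to fill the colour array without listing every point by hand is to index a small palette with the label array. This is a minimal sketch that reuses the Cluster labels from the question; the fourth colour (orange) and the names palette and sc are assumptions added for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# One RGB row per cluster; row k is the colour of cluster k.
palette = np.array([
    [1.0, 0.0, 0.0],   # red
    [0.0, 1.0, 0.0],   # green
    [0.0, 0.0, 1.0],   # blue
    [1.0, 0.5, 0.0],   # orange (assumed fourth colour)
])

x = np.random.randn(10)
y = np.random.randn(10)
Cluster = np.array([0, 1, 1, 1, 3, 2, 2, 3, 0, 2])  # labels of cluster 0 to 3

# Integer-array indexing picks one palette row per point, giving shape (10, 3).
colors = palette[Cluster]

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=colors, s=50)
```

Because colors is a 2-D array with one RGB row per point, matplotlib treats the rows as colours rather than as values to be colormapped.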