Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/31137077/

How to make a scatter plot for clustering in Python

Tags: python, matplotlib, seaborn, python-ggplot

Asked by Zelong

I am carrying out clustering and trying to plot the result. A dummy data set is:

data

import numpy as np

X = np.random.randn(10)
Y = np.random.randn(10)
Cluster = np.array([0, 1, 1, 1, 3, 2, 2, 3, 0, 2])    # Labels of cluster 0 to 3

cluster centers

centers = np.random.randn(4, 2)    # 4 centers, each center is a 2D point


Question

I want to make a scatter plot that shows the points in data and colors them according to their cluster labels.

Then I want to superimpose the center points on the same scatter plot, in another shape (e.g. 'X') and a fifth color (as there are 4 clusters).



Comment

  • I turned to seaborn 0.6.0 but found no API to accomplish the task.
  • ggplot by yhat could make a nice scatter plot, but the second plot would replace the first one.
  • I got confused by color and cmap in matplotlib, so I wonder if I could use seaborn or ggplot to do it.

Accepted answer by ThePredator

The first part of your question can be done by passing the Cluster array as the colours and adding a colorbar. I only vaguely understood the second part of your question, but I believe this is what you are looking for.

import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(10)
y = np.random.randn(10)
Cluster = np.array([0, 1, 1, 1, 3, 2, 2, 3, 0, 2])    # Labels of cluster 0 to 3
centers = np.random.randn(4, 2) 

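fig = plt.figure()
ax = fig.add_subplot(111)
# Colour each point by its cluster label (mapped through the default colormap)
scatter = ax.scatter(x, y, c=Cluster, s=50)
# Overlay each cluster center as a red '+' marker
for i, j in centers:
    ax.scatter(i, j, s=50, c='red', marker='+')
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.colorbar(scatter)

plt.show()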

which results in:

[image: the resulting scatter plot]

wherein your "centres" are shown using the + marker. You can give them any colours you want, in the same way as was done for x and y.
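As a rough sketch of the same idea, the centres can also be drawn in a single call with the 'X' marker and a fifth colour, as the question asked. The colour names, the seed, and the names rng, cluster_cmap and center_marks are my own illustrative choices, not part of the original answer; the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
x = rng.standard_normal(10)
y = rng.standard_normal(10)
Cluster = np.array([0, 1, 1, 1, 3, 2, 2, 3, 0, 2])  # labels of cluster 0 to 3
centers = rng.standard_normal((4, 2))               # 4 centers, each a 2D point

# One colour per cluster (hypothetical choice); black is kept free as the fifth colour.
cluster_cmap = ListedColormap(["tab:blue", "tab:orange", "tab:green", "tab:purple"])

fig, ax = plt.subplots()
# vmin/vmax are set so that each integer label falls in the middle of one colour band.
points = ax.scatter(x, y, c=Cluster, cmap=cluster_cmap, vmin=-0.5, vmax=3.5, s=50)
# All centers in one call: column 0 holds the x values, column 1 the y values.
center_marks = ax.scatter(centers[:, 0], centers[:, 1],
                          c="black", marker="X", s=120, label="centers")
fig.colorbar(points, ax=ax, ticks=[0, 1, 2, 3], label="cluster")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
```

To look at the result, call plt.show() with an interactive backend, or fig.savefig with a file name of your choice.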

Answer by jotrocken

Part of this has been answered here. The outline is

plt.scatter(x, y, c=color)

Quoting the documentation of matplotlib:

c : color or sequence of color, optional, default [...] Note that c should not be a single numeric RGB or RGBA sequence because that is indistinguishable from an array of values to be colormapped. c can be a 2-D array in which the rows are RGB or RGBA, however.
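To make the quoted note concrete, here is a small sketch (my own example, not from the documentation) of the ambiguity it describes: three bare numbers for three points are treated as values to colormap, while the same numbers wrapped as one row of a 2-D array are read as a single RGB colour. The variable names are illustrative, and the Agg backend is an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 0.5])

fig, ax = plt.subplots()

# Three numbers for three points: interpreted as data values to be colormapped,
# not as one RGB colour.
mapped = ax.scatter(x, y, c=[1.0, 0.0, 0.0])

# The same numbers as a single row of a 2-D array: one RGB colour (red) for all points.
single = ax.scatter(x, y + 1, c=[[1.0, 0.0, 0.0]])
single_rgba = single.get_facecolor()  # RGBA rows actually used for the markers
```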

So in your case, you need a color for each cluster and then fill the color array according to the cluster assignment of each point.

red = [1, 0, 0]
green = [0, 1, 0]
blue = [0, 0, 1]
colors = [red, red, green, blue, green]
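With the Cluster array from the question, the colour list does not have to be written out by hand. A minimal sketch, assuming one RGB row per cluster (the name palette and the fourth colour are my own choices): indexing the palette with the label array yields one RGB row per point.

```python
import numpy as np

Cluster = np.array([0, 1, 1, 1, 3, 2, 2, 3, 0, 2])  # labels of cluster 0 to 3

# One RGB row per cluster; row i is the colour of cluster i.
palette = np.array([
    [1.0, 0.0, 0.0],  # cluster 0: red
    [0.0, 1.0, 0.0],  # cluster 1: green
    [0.0, 0.0, 1.0],  # cluster 2: blue
    [1.0, 0.5, 0.0],  # cluster 3: orange (hypothetical fourth colour)
])

# Integer-array indexing picks the palette row for each point's label,
# giving a 2-D array with one RGB row per point.
colors = palette[Cluster]
```

The resulting array can then be passed as plt.scatter(x, y, c=colors), which matches the 2-D RGB case described in the quoted documentation.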