Python kmeans scatter plot: plot a different color for each cluster

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/28227340/

Date: 2020-08-19 02:56:50  Source: igfitidea

kmeans scatter plot: plot different colors per cluster

Tags: python, numpy, matplotlib, scipy, k-means

Asked by jxn

I am trying to make a scatter plot of k-means output that clusters sentences on the same topic together. The problem I am facing is plotting the points that belong to each cluster in their own color.

sentence_list=["Hi how are you", "Good morning" ...] # I have 10 sentences
km = KMeans(n_clusters=5, init='k-means++',n_init=10, verbose=1)
# with 5 clusters, I want 5 different colors
km.fit(vectorized)
km.labels_ # [0,1,2,3,3,4,4,5,2,5]

pipeline = Pipeline([('tfidf', TfidfVectorizer())])
X = pipeline.fit_transform(sentence_list).toarray()  # dense ndarray for PCA
pca = PCA(n_components=2).fit(X)
data2D = pca.transform(X)
plt.scatter(data2D[:,0], data2D[:,1])

km.fit(X)
centers2D = pca.transform(km.cluster_centers_)
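# plt.hold(True) was removed in matplotlib 3.0; successive plot calls now overlay by default
labels=np.array([km.labels_])  # note: the extra list gives shape (1, 10), not (10,)
print(labels)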

My problem is with the plt.scatter() call at the bottom: what should I use for the parameter c?

1. When I use c=labels in the code, I get this error:

number in rbg sequence outside 0-1 range

2. When I set c=km.labels_ instead, I get the error:

ValueError: Color array must be two-dimensional

plt.scatter(centers2D[:,0], centers2D[:,1], 
            marker='x', s=200, linewidths=3, c=labels)
plt.show()

Accepted answer by Hannes Ovrén

The color= or c= property should be a matplotlib color, as mentioned in the documentation for plot.

To map an integer label to a color, just do:

LABEL_COLOR_MAP = {0 : 'r',
                   1 : 'k',
                   ....,
                   }

label_color = [LABEL_COLOR_MAP[l] for l in labels]
plt.scatter(x, y, c=label_color)

If you don't want to use the builtin one-character color names, you can use other color definitions. See the documentation on matplotlib colors.
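
As a minimal sketch of the mapping described above: the points, the label array, and the particular color choices below are made up for illustration (the answer's own map is elided), so treat them as assumptions rather than the answerer's exact values.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up stand-ins for the 2-D points and the k-means labels.
rng = np.random.default_rng(0)
points = rng.normal(size=(10, 2))
labels = np.array([0, 1, 2, 3, 3, 4, 4, 0, 2, 1])

# Hypothetical color choices: one entry per cluster id.
LABEL_COLOR_MAP = {0: 'r', 1: 'k', 2: 'b', 3: 'g', 4: 'm'}

# One color per point, in the same order as the points.
label_color = [LABEL_COLOR_MAP[l] for l in labels]

fig, ax = plt.subplots()
ax.scatter(points[:, 0], points[:, 1], c=label_color)
```

Call plt.show() afterwards to display the figure. The key point is that label_color is a flat list with exactly one color per plotted point.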

Answer by user3805442

This should work:

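import matplotlib.pyplot as plt
from matplotlib import cm
from sklearn.cluster import KMeans

# M is the data: an (n_samples, n_features) array with at least two columns
cluster = KMeans(10)
cluster.fit(M)

# Map each label 0..9 to a colormap color; cm.spectral was renamed to cm.nipy_spectral
plt.scatter(M[:,0], M[:,1], c=[cm.nipy_spectral(float(i) / 10) for i in cluster.labels_])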
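
An alternative worth considering, sketched here under the assumption of a reasonably recent matplotlib, is to pass the integer labels straight to c and let the cmap argument do the label-to-color mapping. The data M and the labels below are randomly generated placeholders, not real clustering output.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(30, 2))           # made-up 2-D data
labels = rng.integers(0, 10, size=30)  # made-up cluster ids in 0..9

fig, ax = plt.subplots()
# c takes the 1-D label array; cmap, vmin and vmax control how ids map to colors.
sc = ax.scatter(M[:, 0], M[:, 1], c=labels, cmap="nipy_spectral", vmin=0, vmax=9)
fig.colorbar(sc, ax=ax, label="cluster id")
```

Fixing vmin and vmax keeps the color of a given cluster id stable even if some ids happen to be absent from a particular plot.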

Answer by Zhenye Na

from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Fit k-means with 5 clusters (X is the feature matrix)
model = KMeans(n_clusters=5).fit(X)

# Visualize it, coloring each point by its cluster label:
plt.figure(figsize=(8, 6))
plt.scatter(X[:,0], X[:,1], c=model.labels_.astype(float))

Now each cluster is drawn in a different color.
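
Putting the question's pipeline and the answers together, an end-to-end version might look roughly like the sketch below. The sentences, the choice of 3 clusters, and the viridis colormap are all assumptions made for this toy example. It also illustrates one likely cause of the errors in the question: the centers scatter needs one color value per center, while the labels array holds one value per sentence (and had an extra leading dimension).

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up example sentences standing in for the asker's sentence_list.
sentence_list = [
    "Hi how are you", "Good morning", "Good evening to you",
    "The stock market fell today", "Stocks rallied this morning",
    "It is raining outside", "Heavy rain is expected today",
    "How are you doing", "The market closed higher", "Rain and wind tonight",
]

n_clusters = 3  # arbitrary choice for this toy data
X = TfidfVectorizer().fit_transform(sentence_list).toarray()
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)

# Project both the points and the cluster centers to 2-D for plotting.
pca = PCA(n_components=2).fit(X)
data2D = pca.transform(X)
centers2D = pca.transform(km.cluster_centers_)

fig, ax = plt.subplots()
# Points: one label per point, so c has the same length as data2D.
ax.scatter(data2D[:, 0], data2D[:, 1], c=km.labels_, cmap="viridis",
           vmin=0, vmax=n_clusters - 1)
# Centers: one color value per center, i.e. the cluster ids 0..n_clusters-1.
ax.scatter(centers2D[:, 0], centers2D[:, 1], c=np.arange(n_clusters),
           cmap="viridis", vmin=0, vmax=n_clusters - 1,
           marker="x", s=200, linewidths=3)
```

Because both scatter calls share the same cmap, vmin, and vmax, each center is drawn in the same color as the points assigned to it. Call plt.show() to display the result.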