Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me), citing the original address: http://stackoverflow.com/questions/28416408/

Time: 2020-08-19 03:14:46  Source: igfitidea

Scikit-learn: How to run KMeans on a one-dimensional array?

Tags: python, scikit-learn, data-mining, k-means

Asked by Irene

I have an array of 13,876 values between 0 and 1. I would like to apply sklearn.cluster.KMeans to this vector alone to find the clusters into which the values are grouped. However, KMeans seems to work with multidimensional arrays and not with one-dimensional ones. I guess there is a trick to make it work, but I don't know what it is. I saw that KMeans.fit() accepts "X : array-like or sparse matrix, shape=(n_samples, n_features)", but it wants n_samples to be greater than one.

I tried putting my array into an np.zeros() matrix and running KMeans, but then it puts all the non-null values in class 1 and the rest in class 0.

Can anyone help in running this algorithm on a one-dimensional array? Thanks a lot!

Accepted answer by ryanpattison

You have many samples of 1 feature, so you can reshape the array to shape (13876, 1) using numpy's reshape:

from sklearn.cluster import KMeans
import numpy as np
x = np.random.random(13876)

km = KMeans()
km.fit(x.reshape(-1,1))  # -1 will be calculated to be 13876 here
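
As a rough sketch of what the fitted model gives back, the example below reshapes a 1-D array into a single-feature column and reads the per-value labels and the cluster centers. The data (three well-separated groups) and the choices n_clusters=3, n_init=10 and random_state=0 are assumptions made for illustration; they are not from the original answer.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical data: three well-separated groups of values in [0, 1]
x = np.concatenate([
    rng.uniform(0.00, 0.10, 50),
    rng.uniform(0.45, 0.55, 50),
    rng.uniform(0.90, 1.00, 50),
])

X = x.reshape(-1, 1)  # shape (150, 1): 150 samples, 1 feature
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

labels = km.labels_                             # cluster index for each value
centers = np.sort(km.cluster_centers_.ravel())  # one center per cluster, sorted
print(X.shape, labels.shape, centers)
```

Note that x.reshape(1, -1) would instead describe one sample with many features, which is not what is wanted here.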

Answer by Frank

Read about Jenks Natural Breaks. Here is a Python function, found via a link in the article:

def get_jenks_breaks(data_list, number_class):
    data_list.sort()
    # Two (len(data_list) + 1) x (number_class + 1) tables filled with zeros
    mat1 = [[0] * (number_class + 1) for _ in range(len(data_list) + 1)]
    mat2 = [[0] * (number_class + 1) for _ in range(len(data_list) + 1)]
    for i in range(1, number_class + 1):
        mat1[1][i] = 1
        mat2[1][i] = 0
        for j in range(2, len(data_list) + 1):
            mat2[j][i] = float('inf')
    v = 0.0
    for l in range(2, len(data_list) + 1):
        s1 = 0.0
        s2 = 0.0
        w = 0.0
        for m in range(1, l + 1):
            i3 = l - m + 1
            val = float(data_list[i3 - 1])
            s2 += val * val
            s1 += val
            w += 1
            v = s2 - (s1 * s1) / w
            i4 = i3 - 1
            if i4 != 0:
                for j in range(2, number_class + 1):
                    if mat2[l][j] >= (v + mat2[i4][j - 1]):
                        mat1[l][j] = i3
                        mat2[l][j] = v + mat2[i4][j - 1]
        mat1[l][1] = 1
        mat2[l][1] = v
    k = len(data_list)
    kclass = []
    for i in range(number_class + 1):
        kclass.append(min(data_list))
    kclass[number_class] = float(data_list[len(data_list) - 1])
    count_num = number_class
    while count_num >= 2:
        idx = int((mat1[k][count_num]) - 2)
        kclass[count_num - 1] = data_list[idx]
        k = int((mat1[k][count_num] - 1))
        count_num -= 1
    return kclass
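
Once break values are available, each value can be assigned to a class. The sketch below is one possible way to do that with np.digitize. It assumes the returned list has the form [minimum, upper bound of class 0, ..., maximum], with each inner break being the largest value of its class, which is what the backtracking step above appears to produce. The break values and data here are hypothetical, not output from the function.

```python
import numpy as np

# Hypothetical break values for 3 classes (assumed layout):
# [minimum, upper bound of class 0, upper bound of class 1, maximum]
breaks = [0.1, 0.3, 0.6, 0.9]
values = np.array([0.1, 0.25, 0.3, 0.45, 0.6, 0.9])

# Use only the inner breaks as bin edges; right=True makes each
# edge inclusive on its upper side, so 0.3 falls in class 0.
class_ids = np.digitize(values, breaks[1:-1], right=True)
print(class_ids)  # [0 0 0 1 1 2]
```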

Usage and visualization:

import numpy as np
import matplotlib.pyplot as plt

# get_jenks_breaks(data_list, number_class) is the function defined above

x = np.random.random(30)
breaks = get_jenks_breaks(x, 5)

for line in breaks:
    plt.plot([line for _ in range(len(x))], 'k--')

plt.plot(x)
plt.grid(True)
plt.show()

Result: [image not preserved: plot of x with the computed breaks drawn as dashed horizontal lines]