K-means algorithm in Java

Notice: this page is adapted from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/1055811/

Time: 2020-08-11 22:53:01  Source: igfitidea

K-Means algorithm

Tags: java, algorithm, machine-learning, grouping, unsupervised-learning

Asked by dedalo

I'm trying to program a k-means algorithm in Java. I have calculated a number of arrays, each of them containing a number of coefficients. I need to use a k-means algorithm to group all this data. Do you know of any implementation of this algorithm?

Thanks

Answered by duffymo

There's a very nice Python implementation of K-means clustering in "Programming Collective Intelligence". I highly recommend it.

I realize that you'll have to translate to Java, but it doesn't look to be too difficult.

Answered by jtb

I haven't studied the code myself, but there's a multithreaded K-means implementation given in this JavaWorld article that looks pretty instructive.
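
As a rough illustration of why K-means lends itself to multithreading, the assignment step can be run independently for every point. The following is only a sketch under assumptions, not the code from the JavaWorld article: the class `ParallelAssignmentSketch` and its methods are hypothetical names, and it uses Java parallel streams rather than explicit threads.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical sketch; this is NOT the implementation from the JavaWorld article.
class ParallelAssignmentSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the centroid nearest to the point (ties go to the lower index).
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDistance = squaredDistance(point, centroids[0]);
        for (int c = 1; c < centroids.length; c++) {
            double distance = squaredDistance(point, centroids[c]);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = c;
            }
        }
        return best;
    }

    // Assigns every point to its nearest centroid. Each point is handled
    // independently, so the work can be split across threads without locking.
    static int[] assign(double[][] points, double[][] centroids) {
        return IntStream.range(0, points.length)
                .parallel()
                .map(i -> nearest(points[i], centroids))
                .toArray();
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {10, 10}, {1, 1}, {9, 9}};
        double[][] centroids = {{0, 0}, {10, 10}};
        System.out.println(Arrays.toString(assign(points, centroids)));
    }
}
```

The centroid update step needs per-cluster sums, so it is usually either kept sequential or done with per-thread partial sums that are merged afterwards.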

Answered by minoriole

Classification, clustering and grouping are well-developed areas of IR. There is a very good open-source Java library called WEKA. It includes several clustering algorithms. Although there is a learning curve, it might be useful when you encounter harder problems.

Answered by ldog

It seems everyone who posted forgot to mention the de facto image processing library: OpenCV http://sourceforge.net/projects/opencvlibrary/. You would have to write a JNI wrapper around the C OpenCV code to get KMeans to work, but the added benefits would be:

  1. You would know that the KMeans algorithm is heavily optimized
  2. OpenCV makes use of your GPU extensively so it runs blazing fast

The main drawback is that you would have to write a JNI wrapper. I once needed a template matching routine and was faced with many alternatives, but I found OpenCV to be by far the best, even though I was forced to write a JNI wrapper for it.

Answered by Marcin

OpenCV is one of the most horribly written libraries I've ever had to use. On the other hand, Matlab does it very neatly.

If you have to code it yourself, the algorithm is incredibly simple for how efficient it is.

  1. Pick number of clusters (k)
  2. Make k points (they're going to be the centroids)
  3. Randomize the locations of all these points
  4. Calculate Euclidean distance from each point to all centroids
  5. Assign 'membership' of each point to the nearest centroid
  6. Establish the new centroids by averaging the locations of all points belonging to a given cluster
  7. Go to step 4 and repeat until convergence is achieved or the changes become negligible.
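
The steps above can be sketched in Java for data stored as arrays of coefficients, as in the question. This is only a minimal sketch under stated assumptions, not any answerer's code: the class `KMeansSketch`, its method names, the `maxIterations` cap and the `seed` parameter are all hypothetical. It also departs from step 3 in one respect: the centroids are seeded from k randomly chosen data points instead of fully random locations, which keeps them inside the range of the data.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of the steps listed above; all names here are illustrative.
class KMeansSketch {

    // Squared Euclidean distance (the square root is not needed to find the nearest centroid).
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Steps 4-5: index of the centroid nearest to the point (ties go to the lower index).
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDistance = squaredDistance(point, centroids[0]);
        for (int c = 1; c < centroids.length; c++) {
            double distance = squaredDistance(point, centroids[c]);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = c;
            }
        }
        return best;
    }

    // Returns the cluster index (0..k-1) assigned to each row of data.
    static int[] cluster(double[][] data, int k, int maxIterations, long seed) {
        int n = data.length;
        if (k < 1 || k > n || maxIterations < 1) {
            throw new IllegalArgumentException("need 1 <= k <= number of points and maxIterations >= 1");
        }
        int dim = data[0].length;

        // Steps 2-3 (assumption): seed the centroids with k distinct random data points.
        Random random = new Random(seed);
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        for (int i = n - 1; i > 0; i--) { // Fisher-Yates shuffle
            int j = random.nextInt(i + 1);
            int tmp = order[i]; order[i] = order[j]; order[j] = tmp;
        }
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = data[order[c]].clone();

        int[] labels = new int[n];
        Arrays.fill(labels, -1); // -1 means "not assigned yet"
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            // Steps 4-5: assign each point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int c = nearest(data[i], centroids);
                if (c != labels[i]) {
                    labels[i] = c;
                    changed = true;
                }
            }
            if (!changed) break; // Step 7: no membership changed, so we have converged.

            // Step 6: move each centroid to the average of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[labels[i]]++;
                for (int j = 0; j < dim; j++) sums[labels[i]][j] += data[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its old centroid
                for (int j = 0; j < dim; j++) centroids[c][j] = sums[c][j] / counts[c];
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] coefficients = {
            {1.0, 1.1}, {0.9, 1.0}, {1.2, 0.8},
            {8.0, 8.2}, {7.9, 8.1}, {8.3, 7.7}
        };
        System.out.println(Arrays.toString(cluster(coefficients, 2, 100, 42L)));
    }
}
```

Which cluster ends up with index 0 depends on the random seeding, so only the grouping is meaningful, not the label values themselves.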

Answered by elcuco

Really, KMeans is a really easy algorithm. Is there any good reason not to hand-code it yourself? I did it in Qt and then ported the code to plain old STL without too many problems.

I am starting to be a fan of Joel's idea: no external dependencies. So please feel free to tell me what's good about a large piece of software you don't control, especially when others on this question have already mentioned it's not a good piece of software.

Talk is cheap, real men show their code to the world: http://github.com/elcuco/data_mining_demo

I should clean up the code a little to make it more generic, and the current version is not ported to STL, but it's a start!

Answered by madth3

Very old question, but I noticed there is no mention of the Java Machine Learning Library, which has an implementation of K-Means and includes some documentation about its usage.

The project is not very active, but the last version is relatively recent (July 2012).

Answered by shailendra pathak

// Aim: implement the k-means clustering algorithm for one-dimensional integer data.
// Program
import java.util.Arrays;
import java.util.Scanner;

class k_means
{
    static int d[];         // input elements
    static int k[][];       // k[i][0..count[i]-1] holds the elements of cluster i
    static int tempk[][];   // clusters from the previous step
    static int count[];     // number of elements currently in each cluster
    static int tempcount[]; // cluster sizes from the previous step
    static double m[];      // current mean of each cluster
    static int n, p;        // number of elements, number of clusters

    // Determines the cluster an element goes to at a particular step:
    // the one whose mean is nearest to a (ties go to the lower index).
    static int cal_diff(int a)
    {
        int val = 0;
        double temp = Math.abs(a - m[0]);
        for (int i = 1; i < p; ++i)
        {
            double diff = Math.abs(a - m[i]);
            if (diff < temp)
            {
                temp = diff;
                val = i;
            }
        }
        return val;
    }

    // Determines the intermediate mean values from the current cluster members.
    static void cal_mean()
    {
        for (int i = 0; i < p; ++i)
        {
            if (count[i] == 0)
                continue; // empty cluster: keep the old mean rather than divide by zero
            double sum = 0;
            for (int j = 0; j < count[i]; ++j)
                sum += k[i][j];
            m[i] = sum / count[i];
        }
    }

    // Terminating case: true if the previous clusters (tempk) and the current clusters (k) are the same.
    static boolean check1()
    {
        for (int i = 0; i < p; ++i)
        {
            if (tempcount[i] != count[i])
                return false;
            for (int j = 0; j < count[i]; ++j)
                if (tempk[i][j] != k[i][j])
                    return false;
        }
        return true;
    }

    // Prints the current contents of every cluster.
    static void print_clusters()
    {
        for (int i = 0; i < p; ++i)
        {
            System.out.print("K" + (i + 1) + "{ ");
            for (int j = 0; j < count[i]; ++j)
                System.out.print(k[i][j] + " ");
            System.out.println("}");
        }
    }

    public static void main(String args[])
    {
        Scanner scr = new Scanner(System.in);
        /* Accepting number of elements */
        System.out.println("Enter the number of elements ");
        n = scr.nextInt();
        d = new int[n];
        /* Accepting elements */
        System.out.println("Enter " + n + " elements: ");
        for (int i = 0; i < n; ++i)
            d[i] = scr.nextInt();
        /* Accepting number of clusters */
        System.out.println("Enter the number of clusters: ");
        p = scr.nextInt();
        if (p < 1 || p > n)
            throw new IllegalArgumentException("The number of clusters must be between 1 and " + n);
        /* Initialising arrays */
        k = new int[p][n];
        tempk = new int[p][n];
        count = new int[p];
        tempcount = new int[p];
        m = new double[p];
        /* Initialising the means with the first p elements */
        for (int i = 0; i < p; ++i)
            m[i] = d[i];

        boolean converged;
        do
        {
            Arrays.fill(count, 0); // empty all clusters before reassigning
            for (int i = 0; i < n; ++i) // assign every element to its nearest cluster
            {
                int c = cal_diff(d[i]);
                k[c][count[c]++] = d[i];
            }
            cal_mean();           // calculate the means at this step
            converged = check1(); // check if the terminating condition is satisfied
            if (!converged)
            {
                /* Back up k in tempk so that it can be compared in the next step */
                for (int i = 0; i < p; ++i)
                {
                    System.arraycopy(k[i], 0, tempk[i], 0, count[i]);
                    tempcount[i] = count[i];
                }
            }

            System.out.println("\n\nAt this step");
            System.out.println("\nValue of clusters");
            print_clusters();
            System.out.println("\nValue of m ");
            for (int i = 0; i < p; ++i)
                System.out.print("m" + (i + 1) + "=" + m[i] + "  ");
        }
        while (!converged);

        System.out.println("\n\n\nThe Final Clusters By Kmeans are as follows: ");
        print_clusters();
    }
}
/*
Enter the number of elements
8
Enter 8 elements:
2 3 6 8 12 15 18 22
Enter the number of clusters:
3

At this step
Value of clusters
K1{ 2 }
K2{ 3 }
K3{ 6 8 12 15 18 22 }
Value of m
m1=2.0  m2=3.0  m3=13.5

At this step
Value of clusters
K1{ 2 }
K2{ 3 6 8 }
K3{ 12 15 18 22 }
Value of m
m1=2.0  m2=5.666666666666667  m3=16.75

At this step
Value of clusters
K1{ 2 3 }
K2{ 6 8 }
K3{ 12 15 18 22 }
Value of m
m1=2.5  m2=7.0  m3=16.75

At this step
Value of clusters
K1{ 2 3 }
K2{ 6 8 }
K3{ 12 15 18 22 }
Value of m
m1=2.5  m2=7.0  m3=16.75

The Final Clusters By Kmeans are as follows:
K1{ 2 3 }
K2{ 6 8 }
K3{ 12 15 18 22 } */