PCA Implementation in Java

Note: this content is taken from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/10604507/


PCA Implementation in Java

java, pca

Asked by Trup

I need an implementation of PCA in Java. I am interested in finding something that's well documented, practical and easy to use. Any recommendations?


Answered by LotiLotiLoti

There are now a number of Principal Component Analysis implementations for Java.


  1. Apache Spark: https://spark.apache.org/docs/2.1.0/mllib-dimensionality-reduction.html#principal-component-analysis-pca

    SparkConf conf = new SparkConf().setAppName("PCAExample").setMaster("local");
    try (JavaSparkContext sc = new JavaSparkContext(conf)) {
        //Create points as Spark Vectors
        List<Vector> vectors = Arrays.asList(
                Vectors.dense( -1.0, -1.0 ),
                Vectors.dense( -1.0, 1.0 ),
                Vectors.dense( 1.0, 1.0 ));
    
        //Create Spark MLLib RDD
        JavaRDD<Vector> distData = sc.parallelize(vectors);
        RDD<Vector> vectorRDD = distData.rdd();
    
        //Execute PCA Projection to 2 dimensions
        PCA pca = new PCA(2); 
        PCAModel pcaModel = pca.fit(vectorRDD);
        Matrix matrix = pcaModel.pc();
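    
        //Possible follow-up (not in the original answer): project the points
        //onto the fitted components via the model's transform method
        RDD<Vector> projected = pcaModel.transform(vectorRDD);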
    }
    
  2. ND4J: http://nd4j.org/doc/org/nd4j/linalg/dimensionalityreduction/PCA.html

    //Create points as NDArray instances
    List<INDArray> ndArrays = Arrays.asList(
            new NDArray(new float [] {-1.0F, -1.0F}),
            new NDArray(new float [] {-1.0F, 1.0F}),
            new NDArray(new float [] {1.0F, 1.0F}));
    
    //Create matrix of points (rows are observations; columns are features)
    INDArray matrix = new NDArray(ndArrays, new int [] {3,2});
    
    //Execute PCA - again to 2 dimensions
    INDArray factors = PCA.pca_factor(matrix, 2, false);
    
  3. Apache Commons Math (single threaded; no framework)

    //create points in a double array
    double[][] pointsArray = new double[][] { 
        new double[] { -1.0, -1.0 }, 
        new double[] { -1.0, 1.0 },
        new double[] { 1.0, 1.0 } };
    
    //create real matrix
    RealMatrix realMatrix = MatrixUtils.createRealMatrix(pointsArray);
    
    //create covariance matrix of points, then find eigen vectors
    //see https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues
    
    Covariance covariance = new Covariance(realMatrix);
    RealMatrix covarianceMatrix = covariance.getCovarianceMatrix();
    EigenDecomposition ed = new EigenDecomposition(covarianceMatrix);
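    
    //Possible continuation (not in the original answer): project the points
    //onto the leading eigenvector to get scores on the first principal axis.
    //Pick the largest eigenvalue explicitly rather than relying on ordering.
    double[] eigenvalues = ed.getRealEigenvalues();
    int top = 0;
    for (int i = 1; i < eigenvalues.length; i++) {
        if (eigenvalues[i] > eigenvalues[top]) top = i;
    }
    RealVector firstComponent = ed.getEigenvector(top);
    
    //center each point on the column means, then take the dot product
    double[] means = new double[realMatrix.getColumnDimension()];
    for (int c = 0; c < means.length; c++) {
        for (double v : realMatrix.getColumn(c)) {
            means[c] += v / realMatrix.getRowDimension();
        }
    }
    RealVector meanVector = MatrixUtils.createRealVector(means);
    for (int r = 0; r < realMatrix.getRowDimension(); r++) {
        double score = realMatrix.getRowVector(r).subtract(meanVector)
                .dotProduct(firstComponent);
        System.out.println("point " + r + " -> PC1 score: " + score);
    }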
    

Note that Singular Value Decomposition, which can also be used to find principal components, has equivalent implementations.

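For illustration, here is a minimal sketch of the SVD route using Apache Commons Math (not part of the original answer), reusing the pointsArray from item 3 above; the columns of V are the principal directions:

    //center the columns, since SVD-based PCA works on mean-centered data
    RealMatrix data = MatrixUtils.createRealMatrix(pointsArray);
    for (int c = 0; c < data.getColumnDimension(); c++) {
        double mean = 0;
        for (double v : data.getColumn(c)) {
            mean += v / data.getRowDimension();
        }
        for (int r = 0; r < data.getRowDimension(); r++) {
            data.setEntry(r, c, data.getEntry(r, c) - mean);
        }
    }
    
    //the right singular vectors (columns of V) are the principal directions;
    //the singular values are related to the variance explained by each one
    SingularValueDecomposition svd = new SingularValueDecomposition(data);
    RealMatrix principalDirections = svd.getV();
    double[] singularValues = svd.getSingularValues();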

Answered by NPE

Here is one: PCA Class.


This class contains the methods necessary for a basic Principal Component Analysis with a varimax rotation. Options are available for an analysis using either the covariance or the correlation matrix. A parallel analysis, using Monte Carlo simulations, is performed. Extraction criteria based on eigenvalues greater than unity, greater than a Monte Carlo eigenvalue percentile or greater than the Monte Carlo eigenvalue means are available.

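As a rough illustration of the covariance-versus-correlation choice described above, here is a minimal sketch using Apache Commons Math (this is not the linked class's own API, and it reuses the realMatrix of points from the previous answer):

    //covariance-based PCA: eigen-decompose the covariance matrix
    RealMatrix covInput = new Covariance(realMatrix).getCovarianceMatrix();
    
    //correlation-based PCA: eigen-decompose the correlation matrix instead,
    //which is equivalent to standardizing each feature before the analysis
    RealMatrix corInput = new PearsonsCorrelation(realMatrix).getCorrelationMatrix();
    
    //either matrix can then be passed to EigenDecomposition as shown earlier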

Answered by sash

Check http://weka.sourceforge.net/doc.stable/weka/attributeSelection/PrincipalComponents.html. In fact, Weka has many other algorithms that can be used along with PCA, and it adds more algorithms from time to time. So I think, if you are working in Java, you should switch to the Weka API.

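As a rough sketch of how that class is typically wired up (not from the original answer; points.arff is just a placeholder file name, so check the linked Javadoc for the exact options):

    //load a dataset from an ARFF file (points.arff is a placeholder name)
    Instances data = new ConverterUtils.DataSource("points.arff").getDataSet();
    
    //rank the principal components and transform the data
    PrincipalComponents pca = new PrincipalComponents();
    AttributeSelection selection = new AttributeSelection();
    selection.setEvaluator(pca);
    selection.setSearch(new Ranker());
    selection.SelectAttributes(data);
    Instances transformed = selection.reduceDimensionality(data);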

Answered by hrzafer

Smile is a full-fledged ML library for Java. You can give its PCA implementation a try. Please see: https://haifengl.github.io/smile/api/java/smile/projection/PCA.html


There is also a PCA tutorial with Smile, but the tutorial uses Scala.

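A minimal sketch of how the PCA class is commonly used is below; note that method names have changed between Smile versions, so treat this as an assumption and check the linked Javadoc for your version (this follows the older constructor-based API):

    //points as a plain double[][] (rows are observations)
    double[][] points = { { -1.0, -1.0 }, { -1.0, 1.0 }, { 1.0, 1.0 } };
    
    //fit PCA on the points, keep 2 components, then project
    PCA pca = new PCA(points);
    pca.setProjection(2);
    double[][] projected = pca.project(points);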

Answered by Vlad11

You can see a few implementations of PCA in the DataMelt project:


https://jwork.org/dmelt/code/index.php?keyword=PCA


(They are rewritten in Jython.) They include some graphical examples for dimensionality reduction and show the usage of several Java packages, such as JSAT, DatumBox and others.
