Python scikit learn pca.explained_variance_ratio_ cutoff

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/32857029/

Tags: python, scikit-learn, pca

Asked by Chubaka

When choosing the number of principal components (k), we choose k to be the smallest value so that for example, 99% of variance, is retained.

However, in Python scikit-learn, I am not 100% sure that pca.explained_variance_ratio_ = 0.99 is equal to "99% of the variance is retained". Could anyone enlighten me? Thanks.

  • The Python Scikit learn PCA manual is here

http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA

Accepted answer by Curt F.

Yes, you are nearly right. The pca.explained_variance_ratio_ attribute returns a vector of the fraction of total variance explained by each dimension. Thus pca.explained_variance_ratio_[i] gives the variance explained solely by the (i+1)-th dimension.

You probably want pca.explained_variance_ratio_.cumsum(). That will return a vector x such that x[i] gives the cumulative variance explained by the first i+1 dimensions.

import numpy as np
from sklearn.decomposition import PCA

# Toy data: 20 samples with 5 features
np.random.seed(0)
my_matrix = np.random.randn(20, 5)

# Fit PCA, keeping all 5 components
my_model = PCA(n_components=5)
my_model.fit_transform(my_matrix)

print(my_model.explained_variance_)                  # variance per component
print(my_model.explained_variance_ratio_)            # fraction of total variance per component
print(my_model.explained_variance_ratio_.cumsum())   # cumulative fraction


[ 1.50756565  1.29374452  0.97042041  0.61712667  0.31529082]
[ 0.32047581  0.27502207  0.20629036  0.13118776  0.067024  ]
[ 0.32047581  0.59549787  0.80178824  0.932976    1.        ]

So in my random toy data, if I picked k=4, I would retain 93.3% of the variance.

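If you would rather compute that cutoff than eyeball it, here is a small helper of my own (a sketch, not part of sklearn; the name smallest_k is made up) that returns the smallest k whose cumulative ratio reaches a threshold:

import numpy as np

def smallest_k(explained_variance_ratio, threshold=0.99):
    # First index where the cumulative ratio reaches the threshold,
    # converted to a 1-based component count.
    cumulative = np.cumsum(explained_variance_ratio)
    return int(np.searchsorted(cumulative, threshold)) + 1

# Using the ratios from the toy data above, 93% needs 4 components:
ratios = [0.32047581, 0.27502207, 0.20629036, 0.13118776, 0.067024]
print(smallest_k(ratios, threshold=0.93))  # -> 4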

Answered by Yannic Klem

Although this question is more than two years old, I want to provide an update. I wanted to do the same thing, and it looks like sklearn now provides this feature out of the box.

As stated in the docs:

if 0 < n_components < 1 and svd_solver == 'full', select the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by n_components

So the required code is now:

# Keep the smallest number of components that together explain at least 99% of the variance
my_model = PCA(n_components=0.99, svd_solver='full')
my_model.fit_transform(my_matrix)
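
After fitting, you can check how many components were actually kept; this check is my own addition, reusing my_matrix from the accepted answer:

print(my_model.n_components_)                    # number of components kept
print(my_model.explained_variance_ratio_.sum())  # at least 0.99 by construction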

Answered by Julian

This worked for me with even less typing in the PCA section. The rest is added for convenience. Only 'data' needs to be defined at an earlier stage.

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Standardize the features first so PCA is not dominated by feature scale
st = StandardScaler().fit_transform(data)

# Keep enough components to explain at least 80% of the variance
pca = PCA(0.80)
pc = pca.fit_transform(st)  # retain the transformed components in an object
pc

# pca.explained_variance_ratio_
print("Components =", pca.n_components_,
      ";\nTotal explained variance =",
      round(pca.explained_variance_ratio_.sum(), 5))