Python scikit-learn pca.explained_variance_ratio_ cutoff

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow

Original question: http://stackoverflow.com/questions/32857029/
Asked by Chubaka
When choosing the number of principal components (k), we choose k to be the smallest value such that, for example, 99% of the variance is retained.
However, in Python scikit-learn, I am not 100% sure that pca.explained_variance_ratio_ = 0.99 is equal to "99% of variance is retained". Could anyone enlighten me? Thanks.
- The Python scikit-learn PCA manual is here
Accepted answer by Curt F.
Yes, you are nearly right. The pca.explained_variance_ratio_ attribute returns a vector of the fraction of variance explained by each dimension. Thus pca.explained_variance_ratio_[i] gives the variance explained solely by the (i+1)-st dimension.
You probably want to do pca.explained_variance_ratio_.cumsum(). That will return a vector x such that x[i] returns the cumulative variance explained by the first i+1 dimensions.
import numpy as np
from sklearn.decomposition import PCA

np.random.seed(0)
my_matrix = np.random.randn(20, 5)  # 20 samples, 5 features
my_model = PCA(n_components=5)
my_model.fit_transform(my_matrix)   # fit the PCA and project the data
print(my_model.explained_variance_)
print(my_model.explained_variance_ratio_)
print(my_model.explained_variance_ratio_.cumsum())
[ 1.50756565 1.29374452 0.97042041 0.61712667 0.31529082]
[ 0.32047581 0.27502207 0.20629036 0.13118776 0.067024 ]
[ 0.32047581 0.59549787 0.80178824 0.932976 1. ]
So in my random toy data, if I picked k=4 I would retain 93.3% of the variance.
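To turn this into an automatic cutoff, here is a minimal sketch (my addition, not part of the original answer) that picks the smallest k reaching a given threshold, reusing my_model from above:
import numpy as np

# Cumulative fraction of variance explained by the first i+1 components
cumulative = my_model.explained_variance_ratio_.cumsum()

# searchsorted returns the first index where the cumulative ratio reaches
# the threshold; +1 converts that 0-based index into a component count k
threshold = 0.99
k = int(np.searchsorted(cumulative, threshold)) + 1
print(k)  # smallest k retaining at least 99% of the variance; 5 here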
Answered by Yannic Klem
Although this question is more than two years old, I want to provide an update. I wanted to do the same thing, and it looks like sklearn now provides this feature out of the box.
As stated in the docs
if 0 < n_components < 1 and svd_solver == 'full', select the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by n_components
So the required code is now:
my_model = PCA(n_components=0.99, svd_solver='full')
my_model.fit_transform(my_matrix)
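After fitting, you can check how many components were actually kept; a quick follow-up (my addition, reusing my_matrix from the accepted answer):
# n_components_ is set during fit to the number of components actually kept
print(my_model.n_components_)
# total fraction of variance retained; at least 0.99 by construction
print(my_model.explained_variance_ratio_.sum())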
Answered by Julian
This worked for me with even less typing in the PCA section. The rest is added for convenience; only 'data' needs to be defined at an earlier stage.
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# 'data' must be defined at an earlier stage
st = StandardScaler().fit_transform(data)  # standardize features before PCA
pca = PCA(0.80)             # keep enough components to explain 80% of the variance
pc = pca.fit_transform(st)  # << to retain the components in an object

print("Components = ", pca.n_components_, ";\nTotal explained variance = ",
      round(pca.explained_variance_ratio_.sum(), 5))
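The snippet assumes data already exists. For a self-contained test, one could define a toy matrix first, for example (my addition, purely for illustration):
import numpy as np

np.random.seed(0)
data = np.random.randn(100, 10)  # 100 samples, 10 features

With this in place, the print statement reports how many standardized components are needed to explain at least 80% of the variance.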