Basic example for PCA with matplotlib
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must do so under the same license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/18299523/
Asked by Tyrax
I'm trying to do a simple principal component analysis with matplotlib.mlab.PCA, but with the attributes of the class I can't get a clean solution to my problem. Here's an example:
Get some dummy data in 2D and start PCA:
from matplotlib.mlab import PCA
import numpy as np
N = 1000
xTrue = np.linspace(0,1000,N)
yTrue = 3*xTrue
xData = xTrue + np.random.normal(0, 100, N)
yData = yTrue + np.random.normal(0, 100, N)
xData = np.reshape(xData, (N, 1))
yData = np.reshape(yData, (N, 1))
data = np.hstack((xData, yData))
test2PCA = PCA(data)
Now, I just want to get the principal components as vectors in my original coordinates and plot them as arrows onto my data.
What is a quick and clean way to get there?
Thanks, Tyrax
Accepted answer by unutbu
I don't think the mlab.PCA class is appropriate for what you want to do. In particular, the PCA class rescales the data before finding the eigenvectors:
a = self.center(a)
U, s, Vh = np.linalg.svd(a, full_matrices=False)
The center method divides by sigma:
def center(self, x):
    'center the data using the mean and sigma from training set a'
    return (x - self.mu)/self.sigma
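To see what that rescaling does to the example data, here is a minimal sketch (an addition, not part of the original answer, reusing the data array from the question's snippet; the exact numbers vary with the random noise):

z = (data - data.mean(axis=0)) / data.std(axis=0)  # what mlab.PCA's center() computes
print(z.std(axis=0))                               # both columns now have standard deviation ~1.0

After this rescaling both coordinates carry equal variance, so the rescaled cloud is elongated along the diagonal rather than along the original slope of 3.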
The resulting eigenvectors, pca.Wt, look like this:
[[-0.70710678 -0.70710678]
 [-0.70710678  0.70710678]]
They are perpendicular, but not directly relevant to the principal axes of your original data. They are principal axes with respect to massaged data.
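A quick way to confirm that (again an addition, reusing the data array from the question; the signs of the rows can differ from run to run) is to perform the SVD on the standardized data yourself:

z = (data - data.mean(axis=0)) / data.std(axis=0)   # the "massaged" data, as above
U, s, Vh = np.linalg.svd(z, full_matrices=False)
print(Vh)  # rows reproduce pca.Wt (up to sign): roughly [0.707, 0.707] and [-0.707, 0.707]

This is exactly the computation the class performs, which is why pca.Wt describes the standardized data rather than the original x, y coordinates.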
Perhaps it might be easier to code what you want directly (without the use of the mlab.PCA class):
import numpy as np
import matplotlib.pyplot as plt

N = 1000
xTrue = np.linspace(0, 1000, N)
yTrue = 3 * xTrue
xData = xTrue + np.random.normal(0, 100, N)
yData = yTrue + np.random.normal(0, 100, N)
xData = np.reshape(xData, (N, 1))
yData = np.reshape(yData, (N, 1))
data = np.hstack((xData, yData))

# Center the data (but, unlike mlab.PCA, do not rescale it)
mu = data.mean(axis=0)
data = data - mu
# data = (data - mu)/data.std(axis=0)  # Uncommenting this reproduces mlab.PCA results

# SVD of the centered data, transposed to shape (2, N)
eigenvectors, eigenvalues, V = np.linalg.svd(data.T, full_matrices=False)
projected_data = np.dot(data, eigenvectors)   # coordinates of the points along each principal direction
sigma = projected_data.std(axis=0).mean()     # one common length scale for the arrows
print(eigenvectors)

fig, ax = plt.subplots()
ax.scatter(xData, yData)
for axis in eigenvectors:
    # Draw an arrow of length sigma from the data mean along each axis
    start, end = mu, mu + sigma * axis
    ax.annotate(
        '', xy=end, xycoords='data',
        xytext=start, textcoords='data',
        arrowprops=dict(facecolor='red', width=2.0))
ax.set_aspect('equal')
plt.show()
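As a final sanity check (an addition, not part of the accepted answer), the dominant direction can also be read off the covariance matrix of the centered data; it should come out roughly parallel to (1, 3), the true slope used to generate the data:

cov = np.cov(data, rowvar=False)    # 2x2 covariance of the centered data
evals, evecs = np.linalg.eigh(cov)  # eigenvalues in ascending order, eigenvectors as columns
print(evecs[:, -1])                 # axis of largest variance, roughly +/-(0.32, 0.95)

Because np.linalg.eigh returns the eigenvalues in ascending order, the last column is the axis of largest variance, i.e. the long axis of the scatter plot above.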