
Note: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/18299523/

Date: 2020-08-19 10:25:45 · Source: igfitidea

Basic example for PCA with matplotlib

Tags: python, matplotlib, pca

Asked by Tyrax

I'm trying to do a simple principal component analysis with matplotlib.mlab.PCA, but with the attributes the class exposes I can't get a clean solution to my problem. Here's an example:


Get some dummy data in 2D and start PCA:


from matplotlib.mlab import PCA
import numpy as np

N     = 1000
xTrue = np.linspace(0,1000,N)
yTrue = 3*xTrue

xData = xTrue + np.random.normal(0, 100, N)
yData = yTrue + np.random.normal(0, 100, N)
xData = np.reshape(xData, (N, 1))
yData = np.reshape(yData, (N, 1))
data  = np.hstack((xData, yData))
test2PCA = PCA(data)

Now, I just want to get the principal components as vectors in my original coordinates and plot them as arrows onto my data.


What is a quick and clean way to get there?


Thanks, Tyrax


Accepted answer by unutbu

I don't think the mlab.PCA class is appropriate for what you want to do. In particular, the PCA class rescales the data before finding the eigenvectors:


a = self.center(a)
U, s, Vh = np.linalg.svd(a, full_matrices=False)

The center method divides by sigma:


def center(self, x):
    'center the data using the mean and sigma from training set a'
    return (x - self.mu)/self.sigma

This results in eigenvectors, pca.Wt, like this:


[[-0.70710678 -0.70710678]
 [-0.70710678  0.70710678]]

They are perpendicular, but they are not the principal axes of your original data: they are the principal axes of the rescaled data.

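To see why the rescaling produces exactly those ±0.7071 entries, here is a small sketch (the variable names are mine, not from the answer): after dividing each column by its standard deviation, any two-column data set has the correlation matrix [[1, r], [r, 1]], whose eigenvectors are always (1, 1)/√2 and (1, −1)/√2, regardless of the original orientation of the data.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
x = np.linspace(0, 1000, N)
data = np.column_stack([x + rng.normal(0, 100, N),
                        3 * x + rng.normal(0, 100, N)])

centered = data - data.mean(axis=0)
standardized = centered / centered.std(axis=0)  # what mlab.PCA's center() does

# Rows of Vt are the principal directions of the points in each matrix.
_, _, Vt_centered = np.linalg.svd(centered, full_matrices=False)
_, _, Vt_standardized = np.linalg.svd(standardized, full_matrices=False)

print(Vt_centered)              # axes in the original coordinates
print(np.abs(Vt_standardized))  # every entry is 1/sqrt(2), ~0.7071
```

So the ±0.7071 matrix from pca.Wt carries no information about the orientation of the original cloud; only the centered (not rescaled) SVD does.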

Perhaps it might be easier to code what you want directly (without using the mlab.PCA class):


import numpy as np
import matplotlib.pyplot as plt

N = 1000
xTrue = np.linspace(0, 1000, N)
yTrue = 3 * xTrue
xData = xTrue + np.random.normal(0, 100, N)
yData = yTrue + np.random.normal(0, 100, N)
xData = np.reshape(xData, (N, 1))
yData = np.reshape(yData, (N, 1))
data = np.hstack((xData, yData))

mu = data.mean(axis=0)
data = data - mu
# data = (data - mu)/data.std(axis=0)  # Uncommenting this reproduces mlab.PCA results
# Columns of U (returned here as `eigenvectors`) are the principal directions;
# the SVD returns singular values, not eigenvalues.
eigenvectors, singular_values, V = np.linalg.svd(data.T, full_matrices=False)
projected_data = np.dot(data, eigenvectors)
sigma = projected_data.std(axis=0).mean()  # typical spread, used to scale the arrows
print(eigenvectors)

fig, ax = plt.subplots()
ax.scatter(xData, yData)
for axis in eigenvectors.T:  # iterate over the columns of U, the principal axes
    start, end = mu, mu + sigma * axis
    ax.annotate(
        '', xy=end, xycoords='data',
        xytext=start, textcoords='data',
        arrowprops=dict(facecolor='red', width=2.0))
ax.set_aspect('equal')
plt.show()
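As a quick sanity check on the approach above (a sketch under the same setup; the variable names are mine): the points scatter around the line y = 3x with isotropic noise, so the leading principal axis should have a slope close to 3 in the original coordinates.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 1000
xTrue = np.linspace(0, 1000, N)
data = np.column_stack([xTrue + rng.normal(0, 100, N),
                        3 * xTrue + rng.normal(0, 100, N)])

centered = data - data.mean(axis=0)
# Rows of Vt are the principal directions, ordered by explained variance.
_, _, Vt = np.linalg.svd(centered, full_matrices=False)

first_axis = Vt[0]
slope = first_axis[1] / first_axis[0]
print(slope)  # should be close to the true slope of 3
```

The ratio of the components is unaffected by the sign ambiguity of singular vectors, so this check works no matter which orientation the SVD returns.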

[Figure: scatter plot of the data with the two principal axes drawn as red arrows from the mean]
