Basic example for PCA with matplotlib
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must do so under the same license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/18299523/
Asked by Tyrax
I'm trying to do a simple principal component analysis with matplotlib.mlab.PCA, but with the attributes of the class I can't get a clean solution to my problem. Here's an example:
Get some dummy data in 2D and start PCA:
from matplotlib.mlab import PCA
import numpy as np
N = 1000
xTrue = np.linspace(0,1000,N)
yTrue = 3*xTrue
xData = xTrue + np.random.normal(0, 100, N)
yData = yTrue + np.random.normal(0, 100, N)
xData = np.reshape(xData, (N, 1))
yData = np.reshape(yData, (N, 1))
data = np.hstack((xData, yData))
test2PCA = PCA(data)
Now, I just want to get the principal components as vectors in my original coordinates and plot them as arrows onto my data.
What is a quick and clean way to get there?
Thanks, Tyrax
Accepted answer by unutbu
I don't think the mlab.PCA class is appropriate for what you want to do. In particular, the PCA class rescales the data before finding the eigenvectors:
a = self.center(a)
U, s, Vh = np.linalg.svd(a, full_matrices=False)
The center method divides by sigma:
def center(self, x):
    'center the data using the mean and sigma from training set a'
    return (x - self.mu)/self.sigma
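To see what that rescaling does to the example data, here is a minimal sketch (an addition, not part of the original answer, reusing the data array from the question's snippet; the exact numbers vary with the random noise):

z = (data - data.mean(axis=0)) / data.std(axis=0)  # what mlab.PCA's center() computes
print(z.std(axis=0))                               # both columns now have standard deviation ~1.0

After this rescaling both coordinates carry equal variance, so the rescaled cloud is elongated along the diagonal rather than along the original slope of 3.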
The resulting eigenvectors, pca.Wt, look like this:
[[-0.70710678 -0.70710678]
 [-0.70710678  0.70710678]]
They are perpendicular, but not directly relevant to the principal axes of your original data. They are principal axes with respect to massaged data.
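A quick way to confirm that (again an addition, reusing the data array from the question; the signs of the rows can differ from run to run) is to perform the SVD on the standardized data yourself:

z = (data - data.mean(axis=0)) / data.std(axis=0)   # the "massaged" data, as above
U, s, Vh = np.linalg.svd(z, full_matrices=False)
print(Vh)  # rows reproduce pca.Wt (up to sign): roughly [0.707, 0.707] and [-0.707, 0.707]

This is exactly the computation the class performs, which is why pca.Wt describes the standardized data rather than the original x, y coordinates.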
Perhaps it might be easier to code what you want directly (without the use of the mlab.PCA class):
import numpy as np
import matplotlib.pyplot as plt

N = 1000
xTrue = np.linspace(0, 1000, N)
yTrue = 3 * xTrue
xData = xTrue + np.random.normal(0, 100, N)
yData = yTrue + np.random.normal(0, 100, N)
xData = np.reshape(xData, (N, 1))
yData = np.reshape(yData, (N, 1))
data = np.hstack((xData, yData))

# Center the data (but, unlike mlab.PCA, do not rescale it)
mu = data.mean(axis=0)
data = data - mu
# data = (data - mu)/data.std(axis=0)  # Uncommenting this reproduces mlab.PCA results

# SVD of the centered data, transposed to shape (2, N)
eigenvectors, eigenvalues, V = np.linalg.svd(data.T, full_matrices=False)
projected_data = np.dot(data, eigenvectors)   # coordinates of the points along each principal direction
sigma = projected_data.std(axis=0).mean()     # one common length scale for the arrows
print(eigenvectors)

fig, ax = plt.subplots()
ax.scatter(xData, yData)
for axis in eigenvectors:
    # Draw an arrow of length sigma from the data mean along each axis
    start, end = mu, mu + sigma * axis
    ax.annotate(
        '', xy=end, xycoords='data',
        xytext=start, textcoords='data',
        arrowprops=dict(facecolor='red', width=2.0))
ax.set_aspect('equal')
plt.show()
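As a final sanity check (an addition, not part of the accepted answer), the dominant direction can also be read off the covariance matrix of the centered data; it should come out roughly parallel to (1, 3), the true slope used to generate the data:

cov = np.cov(data, rowvar=False)    # 2x2 covariance of the centered data
evals, evecs = np.linalg.eigh(cov)  # eigenvalues in ascending order, eigenvectors as columns
print(evecs[:, -1])                 # axis of largest variance, roughly +/-(0.32, 0.95)

Because np.linalg.eigh returns the eigenvalues in ascending order, the last column is the axis of largest variance, i.e. the long axis of the scatter plot above.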