Python: calculating Mahalanobis distance using NumPy only

Question

Asked by Borys

I am looking for a NumPy way of calculating the Mahalanobis distance between two NumPy arrays (x and y). The following code calculates it correctly using SciPy's cdist function. Since cdist computes an unnecessary matrix in my case (I only need its diagonal), I want a more direct way of calculating it using NumPy only.

import numpy as np
from scipy.spatial.distance import cdist

x = np.array([[[1,2,3,4,5],
               [5,6,7,8,5],
               [5,6,7,8,5]],
              [[11,22,23,24,5],
               [25,26,27,28,5],
               [5,6,7,8,5]]])
i,j,k = x.shape

xx = x.reshape(i,j*k).T


y = np.array([[[31,32,33,34,5],
               [35,36,37,38,5],
               [5,6,7,8,5]],
              [[41,42,43,44,5],
               [45,46,47,48,5],
               [5,6,7,8,5]]])


yy = y.reshape(i,j*k).T

results = cdist(xx, yy, 'mahalanobis')
results = np.diag(results)
print(results)

Output:

[ 2.28765854  2.75165028  2.75165028  2.75165028  0.          2.75165028
  2.75165028  2.75165028  2.75165028  0.          0.          0.          0.
  0.          0.        ]

My attempt:

VI = np.linalg.inv(np.cov(xx,yy))

print(np.sqrt(np.dot(np.dot((xx-yy),VI),(xx-yy).T)))

Could anybody correct this method?

Here is the formula for it:

http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.distance.mahalanobis.html#scipy.spatial.distance.mahalanobis
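
For reference, the definition on the linked SciPy page can be written as follows, where u and v are two observation vectors and V is the covariance matrix (SciPy takes the inverse, VI, as its argument):

```latex
d(u, v) = \sqrt{(u - v)\, V^{-1}\, (u - v)^{T}}
```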

Answer 1

Accepted answer by xnx

I think your problem lies in the construction of your covariance matrix. Try:

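X = np.vstack([xx, yy])   # pool both sets of observations into one array
V = np.cov(X.T)           # np.cov expects variables in rows, hence the transpose
VI = np.linalg.inv(V)
print(np.diag(np.sqrt(np.dot(np.dot((xx-yy),VI),(xx-yy).T))))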

Output:

[ 2.28765854  2.75165028  2.75165028  2.75165028  0.          2.75165028
  2.75165028  2.75165028  2.75165028  0.          0.          0.          0.
  0.          0.        ]
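
The shapes of the two covariance matrices show what went wrong in the question. This is a minimal sketch that assumes stand-in random data with the same (15, 2) shape as xx and yy (the seed and the values are illustrative, not the question's arrays). By default np.cov treats each row as a variable, and a second positional argument is appended as additional variables, so np.cov(xx, yy) yields a 30-by-30 matrix instead of the 2-by-2 covariance of the two features.

```python
import numpy as np

rng = np.random.default_rng(0)      # illustrative data, not the question's arrays
xx = rng.normal(size=(15, 2))       # 15 observations, 2 features
yy = rng.normal(size=(15, 2))

# The question's call: rows are variables, and yy is appended as extra variables.
wrong = np.cov(xx, yy)
print(wrong.shape)                  # (30, 30)

# The answer's call: stack the observations, then put variables in rows.
X = np.vstack([xx, yy])
V = np.cov(X.T)
print(V.shape)                      # (2, 2)

# Equivalent spelling that avoids the explicit transpose.
print(np.allclose(V, np.cov(X, rowvar=False)))   # True
```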

The code above builds the full n-by-n matrix only to keep its diagonal. To avoid that intermediate array, you might have to trade a C loop for a Python one:

A = np.dot((xx-yy),VI)
B = (xx-yy).T
n = A.shape[0]
D = np.empty(n)
for i in range(n):
    D[i] = np.sqrt(np.sum(A[i] * B[:,i]))

EDIT: actually, with np.einsum voodoo you can remove the Python loop and speed it up a lot (on my system, from 84.3 μs to 2.9 μs):

D = np.sqrt(np.einsum('ij,ji->i', A, B))
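
To see what the 'ij,ji->i' subscripts do, here is a small sketch on two hand-picked 2-by-2 arrays (the values are made up for illustration). For each i it sums A[i, j] * B[j, i] over j, which is exactly the diagonal of the matrix product, without computing the off-diagonal entries.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Full product first, then keep the diagonal: all four entries are computed.
full = np.diag(A @ B)

# einsum computes only sum_j A[i, j] * B[j, i] for each i.
direct = np.einsum('ij,ji->i', A, B)

print(full)     # [19 50]
print(direct)   # [19 50]
```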

EDIT: As @Warren Weckesser points out, einsum can be used to do away with the intermediate A and B arrays too:

delta = xx - yy
D = np.sqrt(np.einsum('nj,jk,nk->n', delta, VI, delta))
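
Putting the pieces together, the einsum version might be wrapped in a small helper and checked against the full-matrix approach. This is only a sketch: the function name paired_mahalanobis and the random test data are assumptions made for illustration, and the covariance is estimated from the pooled rows as in the answer.

```python
import numpy as np

def paired_mahalanobis(a, b):
    """Mahalanobis distance between matching rows of a and b, each of shape (n, d)."""
    # Covariance estimated from the pooled rows, as in the answer.
    pooled = np.vstack([a, b])
    vi = np.linalg.inv(np.cov(pooled, rowvar=False))
    delta = a - b
    # For each row n, sum delta[n, j] * vi[j, k] * delta[n, k] over j and k.
    return np.sqrt(np.einsum('nj,jk,nk->n', delta, vi, delta))

rng = np.random.default_rng(0)      # illustrative data
a = rng.normal(size=(15, 2))
b = rng.normal(size=(15, 2))

d_einsum = paired_mahalanobis(a, b)

# Reference: build the full n-by-n matrix and keep only its diagonal.
vi = np.linalg.inv(np.cov(np.vstack([a, b]), rowvar=False))
delta = a - b
d_full = np.sqrt(np.diag(delta @ vi @ delta.T))

print(np.allclose(d_einsum, d_full))
```

The helper never materialises the n-by-n matrix, so its memory use grows linearly with the number of rows rather than quadratically.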

Answer 2

Answer by David

Another simple solution, which is just as fast as the einsum one:

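e = xx - yy                      # row-wise differences
X = np.vstack([xx, yy])
V = np.cov(X.T)
p = np.linalg.inv(V)
# Multiply element-wise and sum each row instead of forming e.dot(p).dot(e.T)
D = np.sqrt(np.sum(np.dot(e, p) * e, axis=1))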
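
As a quick consistency check, the row-sum form can be compared with the three-operand einsum form on the arrays from the question. This sketch assumes Python 3 and only restates code already shown in the thread; the variable names d_sum and d_einsum are introduced here for illustration.

```python
import numpy as np

# Arrays copied from the question.
x = np.array([[[1, 2, 3, 4, 5], [5, 6, 7, 8, 5], [5, 6, 7, 8, 5]],
              [[11, 22, 23, 24, 5], [25, 26, 27, 28, 5], [5, 6, 7, 8, 5]]])
y = np.array([[[31, 32, 33, 34, 5], [35, 36, 37, 38, 5], [5, 6, 7, 8, 5]],
              [[41, 42, 43, 44, 5], [45, 46, 47, 48, 5], [5, 6, 7, 8, 5]]])
i, j, k = x.shape
xx = x.reshape(i, j * k).T          # shape (15, 2)
yy = y.reshape(i, j * k).T

e = xx - yy
p = np.linalg.inv(np.cov(np.vstack([xx, yy]).T))

# Row-sum form: (e @ p)[n, k] * e[n, k] summed over k.
d_sum = np.sqrt(np.sum(np.dot(e, p) * e, axis=1))
# einsum form: same quantity written as one contraction.
d_einsum = np.sqrt(np.einsum('nj,jk,nk->n', e, p, e))

print(np.allclose(d_sum, d_einsum))
```

Rows where xx and yy are identical give a zero difference vector and therefore a distance of exactly zero, which matches the zeros in the output listed earlier.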
