Calculate Mahalanobis distance using NumPy only
Source: http://stackoverflow.com/questions/27686240/ (Stack Overflow; content licensed under CC BY-SA 4.0 and attributed to the original authors)
Asked by Borys
I am looking for a NumPy way of calculating the Mahalanobis distance between two NumPy arrays (x and y). The following code calculates it correctly using SciPy's cdist function. Since cdist computes a full distance matrix that I don't need (I only use its diagonal), I would like a more direct way of calculating it using NumPy only.
import numpy as np
from scipy.spatial.distance import cdist
x = np.array([[[1,2,3,4,5],
[5,6,7,8,5],
[5,6,7,8,5]],
[[11,22,23,24,5],
[25,26,27,28,5],
[5,6,7,8,5]]])
i,j,k = x.shape
xx = x.reshape(i,j*k).T
y = np.array([[[31,32,33,34,5],
[35,36,37,38,5],
[5,6,7,8,5]],
[[41,42,43,44,5],
[45,46,47,48,5],
[5,6,7,8,5]]])
yy = y.reshape(i,j*k).T
results = cdist(xx,yy,'mahalanobis')
results = np.diag(results)
print(results)
[ 2.28765854 2.75165028 2.75165028 2.75165028 0. 2.75165028
2.75165028 2.75165028 2.75165028 0. 0. 0. 0.
0. 0. ]
My attempt:
VI = np.linalg.inv(np.cov(xx,yy))
print(np.sqrt(np.dot(np.dot((xx-yy),VI),(xx-yy).T)))
Could anybody correct this method?
Here is the formula for it:
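For reference, the usual definition of the Mahalanobis distance between two vectors is given below. The symbols are assumptions chosen to match the code in this thread: $\mathbf{x}$ and $\mathbf{y}$ are corresponding rows of xx and yy, and $S$ is the covariance matrix whose inverse the code calls VI.

```latex
d(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x} - \mathbf{y})^{\mathsf{T}} \, S^{-1} \, (\mathbf{x} - \mathbf{y})}
```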
Accepted answer by xnx
I think your problem lies in the construction of your covariance matrix. Try:
X = np.vstack([xx,yy])
V = np.cov(X.T)
VI = np.linalg.inv(V)
# Take the diagonal before the square root so that off-diagonal entries are never passed to sqrt
print(np.sqrt(np.diag(np.dot(np.dot((xx-yy),VI),(xx-yy).T))))
Output:
[ 2.28765854 2.75165028 2.75165028 2.75165028 0. 2.75165028
2.75165028 2.75165028 2.75165028 0. 0. 0. 0.
0. 0. ]
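To illustrate why the covariance construction matters, here is a minimal sketch that compares the shapes produced by the two calls. It uses small random arrays as stand-ins, not the data from the question. By default np.cov treats each row as a variable, and a second argument is appended as additional variables, so np.cov(xx, yy) ends up with one variable per sample rather than one per feature.

```python
import numpy as np

rng = np.random.default_rng(0)
xx = rng.normal(size=(15, 2))  # 15 samples, 2 features (illustrative data)
yy = rng.normal(size=(15, 2))

# Rows are treated as variables, and yy is appended as extra variables:
# 15 + 15 = 30 "variables", each with only 2 observations.
V_wrong = np.cov(xx, yy)

# Stack the samples, then transpose so that each row is one feature.
V = np.cov(np.vstack([xx, yy]).T)

print(V_wrong.shape)  # (30, 30)
print(V.shape)        # (2, 2)
```

Passing rowvar=False to np.cov on the stacked array should give the same 2 x 2 matrix without the explicit transpose.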
To avoid the intermediate n-by-n array that the matrix product above creates implicitly (only its diagonal is used), you might have to trade a C loop for a Python one:
A = np.dot((xx-yy),VI)
B = (xx-yy).T
n = A.shape[0]
D = np.empty(n)
for i in range(n):
    D[i] = np.sqrt(np.sum(A[i] * B[:,i]))
EDIT: actually, with np.einsum voodoo you can remove the Python loop and speed it up a lot (on my system, from 84.3 μs to 2.9 μs):
D = np.sqrt(np.einsum('ij,ji->i', A, B))
EDIT: As @Warren Weckesser points out, einsum can be used to do away with the intermediate A and B arrays too:
delta = xx - yy
D = np.sqrt(np.einsum('nj,jk,nk->n', delta, VI, delta))
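The subscript string 'nj,jk,nk->n' can be read as: for each row n, sum delta[n, j] * VI[j, k] * delta[n, k] over j and k. The sketch below checks that reading against an explicit per-row loop. The random data and the names D_einsum and D_loop are illustrative assumptions, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(1)
delta = rng.normal(size=(6, 3))          # 6 paired differences, 3 features
V = np.cov(rng.normal(size=(50, 3)).T)   # illustrative 3 x 3 covariance
VI = np.linalg.inv(V)

# For each row n: sum over j and k of delta[n, j] * VI[j, k] * delta[n, k]
D_einsum = np.sqrt(np.einsum('nj,jk,nk->n', delta, VI, delta))

# Reference: explicit quadratic form, one row at a time
D_loop = np.array([np.sqrt(d @ VI @ d) for d in delta])

print(np.allclose(D_einsum, D_loop))  # True
```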
Answer by David
Another simple solution, which is just as fast as the einsum version:
e = xx-yy
X = np.vstack([xx,yy])
V = np.cov(X.T)
p = np.linalg.inv(V)
D = np.sqrt(np.sum(np.dot(e,p) * e, axis = 1))
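These steps could be wrapped in a small helper, sketched below. The function name paired_mahalanobis and the random test data are hypothetical. The sketch assumes the covariance is estimated from the stacked samples of both arrays, as in the answers here, and it checks that the row-wise product matches the diagonal of the full matrix product.

```python
import numpy as np

def paired_mahalanobis(xx, yy):
    """Mahalanobis distance between corresponding rows of xx and yy.

    Hypothetical helper, not a library function. The covariance is
    estimated from the stacked samples of both arrays.
    """
    e = xx - yy
    V = np.cov(np.vstack([xx, yy]).T)
    p = np.linalg.inv(V)  # may raise LinAlgError if V is singular
    # Row-wise quadratic form e[n] @ p @ e[n], with no n x n intermediate
    return np.sqrt(np.sum(np.dot(e, p) * e, axis=1))

rng = np.random.default_rng(2)
xx = rng.normal(size=(8, 3))
yy = rng.normal(size=(8, 3))
D = paired_mahalanobis(xx, yy)

# Same values as the diagonal of the full (8 x 8) product
e = xx - yy
p = np.linalg.inv(np.cov(np.vstack([xx, yy]).T))
full = np.dot(np.dot(e, p), e.T)
print(np.allclose(D, np.sqrt(np.diag(full))))  # True
```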

