Original question: http://stackoverflow.com/questions/32141856/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow (not to this page).
Is "norm" equivalent to "Euclidean distance"?
Asked by J_yang
I am not sure whether "norm" and "Euclidean distance" mean the same thing. Could you please help me understand the distinction?
I have an n by m array a, where m > 3. I want to calculate the Euclidean distance from the second data point a[1,:] to all the other points (including itself). So I used np.linalg.norm, which outputs the norm of two given points. But I don't know if this is the right way of getting the EDs.
import numpy as np

a = np.array([[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 3], [3, 5, 1, 5]])
N = a.shape[0]  # number of rows
pos = a[1, :]  # pick out the second data point
dist = np.zeros((N, 1), dtype=np.float64)
for i in range(N):
    dist[i] = np.linalg.norm(a[i, :] - pos)
Accepted answer by ali_m
A norm is a function that takes a vector as an input and returns a scalar value that can be interpreted as the "size", "length" or "magnitude" of that vector. More formally, norms are defined as having the following mathematical properties:
- They scale multiplicatively, i.e. Norm(a·v) = |a|·Norm(v) for any scalar a
- They satisfy the triangle inequality, i.e. Norm(u + v) ≤ Norm(u) + Norm(v)
- The norm of a vector is zero if and only if it is the zero vector, i.e. Norm(v) = 0 ⇔ v = 0
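These three properties can be spot-checked numerically for the Euclidean norm. The following is only a sketch: the vectors u and v and the scalar a are arbitrary values chosen for illustration, and checking a few values is of course not a proof.

```python
import numpy as np

# Arbitrary example values (assumptions for illustration only).
u = np.array([1.0, -2.0, 3.0])
v = np.array([4.0, 0.5, -1.0])
a = -2.5

norm = np.linalg.norm  # Euclidean (L2) norm by default for 1-D input

# 1. Multiplicative scaling: Norm(a*v) == |a| * Norm(v)
scales = np.isclose(norm(a * v), abs(a) * norm(v))

# 2. Triangle inequality: Norm(u + v) <= Norm(u) + Norm(v)
triangle = norm(u + v) <= norm(u) + norm(v)

# 3. The zero vector has norm zero, and a non-zero vector does not.
zero_only = norm(np.zeros(3)) == 0.0 and norm(v) > 0.0

print(scales, triangle, zero_only)
```

All three checks should come out true for any choice of u, v and a, up to floating-point rounding in the first one.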
The Euclidean norm (also known as the L2 norm) is just one of many different norms - there is also the max norm, the Manhattan norm etc. The L2 norm of a single vector is equivalent to the Euclidean distance from that point to the origin, and the L2 norm of the difference between two vectors is equivalent to the Euclidean distance between the two points.
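As a small illustration of how these norms differ, np.linalg.norm accepts an ord argument that selects among them. The point x below is an assumed example value, picked so the arithmetic is easy to check by hand.

```python
import numpy as np

x = np.array([3.0, -4.0])  # example point (assumed for illustration)
y = np.array([0.0, 0.0])   # the origin

l2 = np.linalg.norm(x, ord=2)         # Euclidean norm: sqrt(3**2 + 4**2) = 5
l1 = np.linalg.norm(x, ord=1)         # Manhattan norm: |3| + |-4| = 7
linf = np.linalg.norm(x, ord=np.inf)  # max norm: max(|3|, |-4|) = 4

# The L2 norm of the difference is the Euclidean distance between the points,
# so the distance from x to the origin equals the L2 norm of x itself.
dist_to_origin = np.linalg.norm(x - y)

print(l2, l1, linf, dist_to_origin)  # 5.0 7.0 4.0 5.0
```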
As @nobar's answer says, np.linalg.norm(x - y, ord=2) (or just np.linalg.norm(x - y)) will give you the Euclidean distance between the vectors x and y.
Since you want to compute the Euclidean distance between a[1, :] and every other row in a, you could do this a lot faster by eliminating the for loop and broadcasting over the rows of a:
dist = np.linalg.norm(a[1:2] - a, axis=1)
It's also easy to compute the Euclidean distance yourself using broadcasting:
dist = np.sqrt(((a[1:2] - a) ** 2).sum(1))
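To see what the broadcasting is doing, here is a sketch that applies both one-liners to the small 4×4 array from the question and compares them with the explicit loop. The intermediate names diff, dist_norm, dist_manual and dist_loop are introduced here for illustration only.

```python
import numpy as np

a = np.array([[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 3], [3, 5, 1, 5]])

# a[1:2] keeps the row as a 2-D array of shape (1, 4); broadcasting
# stretches it against the (4, 4) array, so it is subtracted from every row.
diff = a[1:2] - a
print(diff.shape)  # (4, 4)

dist_norm = np.linalg.norm(diff, axis=1)        # one norm per row
dist_manual = np.sqrt((diff ** 2).sum(axis=1))  # the same thing written out

# Reference result from an explicit loop, as in the question.
dist_loop = np.array([np.linalg.norm(row - a[1]) for row in a])

print(dist_norm)  # approximately [2, 0, 2.6458, 6]
```

The sign of the difference does not matter here, because each component is squared before summing.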
The fastest method is probably scipy.spatial.distance.cdist:
from scipy.spatial.distance import cdist
dist = cdist(a[1:2], a)[0]
Some timings for a (1000, 1000) array:
a = np.random.randn(1000, 1000)
%timeit np.linalg.norm(a[1:2] - a, axis=1)
# 100 loops, best of 3: 5.43 ms per loop
%timeit np.sqrt(((a[1:2] - a) ** 2).sum(1))
# 100 loops, best of 3: 5.5 ms per loop
%timeit cdist(a[1:2], a)[0]
# 1000 loops, best of 3: 1.38 ms per loop
# check that all 3 methods return the same result
d1 = np.linalg.norm(a[1:2] - a, axis=1)
d2 = np.sqrt(((a[1:2] - a) ** 2).sum(1))
d3 = cdist(a[1:2], a)[0]
assert np.allclose(d1, d2) and np.allclose(d1, d3)
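Note that %timeit is an IPython magic and is not available in a plain Python script. A rough equivalent using the standard-library timeit module might look like the sketch below, which compares the original loop against the broadcast version. The array size, the repeat count and the helper names loop_version and broadcast_version are assumptions made for illustration, and the numbers it prints will vary from machine to machine.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((500, 100))  # smaller than the (1000, 1000) array above


def loop_version(a):
    # Explicit Python loop, as in the question.
    dist = np.zeros(a.shape[0])
    for i in range(a.shape[0]):
        dist[i] = np.linalg.norm(a[i, :] - a[1, :])
    return dist


def broadcast_version(a):
    # Vectorized version using broadcasting, as in the answer.
    return np.linalg.norm(a[1:2] - a, axis=1)


# Both versions should agree before their timings are compared.
same = np.allclose(loop_version(a), broadcast_version(a))

runs = 20
t_loop = timeit.timeit(lambda: loop_version(a), number=runs)
t_broadcast = timeit.timeit(lambda: broadcast_version(a), number=runs)
print(f"loop: {t_loop / runs * 1e3:.3f} ms per call")
print(f"broadcast: {t_broadcast / runs * 1e3:.3f} ms per call")
```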
Answer by nobar
The concept of a "norm" is a generalized idea in mathematics which, when applied to vectors (or vector differences), broadly represents some measure of length. There are various ways to compute a norm; the one that corresponds to Euclidean distance is called the "2-norm". It is computed by raising each component to the power 2 (the "square"), summing the results, and then raising the sum to the power 1/2 (the "square root").
It's a bit cryptic in the docs, but you get the Euclidean distance between two vectors by setting the parameter ord=2:
sum(abs(x)**ord)**(1./ord)
becomes sqrt(sum(x**2)).
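The formula can be written out directly and compared against np.linalg.norm. This is a minimal sketch for real-valued 1-D vectors and positive finite ord only; the helper p_norm is a hypothetical name introduced here, not part of NumPy.

```python
import numpy as np


def p_norm(x, ord=2):
    """Vector p-norm written out from the formula (hypothetical helper).

    The parameter is named ord to mirror np.linalg.norm; it shadows the
    Python builtin only inside this function.
    """
    x = np.asarray(x, dtype=float)
    return np.sum(np.abs(x) ** ord) ** (1.0 / ord)


x = np.array([1.0, -2.0, 2.0])

# With ord=2 the formula reduces to sqrt(sum(x**2)).
two_norm = p_norm(x, 2)
by_hand = np.sqrt(np.sum(x ** 2))
print(two_norm, by_hand)  # 3.0 3.0

# Other exponents give other norms, e.g. ord=1 sums the absolute values.
print(p_norm(x, 1))  # 5.0
```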
Note: as pointed out by @Holt, the default value is ord=None, which is documented to compute the "2-norm" for vectors. This is, therefore, equivalent to ord=2 (Euclidean distance).
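A quick check of that equivalence for 1-D input is sketched below, with x and y as assumed example vectors. The second half illustrates why the note says "for vectors": as far as the NumPy documentation describes it, a 2-D array passed without an axis argument gets the Frobenius norm for ord=None but the largest singular value for ord=2, so the two no longer coincide.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])

d_default = np.linalg.norm(x - y)     # ord=None
d_two = np.linalg.norm(x - y, ord=2)  # explicit 2-norm
print(d_default, d_two)  # 5.0 5.0

# For a 2-D array without axis=..., the two settings mean different things.
m = np.array([[1.0, 2.0], [3.0, 4.0]])
frobenius = np.linalg.norm(m)        # sqrt of the sum of squared entries
spectral = np.linalg.norm(m, ord=2)  # largest singular value
print(frobenius, spectral)           # two different values
```

Passing axis=1, as in the accepted answer, treats each row as a separate vector, so the default there is again the row-wise Euclidean norm.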