Python: Creating a distance matrix?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/29481485/
Creating a Distance Matrix?
Asked by Jeremy
I am currently reading data into a dataframe that looks like this:
City      XCord  YCord
Boston    5      2
Phoenix   7      3
New York  8      1
.....     .      .
I want to create a Euclidean distance matrix from this data, showing the distance between all city pairs, so that I get a resulting matrix like:
          Boston  Phoenix  New York
Boston    0       2.236    3.162
Phoenix   2.236   0        2.236
New York  3.162   2.236    0
There are many more cities and coordinates in my actual data frame, so I need to be able to iterate over all of the city pairs and create a distance matrix like the one shown above. I am not sure how to pair all of the cities together and apply the Euclidean distance formula. Any help would be appreciated.
Accepted answer by Andrew
I think you are interested in distance_matrix from scipy.spatial.
For example:
Create data:
import pandas as pd
from scipy.spatial import distance_matrix
data = [[5, 7], [7, 3], [8, 1]]
ctys = ['Boston', 'Phoenix', 'New York']
df = pd.DataFrame(data, columns=['xcord', 'ycord'], index=ctys)
Output:
          xcord  ycord
Boston        5      7
Phoenix       7      3
New York      8      1
Using the distance matrix function:
pd.DataFrame(distance_matrix(df.values, df.values), index=df.index, columns=df.index)
Results:
            Boston   Phoenix  New York
Boston    0.000000  4.472136  6.708204
Phoenix   4.472136  0.000000  2.236068
New York  6.708204  2.236068  0.000000
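Note that the sample data in this answer places Boston at (5, 7), whereas the question lists it at (5, 2). As a quick check, the same call can be applied to the question's coordinates. This is only a minimal sketch: the name `dist_df` and the `round(3)` call are additions made here to match the precision shown in the question.

```python
import pandas as pd
from scipy.spatial import distance_matrix

# Coordinates as given in the question (Boston at (5, 2)).
data = [[5, 2], [7, 3], [8, 1]]
ctys = ['Boston', 'Phoenix', 'New York']
df = pd.DataFrame(data, columns=['xcord', 'ycord'], index=ctys)

# Pairwise Euclidean distances, labelled with the city names on both axes.
dist_df = pd.DataFrame(
    distance_matrix(df.values, df.values),
    index=df.index,
    columns=df.index,
).round(3)
print(dist_df)
```

The off-diagonal entries should line up with the matrix requested in the question (2.236 for Boston to Phoenix, 3.162 for Boston to New York).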
Answer by pkacprzak
I will give a method in pure Python.
Import the sqrt function from the math module:
from math import sqrt
Let's assume that you have your coordinates in a dictionary named cords, filled in the following way:
cords['Boston'] = (5, 2)
Define a function to compute the Euclidean distance between two given 2D points:
def dist(a, b):
    d = [a[0] - b[0], a[1] - b[1]]
    return sqrt(d[0] * d[0] + d[1] * d[1])
Initialize the resulting matrix as a dictionary:
D = {}
for city1, cords1 in cords.items():
    D[city1] = {}
    for city2, cords2 in cords.items():
        D[city1][city2] = dist(cords1, cords2)
D is your resulting matrix.
The full source is below, along with the printed result:
from math import sqrt

cords = {}
cords['Boston'] = (5, 2)
cords['Phoenix'] = (7, 3)
cords['New York'] = (8, 1)

def dist(a, b):
    # Euclidean distance between two 2D points
    d = [a[0] - b[0], a[1] - b[1]]
    return sqrt(d[0] * d[0] + d[1] * d[1])

# Build the matrix as a dictionary of dictionaries
D = {}
for city1, cords1 in cords.items():
    D[city1] = {}
    for city2, cords2 in cords.items():
        D[city1][city2] = dist(cords1, cords2)

# Print every city pair with its distance (Python 3 print syntax)
for city1, v in D.items():
    for city2, d in v.items():
        print(city1, city2, d)
Results:
Boston Boston 0.0
Boston Phoenix 2.23606797749979
Boston New York 3.1622776601683795
Phoenix Boston 2.23606797749979
Phoenix Phoenix 0.0
Phoenix New York 2.23606797749979
New York Boston 3.1622776601683795
New York Phoenix 2.23606797749979
New York New York 0.0
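Because Euclidean distance is symmetric and the distance from a city to itself is zero, each pair only needs to be computed once. A minimal sketch of that variant, using `itertools.combinations` and `math.hypot` from the standard library (the function name `distance_table` is made up for this illustration and is not part of the answer):

```python
from itertools import combinations
from math import hypot

cords = {'Boston': (5, 2), 'Phoenix': (7, 3), 'New York': (8, 1)}

def distance_table(points):
    """Return a nested dict of pairwise Euclidean distances for 2D points."""
    # Start with zeros on the diagonal.
    table = {name: {name: 0.0} for name in points}
    # Visit each unordered pair once and fill both symmetric entries.
    for (city1, p1), (city2, p2) in combinations(points.items(), 2):
        d = hypot(p1[0] - p2[0], p1[1] - p2[1])
        table[city1][city2] = d
        table[city2][city1] = d
    return table

D = distance_table(cords)
print(round(D['Boston']['New York'], 3))  # 3.162
```

For n cities this computes n(n-1)/2 distances instead of n squared, which roughly halves the work for large inputs.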
Answer by Maassa
SciPy has a function for this: scipy.spatial.distance.cdist()
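The answer only names the function, so here is a minimal sketch of how it might be called on the question's coordinates. The array name `coords` is chosen here for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Coordinates from the question, one row per city:
# Boston, Phoenix, New York.
coords = np.array([[5, 2], [7, 3], [8, 1]])

# cdist(XA, XB) compares every row of XA with every row of XB;
# passing the same array twice gives the full square distance matrix.
dm = cdist(coords, coords, metric='euclidean')
print(dm.round(3))
```

`cdist` also accepts other metric names through the `metric` argument (for example `'cityblock'`), which is handy if plain Euclidean distance is not what you need.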
Answer by francesco lc
If you don't want to use SciPy, you can use a list comprehension like this:
import numpy as np
# xy_list is assumed to be a sequence of NumPy arrays, one per point
dist = lambda p1, p2: np.sqrt(((p1 - p2) ** 2).sum())
dm = np.asarray([[dist(p1, p2) for p2 in xy_list] for p1 in xy_list])
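The list comprehension relies on an `xy_list` that the answer does not define. A self-contained sketch, assuming `xy_list` holds the question's coordinates as NumPy arrays and replacing the lambda with a named function for readability:

```python
import numpy as np

# Assumed input: one NumPy array per city, in the order
# Boston, Phoenix, New York (coordinates from the question).
xy_list = [np.array([5, 2]), np.array([7, 3]), np.array([8, 1])]

def dist(p1, p2):
    """Euclidean distance between two points given as NumPy arrays."""
    return np.sqrt(((p1 - p2) ** 2).sum())

# Outer comprehension walks the rows, inner one walks the columns.
dm = np.asarray([[dist(p1, p2) for p2 in xy_list] for p1 in xy_list])
print(dm.round(3))
```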
Answer by Surya Gaur
import numpy as np
import pandas as pd

data = [[5, 7], [7, 3], [8, 1]]
ctys = ['Boston', 'Phoenix', 'New York']
df = pd.DataFrame(data, columns=['xcord', 'ycord'], index=ctys)
n_df = df.values  # coordinates as a NumPy array of shape (n, 2)
n = n_df.shape[0]
matrix = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        # Euclidean distance between city i and city j
        matrix[i, j] = np.sqrt(np.sum((n_df[i] - n_df[j]) ** 2))
print(matrix)
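The double loop can also be expressed with NumPy broadcasting, which avoids the explicit Python-level iteration. A minimal sketch using the same sample data (the names `points`, `diff`, and `matrix_vec` are chosen here for illustration):

```python
import numpy as np

points = np.array([[5, 7], [7, 3], [8, 1]], dtype=float)

# points[:, None, :] has shape (n, 1, 2) and points[None, :, :] has shape (1, n, 2),
# so their difference broadcasts to shape (n, n, 2): one pair of deltas per city pair.
diff = points[:, None, :] - points[None, :, :]

# Sum the squared deltas over the last axis and take the square root.
matrix_vec = np.sqrt((diff ** 2).sum(axis=-1))
print(matrix_vec)
```

The intermediate `diff` array holds n × n × 2 values, so for a very large number of cities the loop or a SciPy routine may be easier on memory.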