Python: creating a distance matrix?

Question

Asked by Jeremy

I am currently reading data into a dataframe that looks like this:

City         XCord    YCord   
Boston         5        2
Phoenix        7        3
New York       8        1
.....          .        .

I want to create a Euclidean distance matrix from this data showing the distance between all city pairs, so that I get a resulting matrix like:

             Boston    Phoenix   New York
Boston         0        2.236      3.162
Phoenix        2.236      0        2.236
New York       3.162    2.236        0

There are many more cities and coordinates in my actual data frame, so I need to be able to iterate over all of the city pairs and create a distance matrix like the one shown above. However, I am not sure how to pair all of the cities together and apply the Euclidean distance formula. Any help would be appreciated.

Answer 1

Accepted answer by Andrew

I think you are interested in distance_matrix from scipy.spatial.

For example:

Create the data:

import pandas as pd
from scipy.spatial import distance_matrix

data = [[5, 7], [7, 3], [8, 1]]
ctys = ['Boston', 'Phoenix', 'New York']
df = pd.DataFrame(data, columns=['xcord', 'ycord'], index=ctys)

Output:

          xcord ycord
Boston      5   7
Phoenix     7   3
New York    8   1

Using the distance_matrix function:

pd.DataFrame(distance_matrix(df.values, df.values), index=df.index, columns=df.index)

Results:

          Boston    Phoenix     New York
Boston    0.000000  4.472136    6.708204
Phoenix   4.472136  0.000000    2.236068
New York  6.708204  2.236068    0.000000
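
If the data arrives in the layout from the question, with the city names in an ordinary column rather than in the index, the same call should work after moving that column into the index. The following is a minimal sketch, assuming the column names City, XCord, and YCord and the sample coordinates from the question (so the numbers differ from the results above, which use (5, 7) for Boston):

```python
import pandas as pd
from scipy.spatial import distance_matrix

# Sample rows copied from the question; a real frame would have many more.
df = pd.DataFrame({
    'City': ['Boston', 'Phoenix', 'New York'],
    'XCord': [5, 7, 8],
    'YCord': [2, 3, 1],
})

# Use the city names as labels and keep only the coordinate columns.
coords = df.set_index('City')[['XCord', 'YCord']]

# distance_matrix returns a plain NumPy array, so re-attach the labels.
dist_df = pd.DataFrame(
    distance_matrix(coords.values, coords.values),
    index=coords.index,
    columns=coords.index,
)
print(dist_df.round(3))
```

Individual distances can then be looked up by label, for example dist_df.loc['Boston', 'Phoenix'].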

Answer 2

Answered by pkacprzak

Here is a method in pure Python.

Import the sqrt function from the math module:

from math import sqrt

Let's assume that you have your coordinates in a dictionary called cords, keyed by city name, like this:

cords['Boston'] = (5, 2)

Define a function to compute the Euclidean distance between two given 2D points:

def dist(a, b):
    d = [a[0] - b[0], a[1] - b[1]]
    return sqrt(d[0] * d[0] + d[1] * d[1])

Initialize the resulting matrix as a dictionary and fill it in for every pair of cities:

D = {}

for city1, cords1 in cords.items():
    D[city1] = {}
    for city2, cords2 in cords.items():
        D[city1][city2] = dist(cords1, cords2)

D is your resulting matrix.

The full source is below, along with the printed result:

from math import sqrt

cords = {}
cords['Boston'] = (5, 2)
cords['Phoenix'] = (7, 3)
cords['New York'] = (8, 1)

def dist(a, b):
    d = [a[0] - b[0], a[1] - b[1]]
    return sqrt(d[0] * d[0] + d[1] * d[1]) 

D = {}

for city1, cords1 in cords.items():
    D[city1] = {}
    for city2, cords2 in cords.items():
        D[city1][city2] = dist(cords1, cords2)   

for city1, v in D.items():
    for city2, d in v.items():
        print(city1, city2, d)

Results (Python 3.7+, where dictionaries keep insertion order):

Boston Boston 0.0
Boston Phoenix 2.23606797749979
Boston New York 3.1622776601683795
Phoenix Boston 2.23606797749979
Phoenix Phoenix 0.0
Phoenix New York 2.23606797749979
New York Boston 3.1622776601683795
New York Phoenix 2.23606797749979
New York New York 0.0
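
Because the distance from A to B equals the distance from B to A, the nested loop above computes every off-diagonal value twice. One possible variation, shown here only as a sketch and not as part of the original answer, visits each unordered pair once with itertools.combinations and uses math.hypot in place of the hand-written dist function. The cords dictionary is the same sample data as above.

```python
from itertools import combinations
from math import hypot

cords = {'Boston': (5, 2), 'Phoenix': (7, 3), 'New York': (8, 1)}

# Every city is at distance 0 from itself.
D = {city: {city: 0.0} for city in cords}

# Visit each unordered pair once and store the distance in both directions.
for (city1, p1), (city2, p2) in combinations(cords.items(), 2):
    d = hypot(p1[0] - p2[0], p1[1] - p2[1])
    D[city1][city2] = d
    D[city2][city1] = d

print(D['Boston']['New York'])
```

This roughly halves the number of distance computations, which may matter in pure Python once the number of cities grows.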

Answer 3

Answered by Maassa

SciPy has a function for this: scipy.spatial.distance.cdist()
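
The answer gives no example, so here is a minimal sketch of how cdist might be applied. It assumes the same sample DataFrame as the accepted answer; the names dm and result are chosen only for illustration.

```python
import pandas as pd
from scipy.spatial.distance import cdist

data = [[5, 7], [7, 3], [8, 1]]
ctys = ['Boston', 'Phoenix', 'New York']
df = pd.DataFrame(data, columns=['xcord', 'ycord'], index=ctys)

# cdist compares every row of the first array with every row of the second;
# passing the same array twice gives the full square matrix.
dm = cdist(df.values, df.values, metric='euclidean')

# Re-attach the city names as row and column labels.
result = pd.DataFrame(dm, index=df.index, columns=df.index)
print(result)
```

The metric argument also accepts other distance names, such as 'cityblock', if something other than the Euclidean distance is needed.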

Answer 4

Answered by francesco lc

If you don't want to use SciPy, you can use a list comprehension like this:

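import numpy as np
# xy_list is assumed to be a sequence of NumPy coordinate arrays, e.g. df.values
dist = lambda p1, p2: np.sqrt(((p1 - p2) ** 2).sum())
dm = np.asarray([[dist(p1, p2) for p2 in xy_list] for p1 in xy_list])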
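
The list comprehension still loops in Python for every pair. As a possible alternative that also avoids SciPy, NumPy broadcasting can compute all pairwise differences at once. This is a sketch under the assumption that the coordinates fit in a single 2D array; xy holds hypothetical sample data.

```python
import numpy as np

# Hypothetical sample coordinates, one row per point.
xy = np.array([[5, 7], [7, 3], [8, 1]], dtype=float)

# diff[i, j] is the vector from point j to point i; shape (n, n, 2).
diff = xy[:, None, :] - xy[None, :, :]

# Square, sum over the coordinate axis, and take the root; shape (n, n).
dm = np.sqrt((diff ** 2).sum(axis=-1))
print(dm)
```

The intermediate diff array holds n × n × 2 values, so this trades memory for speed and may not suit very large numbers of points.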

Answer 5

Answered by Surya Gaur

import numpy as np
import pandas as pd

data = [[5, 7], [7, 3], [8, 1]]
ctys = ['Boston', 'Phoenix', 'New York']
df = pd.DataFrame(data, columns=['xcord', 'ycord'], index=ctys)

# Coordinates as a plain NumPy array, one row per city
n_df = df.values
n = n_df.shape[0]

# Start from an n x n matrix of zeros and fill in each pairwise distance
matrix = np.zeros((n, n))

for i in range(n):
    for j in range(n):
        matrix[i, j] = np.sqrt(np.sum((n_df[i] - n_df[j]) ** 2))

print(matrix)
