Euclidean Distance Matrix Using Pandas
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/39203662/
Asked by Abacus
I have a .csv file that contains city, latitude and longitude data in the below format:
CITY|LATITUDE|LONGITUDE
A|40.745392|-73.978364
B|42.562786|-114.460503
C|37.227928|-77.401924
D|41.245708|-75.881241
E|41.308273|-72.927887
I need to create a distance matrix in the below format (please ignore the dummy values):
          A         B         C         D         E
A  0.000000  6.000000  5.744563  6.082763  5.656854
B  6.000000  0.000000  6.082763  5.385165  5.477226
C  5.744563  6.082763  0.000000  6.000000  5.385165
D  6.082763  5.385165  6.000000  0.000000  5.385165
E  5.656854  5.477226  5.385165  5.385165  0.000000
I have loaded the data into a pandas dataframe and have created a cross join as below:
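import pandas as pd
df_A = pd.read_csv('lat_lon.csv', delimiter='|', encoding="utf-8-sig")
# Take a real copy; plain assignment would only create a second name for df_A.
df_B = df_A.copy()
df_A['key'] = 1
df_B['key'] = 1
# Merging on a constant key pairs every row with every row (a cross join).
df_C = pd.merge(df_A, df_B, on='key')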
- Can you please help me create the above matrix structure?
- Also, is it possible to avoid the step involving the cross join?
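For reference, the cross join above can be carried through to the requested matrix. The following is only a minimal sketch, not the asker's code: the sample rows are inlined so it runs without lat_lon.csv, and the names pairs, DIST and matrix, as well as the _A/_B suffixes, are assumptions chosen for illustration.

```python
import io
import pandas as pd

# Sample data from the question, inlined so the sketch runs without lat_lon.csv.
raw = """CITY|LATITUDE|LONGITUDE
A|40.745392|-73.978364
B|42.562786|-114.460503
C|37.227928|-77.401924
D|41.245708|-75.881241
E|41.308273|-72.927887
"""
df = pd.read_csv(io.StringIO(raw), delimiter="|")

# Cross join via a constant key; the suffixes tell the two copies apart.
pairs = pd.merge(df.assign(key=1), df.assign(key=1), on="key", suffixes=("_A", "_B"))

# Straight-line (Euclidean) distance for every pair of rows.
pairs["DIST"] = (
    (pairs["LATITUDE_A"] - pairs["LATITUDE_B"]) ** 2
    + (pairs["LONGITUDE_A"] - pairs["LONGITUDE_B"]) ** 2
) ** 0.5

# Reshape the long list of pairs into a square city-by-city matrix.
matrix = pairs.pivot(index="CITY_A", columns="CITY_B", values="DIST")
print(matrix)
```

This keeps the cross join, so it builds all n x n pairs as rows before reshaping; that is fine for a handful of cities but grows quickly with the number of rows.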
Answered by MaxU
You can use the pdist and squareform functions from scipy.spatial.distance:
In [12]: df
Out[12]:
  CITY   LATITUDE   LONGITUDE
0    A  40.745392  -73.978364
1    B  42.562786 -114.460503
2    C  37.227928  -77.401924
3    D  41.245708  -75.881241
4    E  41.308273  -72.927887
In [13]: from scipy.spatial.distance import squareform, pdist
In [14]: pd.DataFrame(squareform(pdist(df.iloc[:, 1:])), columns=df.CITY.unique(), index=df.CITY.unique())
Out[14]:
           A          B          C          D          E
A   0.000000  40.522913   4.908494   1.967551   1.191779
B  40.522913   0.000000  37.440606  38.601738  41.551558
C   4.908494  37.440606   0.000000   4.295932   6.055264
D   1.967551  38.601738   4.295932   0.000000   2.954017
E   1.191779  41.551558   6.055264   2.954017   0.000000
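To make the two steps in this answer easier to follow, here is a small sketch that separates them, using only the first three cities from the question. The variable names coords, condensed and square are assumptions for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Coordinates of cities A, B and C from the question (latitude, longitude).
coords = np.array([
    [40.745392, -73.978364],    # A
    [42.562786, -114.460503],   # B
    [37.227928, -77.401924],    # C
])

# pdist returns the condensed form: one distance per unordered pair,
# in the order (A, B), (A, C), (B, C).
condensed = pdist(coords)

# squareform expands the condensed vector into the full symmetric
# matrix with zeros on the diagonal.
square = squareform(condensed)

print(condensed)
print(square)
```

Because pdist computes each pair only once, no cross join is needed; squareform is what restores the square layout the question asks for.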
Answered by simplyPTA
The matrix can be created directly with cdist from scipy.spatial.distance:
from scipy.spatial.distance import cdist
df_array = df[["LATITUDE", "LONGITUDE"]].to_numpy()
dist_mat = cdist(df_array, df_array)
pd.DataFrame(dist_mat, columns = df["CITY"], index = df["CITY"])
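As a self-contained illustration of this approach, the sketch below rebuilds the question's data inline and checks that passing the same array twice to cdist gives the same result as squareform(pdist(...)). The rectangular case at the end (two cities against the other three) is an added illustration, not part of the answer, and the names xy, full, same and part are assumptions.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist, pdist, squareform

# The question's data, inlined so the sketch runs on its own.
df = pd.DataFrame({
    "CITY": ["A", "B", "C", "D", "E"],
    "LATITUDE": [40.745392, 42.562786, 37.227928, 41.245708, 41.308273],
    "LONGITUDE": [-73.978364, -114.460503, -77.401924, -75.881241, -72.927887],
})
xy = df[["LATITUDE", "LONGITUDE"]].to_numpy()

# cdist compares every row of its first argument with every row of its
# second, so passing the same array twice yields the full square matrix.
full = cdist(xy, xy)
same = np.allclose(full, squareform(pdist(xy)))
print(same)

# The two arguments may differ: distances from A and B to C, D and E.
part = pd.DataFrame(
    cdist(xy[:2], xy[2:]),
    index=df["CITY"].iloc[:2].to_numpy(),
    columns=df["CITY"].iloc[2:].to_numpy(),
)
print(part)
```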
Answered by Himaprasoon
for i in df["CITY"]:
    for j in df["CITY"]:
        row = df[df["CITY"] == j][["LATITUDE", "LONGITUDE"]]
        latitude = row["LATITUDE"].tolist()[0]
        longitude = row["LONGITUDE"].tolist()[0]
        df.loc[df['CITY'] == i, j] = ((df["LATITUDE"] - latitude)**2 + (df["LONGITUDE"] - longitude)**2)**0.5
df = df.drop(["CITY", "LATITUDE", "LONGITUDE"], axis=1)
This works
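The same arithmetic as the nested loop can also be written without explicit Python loops by using NumPy broadcasting. This is a swapped-in technique rather than this answer's method, shown as a minimal sketch; the names xy, diff, dist and result are assumptions.

```python
import numpy as np
import pandas as pd

# The question's data, inlined so the sketch runs on its own.
df = pd.DataFrame({
    "CITY": ["A", "B", "C", "D", "E"],
    "LATITUDE": [40.745392, 42.562786, 37.227928, 41.245708, 41.308273],
    "LONGITUDE": [-73.978364, -114.460503, -77.401924, -75.881241, -72.927887],
})
xy = df[["LATITUDE", "LONGITUDE"]].to_numpy()

# Broadcasting: diff[i, j] holds xy[i] - xy[j] for every pair of cities.
diff = xy[:, None, :] - xy[None, :, :]

# Square, sum over the coordinate axis, take the root: Euclidean distance.
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Label rows and columns with the city names.
cities = df["CITY"].to_numpy()
result = pd.DataFrame(dist, index=cities, columns=cities)
print(result)
```

Unlike the loop version, this labels the rows with the city names as well as the columns, which matches the layout requested in the question.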