pandas: Calculating distance between *multiple* sets of geo coordinates in Python

Disclaimer: This page is a Chinese/English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/36696613/

Time: 2020-09-14 01:04:08  Source: igfitidea

Tags: python, numpy, pandas, distance, geopy

Question by Colin

I am struggling to calculate the distance between multiple sets of latitude and longitude coordinates. In short, I have found numerous tutorials that use either math or geopy. These tutorials work great when I just want to find the distance between ONE set of coordinates (or two unique locations). However, my objective is to scan a data set that has 400k combinations of origin and destination coordinates. One example of the code I have used is listed below, but I seem to get errors whenever my arrays hold more than one record. Any helpful tips would be much appreciated. Thank you.

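# starting dataframe is df
# note: vincenty was removed in geopy 2.0; geodesic is its replacement
from geopy.distance import vincenty

# as_matrix() was removed in pandas 1.0; to_numpy() is the current equivalent
lat1 = df.lat1.to_numpy()
long1 = df.long1.to_numpy()
lat2 = df.lat2.to_numpy()
long2 = df.df_long2.to_numpy()

# vincenty() expects a single (lat, lon) point per argument, so these tuples
# of whole arrays raise an error once the arrays hold more than one value
point1 = (lat1, long1)
point2 = (lat2, long2)
print(vincenty(point1, point2).miles)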
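
If what you need is one distance per row (each row holding an origin and a destination), a common alternative is to vectorise the calculation with NumPy instead of calling geopy once per pair. This is a sketch under stated assumptions, not the answerer's method: it swaps vincenty's ellipsoidal model for the spherical haversine formula (typically within about 0.5% of the ellipsoidal result), assumes a mean Earth radius of 3958.8 miles, and assumes the four columns are named lat1, long1, lat2 and long2. The helper name haversine_miles is hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed mean Earth radius in miles (spherical model, not the WGS-84 ellipsoid)
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Hypothetical helper: element-wise great-circle distance in miles.

    Arguments are arrays of degrees with matching shapes (or scalars,
    which NumPy broadcasts).
    """
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))


# Tiny stand-in for the question's 400k-row DataFrame (column names assumed)
df = pd.DataFrame({
    "lat1": [0.0, 51.5],
    "long1": [0.0, -0.1],
    "lat2": [0.0, 48.9],
    "long2": [1.0, 2.35],
})

# One vectorised call computes every row's origin-destination distance at once
df["miles"] = haversine_miles(
    df["lat1"].to_numpy(), df["long1"].to_numpy(),
    df["lat2"].to_numpy(), df["long2"].to_numpy(),
)
print(df)
```

On 400k rows this is a handful of array operations rather than 400k Python-level function calls. If you need ellipsoidal accuracy, calling geopy once per row remains an option, just a slower one.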

Answer by urschrei

Edit: here's a simple notebook example

Here is a general approach. It assumes that you have a DataFrame column containing points and want to calculate the distances between all of them. (If you have separate latitude and longitude columns instead, first combine them into (lat, lon) tuples, for instance.) Name the new column coords.

import pandas as pd
import numpy as np
# vincenty was removed in geopy 2.0; geodesic is its replacement
from geopy.distance import geodesic


# assumes your DataFrame is named df, and its lat and lon columns are named lat and lon. Adjust as needed.
# geopy expects (lat, lon) order; list() is needed because zip() is lazy in Python 3
df['coords'] = list(zip(df.lat, df.lon))
# first, let's create a square DataFrame (think of it as a matrix if you like)
square = pd.DataFrame(
    np.zeros((len(df), len(df))),
    index=df.index, columns=df.index)

This function looks up our 'end' coordinates in the df DataFrame using the label of the row it is passed (col.name), then applies geopy's geodesic() function to every point in df['coords']. Series.apply passes each point as the first argument, followed by the contents of args and any keyword arguments, so every call is geodesic(point, end, ellipsoid='WGS-84').

def get_distance(col):
    # col is one row of square (apply is called with axis=1); col.name is its label
    end = df.loc[col.name, 'coords']
    # distance from every point in df['coords'] to end, on the WGS-84 ellipsoid
    return df['coords'].apply(geodesic, args=(end,), ellipsoid='WGS-84')
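
The key mechanism in get_distance is how Series.apply forwards extra arguments: it calls the function once per element, passing the element first, then the contents of args, then any extra keyword arguments. The sketch below makes that visible with a hypothetical stand-in function, describe, which simply returns its arguments; it replaces the geopy distance function so the example runs without geopy.

```python
import pandas as pd


def describe(point, end, ellipsoid="sphere"):
    """Hypothetical stand-in for a geopy distance function: it returns its arguments."""
    return (point, end, ellipsoid)


coords = pd.Series([(0.0, 0.0), (10.0, 20.0)], index=["a", "b"])
end = (51.5, -0.1)

# Equivalent to calling describe(p, end, ellipsoid="WGS-84") for each p in coords
calls = coords.apply(describe, args=(end,), ellipsoid="WGS-84")
print(calls)
```

Swapping describe for the geopy distance function gives exactly the calls get_distance makes, one per point, with the result indexed like the original column.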

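Now we're ready to calculate all the distances.
We transpose the result (.T) because loc[] takes the row label first and the column label second. With axis=1, however, each Series returned by get_distance (see above) becomes a row of the result, so without .T, entry [i, j] would hold the distance from point j to point i rather than from i to j. The distance is symmetric, so the values end up the same either way.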

distances = square.apply(get_distance, axis=1).T
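
To see what the transpose does, the sketch below repeats the same square.apply(..., axis=1).T pattern with plain numbers and a deliberately asymmetric, hypothetical function, difference(start, end) = end - start, so the argument order becomes visible. Only pandas and NumPy are assumed.

```python
import numpy as np
import pandas as pd

points = pd.Series([1, 10, 100], index=["a", "b", "c"])
square = pd.DataFrame(np.zeros((3, 3)), index=points.index, columns=points.index)


def difference(start, end):
    """Hypothetical asymmetric stand-in for a distance function."""
    return end - start


def get_difference(row):
    # row.name is the label of the row being processed; use it as the 'end' point
    end = points.loc[row.name]
    return points.apply(difference, args=(end,))


# With axis=1, each returned Series becomes one ROW of the result
untransposed = square.apply(get_difference, axis=1)
result = untransposed.T

# After .T, result.loc[i, j] == difference(points[i], points[j])
print(result)
```

For a symmetric distance both layouts hold the same values; the transpose only changes anything once the function is asymmetric.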

geopy returns Distance objects rather than plain numbers (they are stored and displayed in kilometres), so you may need to convert them to whatever unit you want using attributes such as .meters, .miles, etc.

Something like the following should work:

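# extract each geopy Distance object's value in metres
def units(input_instance):
    return input_instance.meters

# applymap calls units() on every cell; pandas >= 2.1 deprecates it in favour of DataFrame.map
distances_meters = distances.applymap(units)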

You can now index into your distance matrix using e.g. loc[row_index, column_index]. You should be able to adapt the above fairly easily. You might have to adjust the apply call in the get_distance function to ensure you're passing the correct values to the geopy distance function you are using (such as great_circle). The pandas apply docs might be useful, in particular with regard to passing positional arguments using args (you'll need a recent pandas version for this to work).

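This code hasn't been profiled, and there are probably much faster ways to do it, but it should be fairly quick for around 400k distance calculations. Bear in mind that an all-pairs matrix for n points needs n² calculations, so 400k calculations corresponds to only about 630 points.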
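
As one example of a faster route, the whole n × n matrix can be built at once with NumPy broadcasting instead of calling geopy n² times. This is a sketch under assumptions: it again swaps in the spherical haversine formula (not vincenty's ellipsoid), assumes a mean Earth radius of 6371.0 km, and the function name pairwise_haversine_km is hypothetical. Memory still grows as n², so it only suits moderate n.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumed mean radius; spherical model only


def pairwise_haversine_km(lat, lon):
    """Hypothetical helper: n x n great-circle distances (km) between all points.

    lat and lon are 1-D arrays of degrees; entry [i, j] is the distance
    from point i to point j.
    """
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))
    # Column vector minus row vector broadcasts to an (n, n) grid of differences
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # clip guards against rounding pushing a slightly outside [0, 1]
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


df = pd.DataFrame({"lat": [0.0, 0.0, 45.0], "lon": [0.0, 90.0, 0.0]})
distances_km = pd.DataFrame(
    pairwise_haversine_km(df["lat"], df["lon"]), index=df.index, columns=df.index
)
print(distances_km.round(1))
```

The result is labelled like the answer's distances matrix, so distances_km.loc[row_index, column_index] works the same way.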

Oh and also

geopy expects coordinates as (lat, lon), not (lon, lat), which is why coords is built with zip(df.lat, df.lon).
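
Because swapped pairs fail silently (you simply get wrong distances), it can help to normalise the order before building coords. A small sketch, assuming your raw data arrives in (lon, lat) order; the helper name to_lat_lon is hypothetical, and its range check only catches swaps where the would-be latitude falls outside ±90°.

```python
def to_lat_lon(lon_lat_pairs):
    """Hypothetical helper: convert (lon, lat) pairs to the (lat, lon) order geopy expects.

    Raises ValueError if a resulting latitude is outside [-90, 90], which usually
    means the input was not in (lon, lat) order after all.
    """
    result = []
    for lon, lat in lon_lat_pairs:
        if not -90.0 <= lat <= 90.0:
            raise ValueError(f"latitude {lat} out of range; is the input really (lon, lat)?")
        result.append((lat, lon))
    return result


print(to_lat_lon([(-0.1, 51.5), (2.35, 48.9)]))
```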
