Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/44446862/

Time: 2020-09-14 03:45:22  Source: igfitidea

Calculate distance between latitude and longitude in dataframe

Tags: python, pandas, geopy

Asked by Harikrishna

I have 4 columns in my dataframe containing the following data:
Start_latitude
Start_longitude
Stop_latitude
Stop_longitude

I need to compute the distance between each start and stop latitude/longitude pair and create a new column holding the computed distance.

I came across a package (geopy) which can do this for me, but I need to pass a tuple to geopy. How do I apply this function (geopy) across the dataframe in pandas for all the records?

Answered by Richard

I'd recommend you use pyproj instead of geopy. geopy's geocoding features rely on online services (though its distance functions run locally), whereas pyproj is entirely local, so it is fast and does not need an internet connection. pyproj is also more transparent about its methods (see here, for instance), which are based on the Proj4 codebase that underlies essentially all open-source GIS software and, probably, many of the web services you'd use.

#!/usr/bin/env python3

import pandas as pd
import numpy as np
from pyproj import Geod

wgs84_geod = Geod(ellps='WGS84') #Distance will be measured on this ellipsoid - more accurate than a spherical method

#Get distance between pairs of lat-lon points
def Distance(lat1,lon1,lat2,lon2):
  az12,az21,dist = wgs84_geod.inv(lon1,lat1,lon2,lat2) #Yes, this order is correct
  return dist

#Create test data
lat1 = np.random.uniform(-90,90,100)
lon1 = np.random.uniform(-180,180,100)
lat2 = np.random.uniform(-90,90,100)
lon2 = np.random.uniform(-180,180,100)

#Package as a dataframe
df = pd.DataFrame({'lat1':lat1,'lon1':lon1,'lat2':lat2,'lon2':lon2})

#Add/update a column to the data frame with the distances (in metres)
df['dist'] = Distance(df['lat1'].tolist(),df['lon1'].tolist(),df['lat2'].tolist(),df['lon2'].tolist())

PyProj has some documentation here.

Answered by Remy Kabel

From the documentation of geopy: https://pypi.python.org/pypi/geopy. You can do this by doing:

from geopy.distance import vincenty, great_circle
# Note: vincenty was removed in geopy 2.0; on newer versions use geodesic instead

# Define the two points as (latitude, longitude) tuples
start = (start_latitude, start_longitude)
stop = (stop_latitude, stop_longitude)

# Print the vincenty distance
print(vincenty(start, stop).meters)

# Print the great circle distance
print(great_circle(start, stop).meters)
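
For intuition about what a great-circle distance measures, here is a minimal sketch of the haversine formula using only the standard library. This is not geopy's own implementation: the function name haversine_m and the radius constant are assumptions made for illustration, and the result treats the Earth as a perfect sphere, so it will differ slightly from an ellipsoidal method.

```python
import math

# Assumed mean Earth radius in metres for the spherical model
EARTH_RADIUS_M = 6371000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))
```

Calling haversine_m(0, 0, 0, 180) gives half the circumference of the assumed sphere, which is a quick sanity check on the formula.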

To combine this with pandas, assuming you have a dataframe df, we first create the function:

def distance_calc(row):
    # Column names as given in the question
    start = (row['Start_latitude'], row['Start_longitude'])
    stop = (row['Stop_latitude'], row['Stop_longitude'])

    return vincenty(start, stop).meters

And then apply it to the dataframe:

df['distance'] = df.apply(distance_calc, axis=1)

Note the axis=1 argument: it means the function is applied to each row rather than to each column.
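
As a small illustration of the axis argument, the sketch below uses a made-up two-column frame (df_demo and its column names are hypothetical, not from the question) to show what the applied function receives in each case.

```python
import pandas as pd

# Hypothetical demo data, unrelated to the coordinates in the question
df_demo = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=1: the function receives one row at a time,
# as a Series indexed by column name
row_sums = df_demo.apply(lambda row: row["a"] + row["b"], axis=1)

# axis=0 (the default): the function receives one column at a time
col_sums = df_demo.apply(lambda col: col.sum(), axis=0)

print(row_sums.tolist())
print(col_sums.to_dict())
```

With axis=1 the result has one value per row, which is why it can be assigned directly to a new column.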