Note: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/30740046/

Calculate Distance to Nearest Feature with Geopandas

python, pandas, shapely, geopandas

Asked by AJG519

I'm looking to do the equivalent of the ArcPy Generate Near Table using Geopandas / Shapely. I'm very new to Geopandas and Shapely and have developed a methodology that works, but I'm wondering if there is a more efficient way of doing it.

I have two point file datasets: Census Block centroids and restaurants. For each Census Block centroid, I want to find the distance to its closest restaurant. The same restaurant may be the closest restaurant for multiple blocks.

This becomes a bit more complicated for me because the Geopandas distance function calculates elementwise, matching based on index. Therefore, my general methodology is to turn the Restaurants file into a multipoint file and then set the index of the blocks file to all be the same value. Then all of the block centroids and the restaurants have the same index value.
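
To illustrate the elementwise behaviour described above, here is a minimal sketch. The two GeoSeries and their coordinates are made up for illustration and are not the asker's data; it only shows that row i of one series is compared with row i of the other when both share the same index.

```python
import geopandas as gpd
from shapely.geometry import Point

# Two hypothetical GeoSeries that share the default index 0, 1, 2.
left = gpd.GeoSeries([Point(0, 0), Point(0, 0), Point(0, 0)])
right = gpd.GeoSeries([Point(3, 4), Point(0, 1), Point(6, 8)])

# distance() pairs rows by index: row i of `left` against row i of `right`.
# It does not search for the nearest geometry in `right`.
pairwise = left.distance(right)
print(pairwise)
```

This is why a plain call between two GeoSeries of different lengths does not produce a "distance to nearest" column on its own.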

import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon, Point, MultiPoint

Now read in the Block Centroid and Restaurant Shapefiles:

Blocks=gpd.read_file(BlockShp)
Restaurants=gpd.read_file(RestaurantShp)

Since the Geopandas distance function calculates distance elementwise, I convert the Restaurant GeoSeries to a MultiPoint GeoSeries:

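# Merge all restaurant points into a single MultiPoint geometry
RestMulti = gpd.GeoSeries(Restaurants.unary_union)
RestMulti.crs = Restaurants.crs
# reset_index returns a new object, so assign the result back
RestMulti = RestMulti.reset_index(drop=True)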

Then I set the index for the Blocks equal to 0 (the same value as the Restaurants multipoint) as a workaround for the elementwise calculation.

Blocks.index=[0]*len(Blocks)

Lastly, I use the Geopandas distance function to calculate the distance to the nearest restaurant for each Block centroid.

Blocks['Distance']=Blocks.distance(RestMulti)
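
As a possible simplification of the steps above: distance() also accepts a single shapely geometry, which is compared against every row, so the index workaround may not be needed. The sketch below uses made-up stand-ins (`blocks`, `restaurants`, `rest_multi`) rather than the asker's shapefiles, and assumes both layers share one projected CRS so the distances are in meaningful units.

```python
import geopandas as gpd
from shapely.geometry import MultiPoint, Point

# Hypothetical stand-ins for the block centroids and restaurant points.
blocks = gpd.GeoDataFrame(
    {"block_id": ["a", "b", "c"]},
    geometry=[Point(0, 0), Point(5, 5), Point(10, 0)],
)
restaurants = gpd.GeoSeries([Point(0, 3), Point(6, 5), Point(10, 10)])

# Collapse all restaurants into one MultiPoint geometry.
rest_multi = MultiPoint(list(restaurants))

# A single geometry is compared against every row, so the index of
# `blocks` can stay as it is.
blocks["Distance"] = blocks.distance(rest_multi)
print(blocks)
```

The distance from a point to a MultiPoint is the distance to its closest member, which is what makes this equivalent to a nearest-restaurant distance.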

Please offer any suggestions on how any aspect of this could be improved. I'm not tied to using Geopandas or Shapely, but I am looking to learn an alternative to ArcPy.

Thanks for the help!

Answered by cd98

If I understand your issue correctly, Blocks and Restaurants can have very different dimensions. For this reason, it's probably a bad approach to try to force them into a table format by reindexing.

I would just loop over blocks and get the minimum distance to restaurants (just as @shongololo was suggesting).

I'm going to be slightly more general (because I already have this code written down) and compute the distance from points to lines, but the same code should work from points to points or from polygons to polygons. I'll start with a GeoDataFrame for the points and I'll create a new column which has the minimum distance to lines.

%matplotlib inline
import matplotlib.pyplot as plt
import shapely.geometry as geom
import numpy as np
import pandas as pd
import geopandas as gpd

lines = gpd.GeoSeries(
    [geom.LineString(((1.4, 3), (0, 0))),
        geom.LineString(((1.1, 2.), (0.1, 0.4))),
        geom.LineString(((-0.1, 3.), (1, 2.)))])

# 10 points
n  = 10
points = gpd.GeoSeries([geom.Point(x, y) for x, y in np.random.uniform(0, 3, (n, 2))])

# Put the points in a GeoDataFrame, with some other random column.
# Naming the geometry column explicitly makes df_points.geometry work later.
df_points = gpd.GeoDataFrame(
    {'Geometry': points, 'Property1': np.random.randn(n)},
    geometry='Geometry')

points.plot()
lines.plot()

Now get the distance from points to lines and only save the minimum distance for each point (see below for a version with apply):

min_dist = np.empty(n)
for i, point in enumerate(points):
    min_dist[i] = np.min([point.distance(line) for line in lines])
df_points['min_dist_to_lines'] = min_dist
df_points.head(3)

which gives:

    Geometry                                       Property1    min_dist_to_lines
0   POINT (0.2479424516236574 2.944916965334865)    2.621823    0.193293
1   POINT (1.465768457667432 2.605673714922998)     0.6074484   0.226353
2   POINT (2.831645235202689 1.125073838462032)     0.657191    1.940127

---- EDIT ----

(taken from a GitHub issue) Using apply is nicer and more consistent with how you'd do it in pandas:

def min_distance(point, lines):
    return lines.distance(point).min()

# df_lines holds the line geometries (e.g. the `lines` GeoSeries defined above)
df_points['min_dist_to_lines'] = df_points.geometry.apply(min_distance, df_lines)

EDIT: As of at least 2019-10-04, a change in pandas seems to require a different call in the last code block, passing the extra argument through the args parameter of .apply():

df_points['min_dist_to_lines'] = df_points.geometry.apply(min_distance, args=(df_lines,))
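
For readers who want to run the apply version end to end, here is a small self-contained sketch. The fixed geometries are made up for illustration (the answer itself uses random points), and `df_lines`, which the snippets above never define, is assumed here to be a GeoSeries of the line geometries.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point

# Assumed stand-in for df_lines: two horizontal lines at y = 0 and y = 5.
df_lines = gpd.GeoSeries([
    LineString([(0, 0), (10, 0)]),
    LineString([(0, 5), (10, 5)]),
])

# Three fixed points in a GeoDataFrame.
df_points = gpd.GeoDataFrame(
    {"Property1": [0.1, 0.2, 0.3]},
    geometry=[Point(1, 1), Point(2, 4), Point(3, 8)],
)

def min_distance(point, lines):
    # Distance from one point to every line, reduced to the smallest value.
    return lines.distance(point).min()

# Extra positional arguments for the function go in a tuple passed as args.
df_points["min_dist_to_lines"] = df_points.geometry.apply(
    min_distance, args=(df_lines,)
)
print(df_points)
```

Note the trailing comma in `args=(df_lines,)`: without it the parentheses do not create a tuple.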

Answered by Marcos Tenório

Your code is missing a detail: args=(df_lines,)

def min_distance(point, lines):
    return lines.distance(point).min()

df_points['min_dist_to_lines'] = df_points.geometry.apply(min_distance, args=(df_lines,))  # Notice the change to this line
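
Neither answer mentions it, but newer GeoPandas releases (0.10 and later, as far as I know) also provide sjoin_nearest, which may be closer to what Generate Near Table does because it returns the matched feature along with the distance. The sketch below is an alternative technique, not the method from the answers above, and the data and column names (`block_id`, `name`, `dist`) are made up for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical block centroids and restaurants in the same planar coordinates.
blocks = gpd.GeoDataFrame(
    {"block_id": ["a", "b", "c"]},
    geometry=[Point(0, 0), Point(5, 5), Point(10, 0)],
)
restaurants = gpd.GeoDataFrame(
    {"name": ["r1", "r2", "r3"]},
    geometry=[Point(0, 3), Point(6, 5), Point(10, 10)],
)

# For each block, attach the nearest restaurant and store the distance
# in a new column called "dist".
nearest = gpd.sjoin_nearest(
    blocks, restaurants, how="left", distance_col="dist"
).sort_index()
print(nearest[["block_id", "name", "dist"]])
```

If two restaurants are exactly equidistant from a block, the join can return more than one row for that block, so a de-duplication step may be needed on real data.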