Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute the content to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/49640197/

Date: 2020-09-14 05:24:51  Source: igfitidea

error (429) Too Many Requests while geocoding with geopy in Python

Tags: python, pandas, geocoding, geopy

Asked by seizethedata

I have a pandas DataFrame with ~20k rows, and I am trying to geocode its address column into lat/long coordinates.

How do I use time.sleep() or some other function to stop OSM Nominatim from returning the Too Many Requests 429 error that I am getting now?

Here's the code I use for this:

from geopy.geocoders import Nominatim
from geopy.distance import vincenty

geolocator = Nominatim()
df['coord'] = df['address'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
df.head()

Thanks in advance!

Answered by KostyaEsmukov

Since version 1.16.0, geopy includes a RateLimiter class, which provides a convenient way to deal with the Too Many Requests 429 error by adding delays between queries and retrying failed requests.

from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="specify_your_app_name_here")

from geopy.extra.rate_limiter import RateLimiter
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

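# geocode returns None when an address has no match, so guard before reading the coordinates
df['coord'] = df['address'].apply(geocode).apply(lambda location: (location.latitude, location.longitude) if location else None)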
df.head()

Docs: https://geopy.readthedocs.io/en/1.16.0/#usage-with-pandas
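
To make the idea concrete, here is a minimal, library-free sketch of what a rate limiter of this kind does: it waits at least a minimum delay between calls and retries a failed call a few times before giving up. It is an illustration only, not geopy's actual implementation. The helper `throttled`, its parameters as used here, and the stand-in `flaky_lookup` (which fakes one failed request and returns made-up coordinates) are all hypothetical names invented for the example.

```python
import time


def throttled(func, min_delay_seconds=1.0, max_retries=2, error_wait_seconds=5.0):
    """Wrap func so calls are spaced out and failures are retried.

    Hypothetical helper for illustration; not geopy's RateLimiter.
    """
    last_call = [None]  # time of the previous call, kept in a mutable cell

    def wrapper(*args, **kwargs):
        for attempt in range(max_retries + 1):
            # Sleep until at least min_delay_seconds have passed since the last call.
            if last_call[0] is not None:
                wait = min_delay_seconds - (time.monotonic() - last_call[0])
                if wait > 0:
                    time.sleep(wait)
            last_call[0] = time.monotonic()
            try:
                return func(*args, **kwargs)
            except Exception:
                if attempt == max_retries:
                    return None  # give up on this input instead of raising
                time.sleep(error_wait_seconds)

    return wrapper


# Fake lookup that fails on its first call, standing in for a 429 response.
calls = {"count": 0}


def flaky_lookup(address):
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("429 Too Many Requests")
    return (52.52, 13.405)  # made-up coordinates


# Short delays so the demo finishes quickly; a real service needs about 1 second or more.
lookup = throttled(flaky_lookup, min_delay_seconds=0.05, error_wait_seconds=0.05)
start = time.monotonic()
results = [lookup(a) for a in ["address one", "address two"]]
elapsed = time.monotonic() - start
print(results, calls["count"])
```

The first address costs two calls (one failure, one retry) and the second costs one, so the fake lookup is called three times in total while every address still gets a result. Returning None on final failure keeps a long batch running, at the cost of having to check for missing values afterwards.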

Answered by Martin Bobak

I would suggest using a for loop. Without seeing your data, it would look something like this:

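import time

from geopy.exc import GeopyError
from geopy.geocoders import Nominatim

# user_agent value is a placeholder; replace it with your own application name
geolocator = Nominatim(user_agent="specify_your_app_name_here")

x = df['address'].tolist()
names = []

for item in x:
    d = {}
    try:
        # geocode() returns None when nothing matches and can raise on service errors
        a = geolocator.geocode(item, exactly_one=True, timeout=60)
        d["Latitude"] = a.latitude
        d["Longitude"] = a.longitude
    except (AttributeError, GeopyError):
        pass  # no result for this address: leave the row empty and move on
    time.sleep(2)  # wait 2 seconds before the next request
    names.append(d)

names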

This is how you would use time.sleep() to wait 2 seconds before the loop moves on to the next address. Also, if the geolocator cannot find a latitude and longitude for an address, the except clause passes over it instead of exiting the loop and forcing you to start over.
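
The skip-and-continue behaviour described above can be tried offline with a stand-in for the geocoder. This is a minimal sketch under stated assumptions: `geocode_all` and `fake_geocode` are hypothetical names, and the fake returns made-up coordinates rather than calling Nominatim.

```python
import time


def geocode_all(addresses, geocode, delay_seconds=2.0):
    """Geocode each address in turn, pausing between requests.

    Addresses that fail or have no match yield an empty dict,
    so one bad row does not stop the loop.
    """
    rows = []
    for address in addresses:
        row = {}
        try:
            location = geocode(address)
            if location is not None:
                row["Latitude"], row["Longitude"] = location
        except Exception:
            pass  # skip this address and carry on
        time.sleep(delay_seconds)
        rows.append(row)
    return rows


def fake_geocode(address):
    # Hypothetical stand-in: knows one address, finds nothing for "unknown", errors otherwise.
    if address == "Berlin":
        return (52.52, 13.405)
    if address == "unknown":
        return None
    raise RuntimeError("service error")


# A tiny delay keeps the demo fast; use a delay of seconds against a real service.
rows = geocode_all(["Berlin", "unknown", "bad"], fake_geocode, delay_seconds=0.01)
print(rows)
```

The result has one entry per input address, in order, so it can be lined up with the original rows even when some entries are empty.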
