Original question: http://stackoverflow.com/questions/22889122/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to add a location filter to tweepy module
Asked by gdogg371
I found the following code, which works well for viewing the standard 1% sample of the Twitter firehose in the Python shell:
from __future__ import print_function  # lets print() work on Python 2 as well

import sys
import tweepy

# Fill in your own Twitter API credentials
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)

class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)

    def on_error(self, status_code):
        print('Encountered error with status code:', status_code, file=sys.stderr)
        return True  # Don't kill the stream

    def on_timeout(self):
        print('Timeout...', file=sys.stderr)
        return True  # Don't kill the stream

sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
sapi.filter(track=['manchester united'])
How do I add a filter to only parse tweets from a certain location? I've seen people adding GPS coordinates to other Twitter-related Python code, but I can't find anything specific to sapi within the Tweepy module.
Any ideas?
Thanks
Accepted answer by Juan E.
The streaming API doesn't allow filtering by location AND keyword simultaneously.
Bounding boxes do not act as filters for other filter parameters. For example track=twitter&locations=-122.75,36.8,-121.75,37.8 would match any tweets containing the term Twitter (even non-geo tweets) OR coming from the San Francisco area.
Source: https://dev.twitter.com/docs/streaming-apis/parameters#locations
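To make the OR behaviour in the quoted documentation concrete, here is a minimal sketch in plain Python that does not call Twitter or tweepy. The sample tweets and the helper names (`has_keyword`, `in_box`, `stream_delivers`, `wanted`) are hypothetical, the keyword check is a simple substring match rather than Twitter's real track matching, and `coordinates` is assumed to be either None or a GeoJSON-style point in `[longitude, latitude]` order.

```python
SAN_FRANCISCO = [-122.75, 36.8, -121.75, 37.8]  # [west, south, east, north]

def has_keyword(tweet, keyword):
    # Simplification: substring match, not Twitter's real track matching.
    return keyword.lower() in tweet["text"].lower()

def in_box(tweet, box):
    # Assumes "coordinates" is None or a GeoJSON point: [longitude, latitude].
    geo = tweet.get("coordinates")
    if not geo:
        return False
    lon, lat = geo["coordinates"]
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north

def stream_delivers(tweet, keyword, box):
    # What track + locations do together: keyword OR location.
    return has_keyword(tweet, keyword) or in_box(tweet, box)

def wanted(tweet, keyword, box):
    # What the question asks for: keyword AND location.
    return has_keyword(tweet, keyword) and in_box(tweet, box)

sf_point = {"type": "Point", "coordinates": [-122.42, 37.77]}
tweets = [
    {"text": "Twitter is down again", "coordinates": None},
    {"text": "Foggy morning", "coordinates": sf_point},
    {"text": "Twitter HQ visit", "coordinates": sf_point},
]

delivered = [t["text"] for t in tweets if stream_delivers(t, "twitter", SAN_FRANCISCO)]
kept = [t["text"] for t in tweets if wanted(t, "twitter", SAN_FRANCISCO)]
print(delivered)  # all three tweets reach the app
print(kept)       # only the tweet that satisfies both conditions
```

The gap between `delivered` and `kept` is the part that has to be filtered in your own code.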
What you can do is ask the streaming API for tweets that match either the keyword or the location, and then filter the resulting stream in your app by inspecting each tweet.
If you modify the code as follows, you will capture tweets from the United Kingdom and then filter them so that only those containing "manchester united" are shown:
from __future__ import print_function  # lets print() work on Python 2 as well

import sys
import tweepy

# Fill in your own Twitter API credentials
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)

class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        # Keyword filter applied in the app, after the location filter on the stream
        if 'manchester united' in status.text.lower():
            print(status.text)

    def on_error(self, status_code):
        print('Encountered error with status code:', status_code, file=sys.stderr)
        return True  # Don't kill the stream

    def on_timeout(self):
        print('Timeout...', file=sys.stderr)
        return True  # Don't kill the stream

sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
# Bounding box as [west, south, east, north] (longitude, latitude pairs)
sapi.filter(locations=[-6.38, 49.87, 1.77, 55.81])
Answer by gdogg371
sapi.filter(track=['manchester united'],locations=['GPS Coordinates'])
Answer by Kristian Rother
Juan gave the correct answer. I'm filtering for Germany only using this:
# Bounding boxes for geolocations
# Online-Tool to create boxes (c+p as raw CSV): http://boundingbox.klokantech.com/
GEOBOX_WORLD = [-180,-90,180,90]
GEOBOX_GERMANY = [5.0770049095, 47.2982950435, 15.0403900146, 54.9039819757]
stream.filter(locations=GEOBOX_GERMANY)
This is a pretty crude box that includes parts of some other countries. If you want a finer grain you can combine multiple boxes to fill out the location you need.
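Combining boxes could look roughly like the sketch below. It relies on the `locations` parameter accepting several bounding boxes as one flat list of numbers, four per box. The helper `combine_boxes` and the two box values are hypothetical and only illustrate the list shape; they are not an accurate outline of any country.

```python
def combine_boxes(*boxes):
    """Flatten several [west, south, east, north] boxes into one list."""
    flat = []
    for box in boxes:
        if len(box) != 4:
            raise ValueError("each box needs exactly four values")
        west, south, east, north = box
        if west > east or south > north:
            raise ValueError("box must be [west, south, east, north]")
        flat.extend(box)
    return flat

# Hypothetical rough boxes, for illustration only
NORTH_BOX = [8.0, 52.0, 14.0, 55.0]
SOUTH_BOX = [7.5, 47.3, 13.0, 52.0]

locations = combine_boxes(NORTH_BOX, SOUTH_BOX)
print(locations)

# The combined list would then be passed on as before:
# stream.filter(locations=locations)
```

Smaller boxes follow a border more closely than one large rectangle, at the cost of a longer list to maintain.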
It should be noted, though, that filtering by geotag limits the number of tweets considerably. This is from roughly 5 million tweets in my test database (the query returns the fraction of tweets that actually contain a geolocation):
> db.tweets.find({coordinates:{$ne:null}}).count() / db.tweets.count()
0.016668392651547598
So only 1.67% of my sample of the 1% stream includes a geotag. However, there are other ways of figuring out a user's location: http://arxiv.org/ftp/arxiv/papers/1403/1403.2345.pdf
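For tweets held in memory rather than in a database, the same ratio can be computed in a few lines of Python. This is only a sketch: the function name `geotagged_fraction` and the sample data are hypothetical, and it assumes each tweet is a dict whose `coordinates` key is missing or None when there is no geotag.

```python
def geotagged_fraction(tweets):
    """Return the share of tweets whose 'coordinates' field is set."""
    tweets = list(tweets)
    if not tweets:
        return 0.0
    tagged = sum(1 for t in tweets if t.get("coordinates") is not None)
    return float(tagged) / len(tweets)

# Hypothetical sample: one geotagged tweet out of three
sample = [
    {"text": "a", "coordinates": {"type": "Point", "coordinates": [13.4, 52.5]}},
    {"text": "b", "coordinates": None},
    {"text": "c"},
]

fraction = geotagged_fraction(sample)
print("{:.2%} of the sample is geotagged".format(fraction))
```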
Answer by Clovis
You can't filter it while streaming but you could filter it at the output stage, if you were writing the tweets to a file.
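As a rough illustration of filtering at the output stage, the sketch below writes only the tweets that contain a keyword to a file-like object, one JSON document per line. The function name `write_matching`, the sample tweets, and the one-line-per-tweet layout are assumptions made for this example; an `io.StringIO` buffer stands in for a real file.

```python
import io
import json

def write_matching(tweets, keyword, out):
    """Write tweets whose text contains keyword to out; return how many."""
    needle = keyword.lower()
    written = 0
    for tweet in tweets:
        if needle in tweet.get("text", "").lower():
            out.write(json.dumps(tweet) + "\n")
            written += 1
    return written

# Hypothetical tweets, e.g. collected with a location filter
sample = [
    {"text": "Manchester United win again"},
    {"text": "Rain in London"},
]

buffer = io.StringIO()  # stands in for open("tweets.jsonl", "w")
count = write_matching(sample, "manchester united", buffer)
print(count)
print(buffer.getvalue())
```

With a real file, the same call would work by passing the object returned by `open(...)` in place of the buffer.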