Python: Filter Twitter feeds only by language
Original question: http://stackoverflow.com/questions/26890605/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Filter Twitter feeds only by language
Asked by Sudo
I am using the Tweepy API to extract Twitter feeds. I want to extract all tweets in a specific language only. The language filter works only if a track filter is also provided. The following code returns a 406 error:
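from tweepy import OAuthHandler, Stream

l = StdOutListener()  # a StreamListener subclass defined elsewhere (not shown)
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
stream = Stream(auth, l)
stream.filter(languages=["en"])  # this call returns the 406 error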
How can I extract all the tweets in a certain language using Tweepy?
Answered by Luigi
You can't (without special access). Streaming all the tweets (unfiltered) requires a connection to the firehose, which Twitter grants only for specific use cases. Honestly, the firehose isn't really necessary: proper use of track can get you more tweets than you know what to do with.
Try using something like this:
stream.filter(languages=["en"], track=["a", "the", "i", "you", "u"]) # etc
Filtering by words like that will get you many, many tweets. If you want real data for the most-used words, check out this article from Time: The 500 Most Frequently Used Words on Twitter. You can use up to 400 keywords, but that will likely approach the 1% limit of tweets at a given time interval. If your track parameter matches 60% of all tweets at a given time, you will still only get 1% (which is a LOT of tweets).
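If you build the track list from a longer word list, a small helper can keep it within the 400-keyword limit mentioned above. This is only a sketch: build_track_terms is a hypothetical helper name, not part of Tweepy, and the lowercasing assumes that track matching is case-insensitive.

```python
def build_track_terms(words, limit=400):
    """Return up to `limit` unique, non-empty, lowercased terms in first-seen order."""
    seen = set()
    terms = []
    for word in words:
        if len(terms) >= limit:
            break  # stop once the keyword cap is reached
        term = word.strip().lower()
        if term and term not in seen:
            seen.add(term)
            terms.append(term)
    return terms


common_words = ["a", "the", "I", "you", "u", "the ", ""]
track_terms = build_track_terms(common_words)
print(track_terms)  # ['a', 'the', 'i', 'you', 'u']
```

The result could then be passed as stream.filter(languages=["en"], track=track_terms).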
Answered by Jay Mehta
Instead of requesting already-filtered tweets, you can fetch tweets in all languages and filter them afterwards:
tweets = api.search("python")
for tweet in tweets:
    if tweet.lang == "en":
        print(tweet.text)
        # Do the stuff here
Hope it helps.
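The same client-side check can be wrapped in a small reusable function. A minimal sketch, assuming each tweet object exposes a lang attribute as in the loop above; filter_by_language is a hypothetical name, and SimpleNamespace objects stand in for real Tweepy statuses so the example runs without credentials.

```python
from types import SimpleNamespace


def filter_by_language(tweets, languages):
    """Yield only the tweets whose `lang` attribute is one of `languages`."""
    wanted = set(languages)
    for tweet in tweets:
        # Objects without a lang attribute are skipped rather than raising.
        if getattr(tweet, "lang", None) in wanted:
            yield tweet


# Stand-in objects (not real API results), used only to exercise the helper.
sample = [
    SimpleNamespace(text="hello", lang="en"),
    SimpleNamespace(text="bonjour", lang="fr"),
    SimpleNamespace(text="no language field"),
]
english = [t.text for t in filter_by_language(sample, ["en"])]
print(english)  # ['hello']
```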
Answered by Aziz Alto
Try the lang='en' param in Cursor(), e.g.:
tweepy.Cursor(.. lang='en')
Answered by Walker Rowe
You can see the arguments for the filter method in the GitHub source: https://github.com/tweepy/tweepy/blob/master/tweepy/streaming.py
Pass the languages as a list of ISO 639-1 codes.
The arguments are:
filter(self, follow=None, track=None, is_async=False, locations=None,
stall_warnings=False, languages=None, encoding='utf8', filter_level=None):
So to filter by language, just pass:
import json
from tweepy import OAuthHandler, Stream, StreamListener

class Listener(StreamListener):
    # In Tweepy 3.x, overriding on_data replaces the default dispatch,
    # so on_status below is not called.
    def on_data(self, data):
        j = json.loads(data)
        t = {
            'screenName': j['user']['screen_name'],
            'text': j['text']
        }
        print(t)
        return True  # keep the stream open

    def on_status(self, status):
        print(status.text)

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
stream = Stream(auth=auth, listener=Listener(), wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
stream.filter(track=['Trump'], languages=["en", "fr", "es"])
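One thing to watch in on_data: the stream can also deliver messages that are not tweets (limit notices, for example), and those have no 'user' or 'text' key. A hedged sketch of a safer parser follows; extract_tweet is a hypothetical helper and the payloads are hand-made examples, not captured stream output.

```python
import json


def extract_tweet(raw):
    """Return screen name, text and lang for a tweet payload, or None otherwise."""
    message = json.loads(raw)
    if "user" not in message or "text" not in message:
        return None  # not a tweet, e.g. a limit notice
    return {
        "screenName": message["user"]["screen_name"],
        "text": message["text"],
        "lang": message.get("lang"),
    }


# Hand-made payloads, used only to exercise the helper.
tweet_payload = json.dumps(
    {"user": {"screen_name": "example_user"}, "text": "hello", "lang": "en"}
)
limit_payload = json.dumps({"limit": {"track": 42}})

print(extract_tweet(tweet_payload))
print(extract_tweet(limit_payload))  # None
```

Inside on_data you could call extract_tweet(data) and skip the message when it returns None.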
Answered by Smit Jethwa
This worked for me.
import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
tag = input("Enter Tag: ")
tweets = api.search(tag, count=200)
english_texts = []  # texts of the tweets tagged as English
for tweet in tweets:
    if tweet.lang == "en":
        english_texts.append(tweet.text)
Answered by Vishal Kharde
Tweepy search lets you fetch tweets in a specific language. Use an ISO 639-1 code as the value of the lang parameter. The following code fetches tweets with their full text in the specified language (English in this example):
import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
# keywordtosearch holds your query string; tweet_mode='extended' provides full_text
tweets = api.search(q=keywordtosearch, lang='en', count=100, truncated=False, tweet_mode='extended')
for tweet in tweets:
    print(tweet.full_text)
    # add your code
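A caveat with full_text: for retweets, the untruncated text usually sits on the nested retweeted_status object rather than on the retweet itself. The sketch below assumes that layout; get_full_text is a hypothetical helper, and SimpleNamespace objects stand in for statuses fetched with tweet_mode='extended'.

```python
from types import SimpleNamespace


def get_full_text(status):
    """Return the original tweet's full text for retweets, else the status's own."""
    original = getattr(status, "retweeted_status", None)
    if original is not None:
        return original.full_text
    return status.full_text


# Stand-in objects (not real API results), used only to exercise the helper.
plain = SimpleNamespace(full_text="a long original tweet")
retweet = SimpleNamespace(
    full_text="RT @someone: a long original tw...",
    retweeted_status=SimpleNamespace(full_text="a long original tweet"),
)

print(get_full_text(plain))    # a long original tweet
print(get_full_text(retweet))  # a long original tweet
```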
Answered by Abhishek Kumar
With the help of GetOldTweets3 (https://pypi.org/project/GetOldTweets3/), you can download tweets (even old ones) by filtering on a few criteria, as shown below:
import GetOldTweets3 as got

# Build the search criteria: date range, result cap, location and language.
tweetCriteria = got.manager.TweetCriteria().setQuerySearch('')\
                                           .setSince("2020-02-15")\
                                           .setUntil("2020-03-29")\
                                           .setMaxTweets(5)\
                                           .setNear('India')\
                                           .setLang('en')
tweets = got.manager.TweetManager.getTweets(tweetCriteria)
for tweet in tweets:
    print(tweet.text)
    print(tweet.date)
    print(tweet.geo)
    print(tweet.id)
    print(tweet.permalink)
    print(tweet.username)
    print(tweet.retweets)
    print(tweet.favorites)
    print(tweet.mentions)
    print(tweet.hashtags)
    print('*'*50)

