Python: filtering Twitter feeds by language only

Question

Asked by Sudo

I am using the Tweepy API to extract Twitter feeds. I want to extract all Twitter feeds in a specific language only. The language filter works only if a track filter is provided. The following code returns a 406 error:

l = StdOutListener()
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
stream = Stream(auth, l)
stream.filter(languages=["en"])

How can I extract all the tweets in a certain language using Tweepy?

Answer 1

Answered by Luigi

You can't (without special access). Streaming all the tweets (unfiltered) requires a connection to the firehose, which Twitter grants only for specific use cases. Honestly, the firehose isn't really necessary--proper use of track can get you more tweets than you know what to do with.

Try using something like this:

stream.filter(languages=["en"], track=["a", "the", "i", "you", "u"]) # etc

Filtering by words like that will get you many, many tweets. If you want real data for the most-used words, check out this article from Time: The 500 Most Frequently Used Words on Twitter. You can use up to 400 keywords, but that will likely approach the limit of 1% of all tweets in a given time interval. If your track parameter matches 60% of all tweets at a given time, you will still only get 1% (which is a LOT of tweets).
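
The constraints mentioned so far (a language filter needs another predicate next to it, and track accepts up to 400 keywords) can be checked before opening a stream. The following is only a minimal sketch: build_filter_kwargs and MAX_TRACK_KEYWORDS are hypothetical names, not part of Tweepy, and the sketch assumes that follow or locations can stand in for track as the required predicate.

```python
MAX_TRACK_KEYWORDS = 400  # keyword limit quoted in the answer above


def build_filter_kwargs(track=None, languages=None, follow=None, locations=None):
    """Hypothetical helper: validate and collect arguments for stream.filter()."""
    candidates = {
        "track": list(track or []),
        "follow": list(follow or []),
        "locations": list(locations or []),
        "languages": list(languages or []),
    }
    # Assumption: a language-only request (as in the question) is rejected,
    # so at least one of track, follow or locations must be present.
    if not (candidates["track"] or candidates["follow"] or candidates["locations"]):
        raise ValueError("provide at least one of track, follow or locations")
    if len(candidates["track"]) > MAX_TRACK_KEYWORDS:
        raise ValueError("too many track keywords")
    # Keep only the arguments that were actually supplied.
    return {name: value for name, value in candidates.items() if value}


kwargs = build_filter_kwargs(track=["a", "the", "i", "you", "u"], languages=["en"])
print(kwargs)
```

The resulting dictionary could then be passed on as stream.filter(**kwargs).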

Answer 2

Answered by Jay Mehta

Other than getting filtered tweets directly, you can fetch tweets in any language and filter them afterwards:

tweets = api.search("python")
for tweet in tweets:
    if tweet.lang == "en":
        print(tweet.text)
        # Do the stuff here
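
The same after-the-fact filtering can be tried without network access. This is a rough sketch, assuming only that each tweet object exposes lang and text attributes as in the loop above; only_language is a hypothetical helper and the sample objects are made-up stand-ins for real search results.

```python
from types import SimpleNamespace


def only_language(tweets, lang="en"):
    """Yield the tweets whose `lang` attribute equals `lang`."""
    for tweet in tweets:
        # getattr guards against objects that carry no language tag at all.
        if getattr(tweet, "lang", None) == lang:
            yield tweet


# Stand-ins for the objects returned by a search call.
sample = [
    SimpleNamespace(lang="en", text="hello"),
    SimpleNamespace(lang="fr", text="bonjour"),
    SimpleNamespace(lang="en", text="world"),
]

english_texts = [tweet.text for tweet in only_language(sample)]
print(english_texts)
```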

Hope it helps.

Answer 3

Answered by Aziz Alto

Try the lang='en' parameter in Cursor(), e.g.:

tweepy.Cursor(.. lang='en')

Answer 4

Answered by Walker Rowe

You can see the arguments of the filter method in the GitHub code: https://github.com/tweepy/tweepy/blob/master/tweepy/streaming.py

Put the languages in a list of ISO 639-1 codes.

The arguments are:

filter(self, follow=None, track=None, is_async=False, locations=None,
               stall_warnings=False, languages=None, encoding='utf8', filter_level=None):

So to track by language, just use:

import json

from tweepy import OAuthHandler, Stream
from tweepy.streaming import StreamListener


class Listener(StreamListener):

    def on_data(self, data):
        j = json.loads(data)
        t = {
            'screenName': j['user']['screen_name'],
            'text': j['text'],
        }
        print(t)
        return True

    def on_status(self, status):
        print(status.text)


auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

stream = Stream(auth=auth, listener=Listener())

stream.filter(track=['Trump'],languages=["en","fr","es"])
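
The on_data handler above assumes every message carries user and text keys. A more defensive variant of that parsing step might look like the sketch below; summarize_message is a hypothetical helper, and the sketch assumes that tweet payloads carry a lang field and that some stream messages are not tweets at all (for example blank lines or notices without a text key).

```python
import json


def summarize_message(raw, languages=("en", "fr", "es")):
    """Return {'screenName', 'text'} for a tweet in `languages`, else None."""
    try:
        message = json.loads(raw)
    except ValueError:
        return None  # blank line or otherwise not valid JSON
    if not isinstance(message, dict):
        return None
    if "text" not in message or "user" not in message:
        return None  # not a tweet payload
    if message.get("lang") not in languages:
        return None  # tweet in a language we did not ask for
    return {
        "screenName": message["user"]["screen_name"],
        "text": message["text"],
    }


raw = json.dumps({"lang": "en", "text": "hi", "user": {"screen_name": "sudo"}})
print(summarize_message(raw))
```

Inside on_data, the return value could be checked for None before printing, so that non-tweet messages are skipped instead of raising a KeyError.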

Answer 5

Answered by Smit Jethwa

This worked for me.

import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
tag = input("Enter Tag: ")
tweets = api.search(tag, count=200)
english_texts = []  # text of every English tweet in the results
for tweet in tweets:
    if tweet.lang == "en":
        english_texts.append(tweet.text)

Answer 6

Answered by Vishal Kharde

Tweepy's search can fetch tweets in a specific language. Use an ISO 639-1 code as the value of the lang parameter. The following code fetches tweets with their full text in the specified language (English in this example):

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
tweets = api.search(q=keywordtosearch, lang='en', count=100, truncated=False, tweet_mode='extended')
for tweet in tweets:
    print(tweet.full_text)
    # add your code

Answer 7

Answered by Abhishek Kumar

With the help of GetOldTweets3 (https://pypi.org/project/GetOldTweets3/), you can download tweets (even old ones) by filtering on a few criteria, as shown below:

import GetOldTweets3 as got

tweetCriteria = got.manager.TweetCriteria().setQuerySearch('')\
                                       .setSince("2020-02-15")\
                                       .setUntil("2020-03-29")\
                                       .setMaxTweets(5)\
                                       .setNear('India')\
                                       .setLang('en')
tweets = got.manager.TweetManager.getTweets(tweetCriteria)
for tweet in tweets:
    print(tweet.text)
    print(tweet.date)
    print(tweet.geo)
    print(tweet.id)
    print(tweet.permalink)
    print(tweet.username)
    print(tweet.retweets)
    print(tweet.favorites)
    print(tweet.mentions)
    print(tweet.hashtags)
    print('*'*50)
