Managing Tweepy API Search

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not the translator). Original: http://stackoverflow.com/questions/22469713/

Date: 2020-08-19 00:59:57 · Source: igfitidea


Tags: python, twitter, tweepy

Asked by user3075934

Please forgive me if this is a gross repeat of a question previously answered elsewhere, but I am lost on how to use the tweepy API search function. Is there any documentation available on how to search for tweets using the api.search() function?

Is there any way I can control features such as number of tweets returned, results type etc.?


The results seem to max out at 100 for some reason.


The code snippet I use is as follows:

searched_tweets = self.api.search(q=query, rpp=100, count=1000)

Answered by Yuva Raj

There's a problem in your code. Per the Twitter documentation for GET search/tweets, the count parameter is described as follows:

    The number of tweets to return per page, up to a maximum of 100. Defaults to 15. This was formerly the "rpp" parameter in the old Search API.

Your code should be:

CONSUMER_KEY = '....'
CONSUMER_SECRET = '....'
ACCESS_KEY = '....'
ACCESS_SECRET = '....'

auth = tweepy.auth.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)
search_results = api.search(q="hello", count=100)

for tweet in search_results:
    # do whatever you need with each tweet here, e.g.
    print(tweet.text)

Answered by gumption

I originally worked out a solution based on Yuva Raj's suggestion to use additional parameters in GET search/tweets - the max_id parameter in conjunction with the id of the last tweet returned in each iteration of a loop that also checks for the occurrence of a TweepError.

However, I discovered there is a far simpler way to solve the problem using a tweepy.Cursor (see the tweepy Cursor tutorial for more on using Cursor).

The following code fetches the most recent 1000 mentions of 'python'.


import tweepy
# assuming twitter_authentication.py contains each of the 4 oauth elements (1 per line)
from twitter_authentication import API_KEY, API_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET

auth = tweepy.OAuthHandler(API_KEY, API_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

api = tweepy.API(auth)

query = 'python'
max_tweets = 1000
searched_tweets = [status for status in tweepy.Cursor(api.search, q=query).items(max_tweets)]

Update: in response to Andre Petre's comment about potential memory consumption issues with tweepy.Cursor, I'll include my original solution, replacing the single-statement list comprehension used above to compute searched_tweets with the following:

searched_tweets = []
last_id = -1
while len(searched_tweets) < max_tweets:
    count = max_tweets - len(searched_tweets)
    try:
        new_tweets = api.search(q=query, count=count, max_id=str(last_id - 1))
        if not new_tweets:
            break
        searched_tweets.extend(new_tweets)
        last_id = new_tweets[-1].id
    except tweepy.TweepError as e:
        # depending on TweepError.code, one may want to retry or wait
        # to keep things simple, we will give up on an error
        break
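The max_id bookkeeping in the loop above can be exercised without touching Twitter at all. In the sketch below, FakeTweet and fake_search are hypothetical stand-ins (not tweepy API) that mimic how max_id filters a newest-first timeline, so the pagination logic can be verified offline:

```python
# Stand-in for api.search: returns the newest `count` fake tweets whose
# id is <= max_id (Twitter returns results newest-first). FakeTweet and
# fake_search are illustrative only, not part of tweepy.
class FakeTweet:
    def __init__(self, id):
        self.id = id

ALL_IDS = list(range(1, 351))  # 350 fake tweets, ids 1..350

def fake_search(count, max_id=None):
    eligible = [i for i in ALL_IDS if max_id is None or i <= max_id]
    return [FakeTweet(i) for i in sorted(eligible, reverse=True)[:count]]

max_tweets = 250
searched = []
last_id = None
while len(searched) < max_tweets:
    count = max_tweets - len(searched)
    # first pass omits max_id; later passes ask only for tweets strictly
    # older than the last one already collected
    kwargs = {} if last_id is None else {'max_id': last_id - 1}
    new = fake_search(count=min(count, 100), **kwargs)
    if not new:
        break
    searched.extend(new)
    last_id = new[-1].id

ids = [t.id for t in searched]
print(len(ids), ids[0], ids[-1])  # 250 tweets, from id 350 down to id 101
```

Each pass picks up exactly where the previous one stopped and no tweet is fetched twice, which is the property the max_id trick relies on.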

Answered by Lucas

The other questions are old and the API changed a lot.


Easy way, with Cursor (see the Cursor tutorial). pages() returns a list of elements (you can limit how many pages it returns: .pages(5) only returns 5 pages):

for page in tweepy.Cursor(api.search, q='python', count=100, tweet_mode='extended').pages():
    # process status here
    process_page(page)

Here q is the query, count is how many tweets to fetch per request (100 is the maximum per request), and tweet_mode='extended' returns the full text (without it, the text is truncated to 140 characters). Retweets are still truncated, as confirmed by jaycech3n.
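Conceptually, .pages() just splits the result stream into chunks of count. A plain-Python analogue of that chunking (the pages function below is a local stand-in for illustration, not tweepy's):

```python
# Plain-Python sketch of what paging does conceptually: split a result
# set into successive chunks of at most `count` items.
def pages(results, count=100):
    """Yield successive chunks of at most `count` items."""
    for start in range(0, len(results), count):
        yield results[start:start + count]

results = list(range(250))             # stand-in for 250 tweets
chunks = list(pages(results, count=100))
print([len(c) for c in chunks])        # [100, 100, 50]
```

The real Cursor does this lazily over HTTP requests rather than over an in-memory list, which is why it avoids the memory issues mentioned in the previous answer.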

If you don't want to use tweepy.Cursor, you need to pass max_id to fetch the next chunk.

last_id = None
result = True
while result:
    result = api.search(q='python', count=100, tweet_mode='extended', max_id=last_id)
    process_result(result)
    if result:
        # we subtract one so the same tweet is not returned again
        last_id = result[-1]._json['id'] - 1

Answered by Ritesh Soni

You can search for tweets containing a specific string, as shown below:

tweets = api.search('Artificial Intelligence', count=200)
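Worth noting: per the documentation quoted in the first answer, a single request returns at most 100 tweets, so count=200 here is effectively capped; to actually collect 200 you would paginate, e.g. with Cursor(...).items(200) as in the earlier answers. A toy model of that cap (clamped_search is a hypothetical stand-in, not a tweepy function):

```python
# Hypothetical model of the per-request cap described in Twitter's docs:
# whatever count is requested, one request yields at most 100 results.
def clamped_search(requested_count, cap=100):
    return min(requested_count, cap)

print(clamped_search(200))  # 100 - a count=200 request is capped
print(clamped_search(50))   # 50 - below the cap, honored as-is
```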

Answered by hansrajSwapnil

I am working on extracting Twitter data around a location (here, around India) for all tweets that include a specific keyword or a list of keywords.

import tweepy
import credentials    ## all my twitter API credentials are in this file; it should be in the same directory as this script

## set API connection
auth = tweepy.OAuthHandler(credentials.consumer_key, 
                            credentials.consumer_secret)
auth.set_access_secret(credentials.access_token, 
                        credentials.access_secret)

api = tweepy.API(auth, wait_on_rate_limit=True)    # set wait_on_rate_limit =True; as twitter may block you from querying if it finds you exceeding some limits

search_words = ["#covid19", "2020", "lockdown"]

date_since = "2020-05-21"

tweets = tweepy.Cursor(api.search,
                       q=" OR ".join(search_words),   # OR matches tweets containing any of the keywords
                       geocode="20.5937,78.9629,3000km",
                       lang="en", since=date_since).items(10)
## the geocode is for India; format for geocode="lattitude,longitude,radius"
## radius should be in miles or km


for tweet in tweets:
    print("created_at: {}\nuser: {}\ntweet text: {}\ngeo_location: {}".
            format(tweet.created_at, tweet.user.screen_name, tweet.text, tweet.user.location))
    print("\n")
## tweet.user.location gives the general location of the user, not the location of the tweet itself; as it turns out, most users do not share the exact location of their tweets

RESULTS:
created_at: 2020-05-28 16:48:23
user: XXXXXXXXX
tweet text: RT @Eatala_Rajender: Media Bulletin on status of positive cases #COVID19 in Telangana. (Dated. 28.05.2020)
TelanganaFightsCorona
StayHom…
geo_location: Hyderabad, India
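As an aside, both q and geocode in the code above are plain strings. The helpers below make their expected formats explicit (build_query and build_geocode are illustrative names, not tweepy API):

```python
# Helper sketches for the two string-valued search arguments used above.
def build_geocode(lat, lon, radius, unit="km"):
    """Format a geocode as 'latitude,longitude,radius' with a km/mi unit."""
    if unit not in ("km", "mi"):
        raise ValueError("radius unit must be 'km' or 'mi'")
    return f"{lat},{lon},{radius}{unit}"

def build_query(keywords):
    """Join several keywords with OR so a tweet matching any is returned."""
    return " OR ".join(keywords)

print(build_geocode(20.5937, 78.9629, 3000))          # 20.5937,78.9629,3000km
print(build_query(["#covid19", "2020", "lockdown"]))  # #covid19 OR 2020 OR lockdown
```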