Managing Tweepy API Search

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not the translator). Original: http://stackoverflow.com/questions/22469713/

Date: 2020-08-19 00:59:57 · Source: igfitidea


Tags: python, twitter, tweepy

Asked by user3075934

Please forgive me if this is a gross repeat of a question previously answered elsewhere, but I am lost on how to use the tweepy API search function. Is there any documentation available on how to search for tweets using the api.search() function?

Is there any way I can control features such as number of tweets returned, results type etc.?


The results seem to max out at 100 for some reason.


The code snippet I use is as follows:

searched_tweets = self.api.search(q=query, rpp=100, count=1000)

Answered by Yuva Raj

There's a problem in your code. Per the Twitter documentation for GET search/tweets, the count parameter is described as follows:

    The number of tweets to return per page, up to a maximum of 100. Defaults to 15. This was formerly the "rpp" parameter in the old Search API.

Your code should be:

CONSUMER_KEY = '....'
CONSUMER_SECRET = '....'
ACCESS_KEY = '....'
ACCESS_SECRET = '....'

auth = tweepy.auth.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)
search_results = api.search(q="hello", count=100)

for tweet in search_results:
    # do whatever you need with each tweet here, e.g.
    print(tweet.text)

Answered by gumption

I originally worked out a solution based on Yuva Raj's suggestion to use additional parameters in GET search/tweets - the max_id parameter in conjunction with the id of the last tweet returned in each iteration of a loop that also checks for the occurrence of a TweepError.

However, I discovered there is a far simpler way to solve the problem using a tweepy.Cursor (see the tweepy Cursor tutorial for more on using Cursor).

The following code fetches the most recent 1000 mentions of 'python'.


import tweepy
# assuming twitter_authentication.py contains each of the 4 oauth elements (1 per line)
from twitter_authentication import API_KEY, API_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET

auth = tweepy.OAuthHandler(API_KEY, API_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

api = tweepy.API(auth)

query = 'python'
max_tweets = 1000
searched_tweets = [status for status in tweepy.Cursor(api.search, q=query).items(max_tweets)]

Update: in response to Andre Petre's comment about potential memory consumption issues with tweepy.Cursor, I'll include my original solution, replacing the single-statement list comprehension used above to compute searched_tweets with the following:

searched_tweets = []
last_id = -1
while len(searched_tweets) < max_tweets:
    count = max_tweets - len(searched_tweets)
    try:
        new_tweets = api.search(q=query, count=count, max_id=str(last_id - 1))
        if not new_tweets:
            break
        searched_tweets.extend(new_tweets)
        last_id = new_tweets[-1].id
    except tweepy.TweepError as e:
        # depending on TweepError.code, one may want to retry or wait
        # to keep things simple, we will give up on an error
        break
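The max_id bookkeeping in the loop above can be exercised without touching Twitter at all. In the sketch below, FakeTweet and fake_search are hypothetical stand-ins (not tweepy API) that mimic how max_id filters a newest-first timeline, so the pagination logic can be verified offline:

```python
# Stand-in for api.search: returns the newest `count` fake tweets whose
# id is <= max_id (Twitter returns results newest-first). FakeTweet and
# fake_search are illustrative only, not part of tweepy.
class FakeTweet:
    def __init__(self, id):
        self.id = id

ALL_IDS = list(range(1, 351))  # 350 fake tweets, ids 1..350

def fake_search(count, max_id=None):
    eligible = [i for i in ALL_IDS if max_id is None or i <= max_id]
    return [FakeTweet(i) for i in sorted(eligible, reverse=True)[:count]]

max_tweets = 250
searched = []
last_id = None
while len(searched) < max_tweets:
    count = max_tweets - len(searched)
    # first pass omits max_id; later passes ask only for tweets strictly
    # older than the last one already collected
    kwargs = {} if last_id is None else {'max_id': last_id - 1}
    new = fake_search(count=min(count, 100), **kwargs)
    if not new:
        break
    searched.extend(new)
    last_id = new[-1].id

ids = [t.id for t in searched]
print(len(ids), ids[0], ids[-1])  # 250 tweets, from id 350 down to id 101
```

Each pass picks up exactly where the previous one stopped and no tweet is fetched twice, which is the property the max_id trick relies on.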

Answered by Lucas

The other questions are old and the API changed a lot.


Easy way, with Cursor (see the Cursor tutorial). pages() returns a list of elements (you can limit how many pages it returns: .pages(5) only returns 5 pages):

for page in tweepy.Cursor(api.search, q='python', count=100, tweet_mode='extended').pages():
    # process status here
    process_page(page)

Here q is the query, count is how many tweets to fetch per request (100 is the maximum per request), and tweet_mode='extended' returns the full text (without it, the text is truncated to 140 characters). Retweets are still truncated, as confirmed by jaycech3n.
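Conceptually, .pages() just splits the result stream into chunks of count. A plain-Python analogue of that chunking (the pages function below is a local stand-in for illustration, not tweepy's):

```python
# Plain-Python sketch of what paging does conceptually: split a result
# set into successive chunks of at most `count` items.
def pages(results, count=100):
    """Yield successive chunks of at most `count` items."""
    for start in range(0, len(results), count):
        yield results[start:start + count]

results = list(range(250))             # stand-in for 250 tweets
chunks = list(pages(results, count=100))
print([len(c) for c in chunks])        # [100, 100, 50]
```

The real Cursor does this lazily over HTTP requests rather than over an in-memory list, which is why it avoids the memory issues mentioned in the previous answer.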

If you don't want to use tweepy.Cursor, you need to pass max_id to fetch the next chunk.

last_id = None
result = True
while result:
    result = api.search(q='python', count=100, tweet_mode='extended', max_id=last_id)
    process_result(result)
    if result:
        # we subtract one so the same tweet is not returned again
        last_id = result[-1]._json['id'] - 1

Answered by Ritesh Soni

You can search for tweets containing a specific string, as shown below:

tweets = api.search('Artificial Intelligence', count=200)
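Worth noting: per the documentation quoted in the first answer, a single request returns at most 100 tweets, so count=200 here is effectively capped; to actually collect 200 you would paginate, e.g. with Cursor(...).items(200) as in the earlier answers. A toy model of that cap (clamped_search is a hypothetical stand-in, not a tweepy function):

```python
# Hypothetical model of the per-request cap described in Twitter's docs:
# whatever count is requested, one request yields at most 100 results.
def clamped_search(requested_count, cap=100):
    return min(requested_count, cap)

print(clamped_search(200))  # 100 - a count=200 request is capped
print(clamped_search(50))   # 50 - below the cap, honored as-is
```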

Answered by hansrajSwapnil

I am working on extracting Twitter data around a location (here, around India) for all tweets that include a specific keyword or a list of keywords.

import tweepy
import credentials    ## all my twitter API credentials are in this file; it should be in the same directory as this script

## set API connection
auth = tweepy.OAuthHandler(credentials.consumer_key, 
                            credentials.consumer_secret)
auth.set_access_secret(credentials.access_token, 
                        credentials.access_secret)

api = tweepy.API(auth, wait_on_rate_limit=True)    # set wait_on_rate_limit =True; as twitter may block you from querying if it finds you exceeding some limits

search_words = ["#covid19", "2020", "lockdown"]

date_since = "2020-05-21"

tweets = tweepy.Cursor(api.search,
                       q=" OR ".join(search_words),   # OR matches tweets containing any of the keywords
                       geocode="20.5937,78.9629,3000km",
                       lang="en", since=date_since).items(10)
## the geocode is for India; format for geocode="lattitude,longitude,radius"
## radius should be in miles or km


for tweet in tweets:
    print("created_at: {}\nuser: {}\ntweet text: {}\ngeo_location: {}".
            format(tweet.created_at, tweet.user.screen_name, tweet.text, tweet.user.location))
    print("\n")
## tweet.user.location gives the general location of the user, not the location of the tweet itself; as it turns out, most users do not share the exact location of their tweets

RESULTS:
created_at: 2020-05-28 16:48:23
user: XXXXXXXXX
tweet text: RT @Eatala_Rajender: Media Bulletin on status of positive cases #COVID19 in Telangana. (Dated. 28.05.2020)
TelanganaFightsCorona
StayHom…
geo_location: Hyderabad, India
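As an aside, both q and geocode in the code above are plain strings. The helpers below make their expected formats explicit (build_query and build_geocode are illustrative names, not tweepy API):

```python
# Helper sketches for the two string-valued search arguments used above.
def build_geocode(lat, lon, radius, unit="km"):
    """Format a geocode as 'latitude,longitude,radius' with a km/mi unit."""
    if unit not in ("km", "mi"):
        raise ValueError("radius unit must be 'km' or 'mi'")
    return f"{lat},{lon},{radius}{unit}"

def build_query(keywords):
    """Join several keywords with OR so a tweet matching any is returned."""
    return " OR ".join(keywords)

print(build_geocode(20.5937, 78.9629, 3000))          # 20.5937,78.9629,3000km
print(build_query(["#covid19", "2020", "lockdown"]))  # #covid19 OR 2020 OR lockdown
```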