Python: Avoid Twitter API limitation with Tweepy

Notice: This page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/21308762/

Time: 2020-08-18 22:33:17  Source: igfitidea

Avoid Twitter API limitation with Tweepy

Tags: python, python-2.7, twitter, tweepy

Asked by 4m1nh4j1

I read in another question on Stack Exchange that the limitation depends on the number of requests per 15 minutes and also on the complexity of the algorithm, but my algorithm is not a complex one.

So I use this code:

import tweepy
import sqlite3
import time

db = sqlite3.connect('data/MyDB.db')

# Get a cursor object
cursor = db.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS MyTable(id INTEGER PRIMARY KEY, name TEXT, geo TEXT, image TEXT, source TEXT, timestamp TEXT, text TEXT, rt INTEGER)''')
db.commit()

consumer_key = ""
consumer_secret = ""
key = ""
secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(key, secret)

api = tweepy.API(auth)

search = "#MyHashtag"

for tweet in tweepy.Cursor(api.search,
                           q=search,
                           include_entities=True).items():
    while True:
        try:
            cursor.execute('''INSERT INTO MyTable(name, geo, image, source, timestamp, text, rt) VALUES(?,?,?,?,?,?,?)''',(tweet.user.screen_name, str(tweet.geo), tweet.user.profile_image_url, tweet.source, tweet.created_at, tweet.text, tweet.retweet_count))
        except tweepy.TweepError:
                time.sleep(60 * 15)
                continue
        break
db.commit()
db.close()

I always get the Twitter limitation error:

Traceback (most recent call last):
  File "stream.py", line 25, in <module>
    include_entities=True).items():
  File "/usr/local/lib/python2.7/dist-packages/tweepy/cursor.py", line 153, in next
    self.current_page = self.page_iterator.next()
  File "/usr/local/lib/python2.7/dist-packages/tweepy/cursor.py", line 98, in next
    data = self.method(max_id = max_id, *self.args, **self.kargs)
  File "/usr/local/lib/python2.7/dist-packages/tweepy/binder.py", line 200, in _call
    return method.execute()
  File "/usr/local/lib/python2.7/dist-packages/tweepy/binder.py", line 176, in execute
    raise TweepError(error_msg, resp)
tweepy.error.TweepError: [{'message': 'Rate limit exceeded', 'code': 88}]

Accepted answer by Aaron Hill

The problem is that your try: except: block is in the wrong place. Inserting data into the database will never raise a TweepError; iterating over Cursor.items() is what raises it. I would suggest refactoring your code to call the next method of Cursor.items() in an infinite loop. That call should be placed in the try: except: block, as it can raise an error.

Here's (roughly) what the code should look like:

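# above omitted for brevity
c = tweepy.Cursor(api.search,
                  q=search,
                  include_entities=True).items()
while True:
    try:
        # Fetching the next item is the call that can hit the rate limit
        tweet = next(c)
        # Insert into db
    except tweepy.TweepError:
        # Wait out the 15-minute rate limit window, then retry
        time.sleep(60 * 15)
        continue
    except StopIteration:
        # No more results
        break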

This works because when Tweepy raises a TweepError, it hasn't updated any of the cursor data. The next time it makes the request, it will use the same parameters as the request which triggered the rate limit, effectively repeating it until it goes through.
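
The retry behaviour described above can be illustrated without touching the Twitter API. The following is only a minimal sketch: `RateLimitError`, `FakeCursor` and `iterate_with_retry` are hypothetical names invented for this illustration (they are not part of Tweepy), and `FakeCursor` merely imitates an iterator that raises once without advancing its position.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for tweepy.TweepError (not a Tweepy class)."""


class FakeCursor(object):
    """Hypothetical iterator that fails once without advancing its position."""

    def __init__(self, items, fail_at):
        self._items = list(items)
        self._pos = 0
        self._fail_at = fail_at
        self._failed = False

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._items):
            raise StopIteration
        if self._pos == self._fail_at and not self._failed:
            # Simulate a rate limit: the position is NOT advanced,
            # so the same item is returned on the next attempt.
            self._failed = True
            raise RateLimitError("Rate limit exceeded")
        item = self._items[self._pos]
        self._pos += 1
        return item

    next = __next__  # Python 2 spelling


def iterate_with_retry(cursor, retry_on, wait_seconds, sleep=time.sleep):
    """Yield items from cursor, sleeping and retrying whenever retry_on is raised."""
    while True:
        try:
            item = next(cursor)
        except retry_on:
            sleep(wait_seconds)
            continue
        except StopIteration:
            return
        yield item


# Record the requested waits instead of actually sleeping for 15 minutes.
waits = []
result = list(iterate_with_retry(FakeCursor(["a", "b", "c"], fail_at=1),
                                 RateLimitError, 60 * 15, sleep=waits.append))
```

Because the failed fetch leaves the position untouched, no item is skipped or duplicated; passing `sleep` as a parameter is just a design choice that lets the loop be exercised without waiting.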

Answer by Till Hoffmann

If you want to avoid errors and respect the rate limit, you can use the following function, which takes your api object as an argument. It retrieves the number of remaining requests of the same type as the last request and, if desired, waits until the rate limit has been reset.

from datetime import datetime
from time import sleep


def test_rate_limit(api, wait=True, buffer=.1):
    """
    Tests whether the rate limit of the last request has been reached.
    :param api: The `tweepy` api instance.
    :param wait: A flag indicating whether to wait for the rate limit reset
                 if the rate limit has been reached.
    :param buffer: A buffer time in seconds that is added on to the waiting
                   time as an extra safety margin.
    :return: True if it is ok to proceed with the next request. False otherwise.
    """
    # Get the number of remaining requests
    remaining = int(api.last_response.getheader('x-rate-limit-remaining'))
    # Check if we have reached the limit
    if remaining == 0:
        limit = int(api.last_response.getheader('x-rate-limit-limit'))
        reset = int(api.last_response.getheader('x-rate-limit-reset'))
        # Convert the reset epoch timestamp to a local datetime
        reset = datetime.fromtimestamp(reset)
        # Let the user know we have reached the rate limit
        print("0 of {} requests remaining until {}.".format(limit, reset))

        if wait:
            # Determine the delay and sleep
            delay = (reset - datetime.now()).total_seconds() + buffer
            print("Sleeping for {}s...".format(delay))
            sleep(delay)
            # We have waited for the rate limit reset. OK to proceed.
            return True
        else:
            # We have reached the rate limit. The user needs to handle the rate limit manually.
            return False

    # We have not reached the rate limit
    return True
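
The delay computation in the function above can also be isolated so that it is easy to check on its own. This is only a sketch: `seconds_to_wait` and its `margin` parameter are hypothetical names, and it assumes the rate limit headers are available as a plain dict of strings rather than through a Tweepy response object.

```python
import time


def seconds_to_wait(headers, now=None, margin=0.1):
    """Return how long to sleep before the next request (0.0 if none is needed).

    headers: dict holding the 'x-rate-limit-remaining' and
             'x-rate-limit-reset' values as strings (assumed layout).
    now:     current time as a Unix timestamp; defaults to time.time().
    margin:  extra safety margin in seconds added to the wait.
    """
    remaining = int(headers["x-rate-limit-remaining"])
    if remaining > 0:
        # Requests are still available in the current window.
        return 0.0
    reset_epoch = int(headers["x-rate-limit-reset"])
    if now is None:
        now = time.time()
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, reset_epoch - now) + margin


# Example with made-up header values: the limit resets 60 seconds from "now".
exhausted = {"x-rate-limit-remaining": "0", "x-rate-limit-reset": "1000"}
available = {"x-rate-limit-remaining": "3", "x-rate-limit-reset": "1000"}
wait_exhausted = seconds_to_wait(exhausted, now=940.0, margin=0.5)
wait_available = seconds_to_wait(available, now=940.0, margin=0.5)
```

Working with Unix timestamps on both sides avoids any mix-up between local and UTC datetimes when computing the difference.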

Answer by Dan Nguyen

For anyone who stumbles upon this on Google, tweepy 3.2+ has additional parameters for the tweepy.API class, in particular:

  • wait_on_rate_limit: Whether or not to automatically wait for rate limits to replenish
  • wait_on_rate_limit_notify: Whether or not to print a notification when Tweepy is waiting for rate limits to replenish

Setting these flags to True will delegate the waiting to the API instance, which is good enough for most simple use cases.

Answer by Mayank Khullar

Just replace

api = tweepy.API(auth)

with

api = tweepy.API(auth, wait_on_rate_limit=True)

Answer by Malik Faiq

import tweepy

# consumer_key, consumer_secret, access_token and access_token_secret
# are your own application credentials.
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# Tweepy notifies the user when the rate limit is hit and waits by itself; no manual sleep is needed.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)