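Python: Get All Follower IDs in Twitter by Tweepy
Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/17431807/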
Asked by user1056824
Is it possible to get the full follower list of an account that has more than one million followers, like McDonald's?
I use Tweepy with the following code:
c = tweepy.Cursor(api.followers_ids, id='McDonalds')
ids = []
for page in c.pages():
    ids.append(page)
I also tried this:
for id in c.items():
    ids.append(id)
But I always get the 'Rate limit exceeded' error, and I only ever get 5,000 follower IDs.
Accepted answer by alecxe
To avoid the rate limit, you can (and should) wait before requesting the next page of followers. It looks hacky, but it works:
import time
import tweepy
auth = tweepy.OAuthHandler(..., ...)
auth.set_access_token(..., ...)
api = tweepy.API(auth)
ids = []
for page in tweepy.Cursor(api.followers_ids, screen_name="McDonalds").pages():
    ids.extend(page)  # each page holds up to 5,000 IDs
    time.sleep(60)    # pause before requesting the next page
print(len(ids))
Hope that helps.
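As a rough check on the question's "more than one million followers" case, here is a back-of-the-envelope estimate. It assumes 5,000 IDs per request (the page size seen in the question) and one request per minute, matching the `time.sleep(60)` above; the one-per-minute pace is what the v1.1 limit of 15 requests per 15-minute window for this endpoint allowed. `estimate_fetch` is a hypothetical helper, not part of Tweepy.

```python
import math

# Assumptions: up to 5,000 IDs per request, and one request per minute
# (15 requests per 15-minute window, as with the v1.1 followers/ids endpoint).
IDS_PER_PAGE = 5000
SECONDS_PER_REQUEST = 60


def estimate_fetch(followers_count):
    """Return (pages, minutes) needed to page through all follower IDs."""
    pages = math.ceil(followers_count / IDS_PER_PAGE)
    minutes = pages * SECONDS_PER_REQUEST / 60
    return pages, minutes


pages, minutes = estimate_fetch(1_000_000)
print(pages, minutes)  # 200 200.0
```

So a one-million-follower account needs about 200 pages, roughly three hours and twenty minutes at this pace.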
Answer by aspiringGuru
Use the rate-limiting arguments when creating the API object; Tweepy will then wait on its own whenever the rate limit is reached.
The sleep pause is not a bad idea either: I use it to simulate a human and spread activity out over a period of time, with the API rate limiting as the final control.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True, compression=True)
Also add try/except blocks to catch and handle errors.
Example code: https://github.com/aspiringguru/twitterDataAnalyse/blob/master/sample_rate_limit_w_cursor.py
I put my keys in an external file to make management easier.
https://github.com/aspiringguru/twitterDataAnalyse/blob/master/keys.py
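To illustrate the try/except advice, here is a minimal, library-agnostic sketch of a retry wrapper. `call_with_retry` and `RateLimitError` are hypothetical names; with Tweepy you would catch the library's own rate-limit exception instead. The 15-minute default wait is an assumption matching Twitter's rate-limit window, and `sleep` is injectable so the example runs without actually waiting.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the exception a client raises on HTTP 429."""


def call_with_retry(func, *args, retries=3, wait_seconds=15 * 60,
                    sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs); on RateLimitError, sleep and try again.

    Other exceptions propagate immediately. If the call is still rate
    limited after `retries` retries, the last RateLimitError is re-raised.
    """
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except RateLimitError:
            if attempt == retries:
                raise
            print(f"rate limited, sleeping {wait_seconds}s (retry {attempt + 1}/{retries})")
            sleep(wait_seconds)


# Usage: a fake page request that is rate limited twice, then succeeds.
calls = {"n": 0}


def flaky_page():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return [101, 102, 103]


slept = []
print(call_with_retry(flaky_page, sleep=slept.append))  # [101, 102, 103]
print(slept)  # [900, 900]
```

Catching only the rate-limit exception keeps genuine failures (a suspended or protected account, bad credentials) from being retried forever.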
Answer by irritable_phd_syndrom
The answer from alecxe is good, but no one has referred to the docs. The explanation for this behavior is in the Twitter API documentation. From the documentation:
Results are given in groups of 5,000 user IDs and multiple “pages” of results can be navigated through using the next_cursor value in subsequent requests.
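The cursoring scheme described in that quote can be sketched without touching the network. `fetch_page` is a hypothetical stand-in for one `followers/ids` request: it returns up to 5,000 IDs plus a `next_cursor`. Following Twitter's cursoring convention, `-1` requests the first page and a `next_cursor` of `0` means there are no more pages. Real cursors are opaque values; here the cursor is just an offset into a list.

```python
def fetch_page(cursor, all_ids, page_size=5000):
    """Hypothetical stand-in for one followers/ids request.

    Returns (ids, next_cursor); next_cursor == 0 means this was the last page.
    """
    start = 0 if cursor == -1 else cursor
    page = all_ids[start:start + page_size]
    end = start + page_size
    next_cursor = end if end < len(all_ids) else 0
    return page, next_cursor


def fetch_all(all_ids, pause=lambda: None):
    """Follow next_cursor until it is 0, pausing between requests."""
    ids = []
    cursor = -1  # -1 asks for the first page
    while cursor != 0:
        page, cursor = fetch_page(cursor, all_ids)
        ids.extend(page)
        if cursor != 0:
            pause()  # against the real API this would be e.g. time.sleep(60)
    return ids


followers = list(range(12_000))
pauses = []
result = fetch_all(followers, pause=lambda: pauses.append(60))
print(len(result), len(pauses))  # 12000 2
```

12,000 IDs take three pages (5,000 + 5,000 + 2,000), so the loop pauses twice, between pages, and not after the last one.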
Answer by zana saedpanah
I use this code and it works for accounts with a large number of followers. It has two functions: one saves the follower IDs collected so far each time the script sleeps, and the other fetches the list. It is a little messy, but I hope it is useful.
import csv
import time
import tweepy
# Assumes `api` is an authenticated tweepy.API object (Tweepy 3.x names:
# api.followers_ids and tweepy.TweepError).
BASE_PATH = '/content/drive/My Drive/Colab Notebooks/twitter/'
def save_followers_status(filename, followers_ids):
    """Append follower IDs, one per row, to <BASE_PATH><filename>_followers_status.csv."""
    if not followers_ids:
        return
    print("save followers status of", filename)
    file = BASE_PATH + filename + '_followers_status.csv'
    # newline='' prevents blank lines between rows, see
    # https://stackoverflow.com/questions/3348460/csv-file-written-with-python-has-blank-lines-between-each-row
    with open(file, mode='a', newline='') as csv_file:  # 'a' creates the file if needed
        writer = csv.writer(csv_file, delimiter=',')
        for follower_id in followers_ids:
            writer.writerow([follower_id])
def get_followers_id(person):
    all_ids = []  # every follower ID fetched so far
    pending = []  # IDs not yet written to the CSV file
    influencer = api.get_user(screen_name=person)
    number_of_followers = influencer.followers_count
    print("number of followers count:", number_of_followers, '\n', 'user id:', influencer.id)
    status = tweepy.Cursor(api.followers_ids, screen_name=person).items()
    while True:
        try:
            user = next(status)
        except tweepy.TweepError:
            # Any TweepError is treated as a rate-limit error here.
            print('Twitter rate limit reached, sleeping for 15 min')
            print(time.strftime("%d.%m.%Y %H:%M:%S", time.localtime()))
            print('fetched so far:', len(all_ids), 'of', number_of_followers)
            # Save progress, sleep, then retry the same request. Calling
            # next() again here would silently skip one ID.
            save_followers_status(person, pending)
            pending = []
            time.sleep(15 * 60)
            continue
        except StopIteration:
            print('end of followers', len(all_ids), 'all followers count is:', number_of_followers)
            break
        pending.append(user)
        all_ids.append(user)
    save_followers_status(person, pending)
    return all_ids
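Because the file is opened in append mode, re-running the script after an interruption can leave repeated IDs in the CSV. Here is a minimal sketch for loading the saved IDs back, assuming the file holds one numeric ID per row; `load_followers_status` is a hypothetical helper, not part of Tweepy or the answer above.

```python
import csv
import os
import tempfile


def load_followers_status(csv_path):
    """Read follower IDs from a one-ID-per-row CSV file.

    Blank lines are skipped and duplicate IDs are dropped,
    keeping the first occurrence of each.
    """
    seen = set()
    ids = []
    with open(csv_path, newline='') as csv_file:
        for row in csv.reader(csv_file):
            if not row:
                continue  # blank line
            follower_id = int(row[0])
            if follower_id not in seen:
                seen.add(follower_id)
                ids.append(follower_id)
    return ids


# Usage with a throwaway file containing a repeated ID.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as f:
    csv.writer(f).writerows([[11], [22], [11], [33]])
    tmp_path = f.name
print(load_followers_status(tmp_path))  # [11, 22, 33]
os.remove(tmp_path)
```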