Python Twitter API - get tweets with a specific ID

Note: this page is an English rendering of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must follow the same CC BY-SA license and attribute it to the original authors (not me) on StackOverflow.

Original question: http://stackoverflow.com/questions/28384588/
Asked by Crista23
I have a list of tweet ids for which I would like to download their text content. Is there any easy solution to do this, preferably through a Python script? I had a look at other libraries like Tweepy and things don't appear to work so simple, and downloading them manually is out of the question since my list is very long.
Accepted answer by Martijn Pieters
You can access specific tweets by their ID with the statuses/show/:id API route. Most Python Twitter libraries follow the exact same patterns, or offer 'friendly' names for the methods.
For example, Twython offers several show_* methods, including Twython.show_status(), which lets you load specific tweets:
from twython import Twython

CONSUMER_KEY = "<consumer key>"
CONSUMER_SECRET = "<consumer secret>"
OAUTH_TOKEN = "<application key>"
OAUTH_TOKEN_SECRET = "<application secret>"

twitter = Twython(
    CONSUMER_KEY, CONSUMER_SECRET,
    OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

tweet = twitter.show_status(id=id_of_tweet)
print(tweet['text'])
The returned dictionary follows the Tweet object definition given by the API.
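The shape of that dictionary can be explored offline; a minimal sketch using a hand-built sample in the shape of the API's Tweet object (all field values here are made up for illustration):

```python
# a hand-built sample shaped like the API's Tweet object (values are made up)
tweet = {
    'id_str': '561684861681729536',
    'text': 'hello world',
    'user': {'screen_name': 'example_user'},
}

# pull out the fields of interest, e.g. as a CSV-style line
print('%s,%s' % (tweet['id_str'], tweet['text']))
```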
The tweepy library uses tweepy.API.get_status():
import tweepy

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
api = tweepy.API(auth)

tweet = api.get_status(id_of_tweet)
print(tweet.text)
Here it returns a slightly richer object, but its attributes again reflect the published API.
Answer by verdverm
You can access tweets in bulk (up to 100 at a time) with the status/lookup endpoint: https://dev.twitter.com/rest/reference/get/statuses/lookup
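Since the endpoint caps each request at 100 IDs, a long list has to be split into batches on the client side; a minimal sketch of that chunking (the helper name is mine, not part of any library):

```python
def chunks(ids, size=100):
    """Yield successive batches of at most `size` tweet IDs."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

# e.g. 250 IDs become batches of 100, 100 and 50,
# each small enough for one statuses/lookup request
batches = list(chunks(list(range(250))))
```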
Answer by chrisinmtown
Sharing my work, which was vastly accelerated by the previous answers (thank you). This Python 2.7 script fetches the text for tweet IDs stored in a file. Adjust get_tweet_id() for your input data format; it was originally configured for the data at https://github.com/mdredze/twitter_sandy
Update, April 2018: responding late to @someone's bug report (thank you). This script no longer discards every 100th tweet ID (that was my bug). Please note that if a tweet is unavailable for whatever reason, the bulk fetch silently skips it. The script now warns if the response size differs from the request size.
'''
Gets text content for tweet IDs
'''
# standard
from __future__ import print_function
import getopt
import logging
import os
import sys
# import traceback
# third-party: `pip install tweepy`
import tweepy

# global logger level is configured in main()
Logger = None

# Generate your own at https://apps.twitter.com/app
CONSUMER_KEY = 'Consumer Key (API key)'
CONSUMER_SECRET = 'Consumer Secret (API Secret)'
OAUTH_TOKEN = 'Access Token'
OAUTH_TOKEN_SECRET = 'Access Token Secret'

# batch size depends on Twitter limit, 100 at this time
batch_size = 100


def get_tweet_id(line):
    '''
    Extracts and returns tweet ID from a line in the input.
    '''
    (tagid, _timestamp, _sandyflag) = line.split('\t')
    (_tag, _search, tweet_id) = tagid.split(':')
    return tweet_id


def get_tweets_single(twapi, idfilepath):
    '''
    Fetches content for tweet IDs in a file one at a time,
    which means a ton of HTTPS requests, so NOT recommended.

    `twapi`: Initialized, authorized API object from Tweepy
    `idfilepath`: Path to file containing IDs
    '''
    # process IDs from the file
    with open(idfilepath, 'rb') as idfile:
        for line in idfile:
            tweet_id = get_tweet_id(line)
            Logger.debug('get_tweets_single: fetching tweet for ID %s', tweet_id)
            try:
                tweet = twapi.get_status(tweet_id)
                print('%s,%s' % (tweet_id, tweet.text.encode('UTF-8')))
            except tweepy.TweepError as te:
                Logger.warn('get_tweets_single: failed to get tweet ID %s: %s', tweet_id, te.message)
                # traceback.print_exc(file=sys.stderr)


def get_tweet_list(twapi, idlist):
    '''
    Invokes bulk lookup method.
    Raises an exception if rate limit is exceeded.
    '''
    # fetch as little metadata as possible
    tweets = twapi.statuses_lookup(id_=idlist, include_entities=False, trim_user=True)
    if len(idlist) != len(tweets):
        Logger.warn('get_tweet_list: unexpected response size %d, expected %d', len(tweets), len(idlist))
    for tweet in tweets:
        print('%s,%s' % (tweet.id, tweet.text.encode('UTF-8')))


def get_tweets_bulk(twapi, idfilepath):
    '''
    Fetches content for tweet IDs in a file using the bulk request method,
    which vastly reduces the number of HTTPS requests compared to above;
    however, it does not warn about IDs that yield no tweet.

    `twapi`: Initialized, authorized API object from Tweepy
    `idfilepath`: Path to file containing IDs
    '''
    # process IDs from the file
    tweet_ids = list()
    with open(idfilepath, 'rb') as idfile:
        for line in idfile:
            tweet_id = get_tweet_id(line)
            Logger.debug('Enqueing tweet ID %s', tweet_id)
            tweet_ids.append(tweet_id)
            # API limits batch size
            if len(tweet_ids) == batch_size:
                Logger.debug('get_tweets_bulk: fetching batch of size %d', batch_size)
                get_tweet_list(twapi, tweet_ids)
                tweet_ids = list()
    # process remainder
    if len(tweet_ids) > 0:
        Logger.debug('get_tweets_bulk: fetching last batch of size %d', len(tweet_ids))
        get_tweet_list(twapi, tweet_ids)


def usage():
    print('Usage: get_tweets_by_id.py [options] file')
    print('    -s (single) makes one HTTPS request per tweet ID')
    print('    -v (verbose) enables detailed logging')
    sys.exit()


def main(args):
    logging.basicConfig(level=logging.WARN)
    global Logger
    Logger = logging.getLogger('get_tweets_by_id')
    bulk = True
    try:
        opts, args = getopt.getopt(args, 'sv')
    except getopt.GetoptError:
        usage()
    for opt, _optarg in opts:
        if opt == '-s':
            bulk = False
        elif opt == '-v':
            Logger.setLevel(logging.DEBUG)
            Logger.debug("main: verbose mode on")
        else:
            usage()
    if len(args) != 1:
        usage()
    idfile = args[0]
    if not os.path.isfile(idfile):
        print('Not found or not a file: %s' % idfile, file=sys.stderr)
        usage()
    # connect to twitter
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
    api = tweepy.API(auth)
    # hydrate tweet IDs
    if bulk:
        get_tweets_bulk(api, idfile)
    else:
        get_tweets_single(api, idfile)


if __name__ == '__main__':
    main(sys.argv[1:])
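The tab-separated input format that get_tweet_id() expects can be checked without touching the API; a quick sketch with a made-up line in the twitter_sandy style (the sample field values are assumptions, not real data):

```python
def get_tweet_id(line):
    # line layout assumed by the script: "<tag>:<search>:<tweet_id>\t<timestamp>\t<flag>"
    (tagid, _timestamp, _sandyflag) = line.split('\t')
    (_tag, _search, tweet_id) = tagid.split(':')
    return tweet_id

# made-up sample line for illustration only
sample = 'tag:search:1234567890\t2012-10-29T12:00:00Z\tY\n'
print(get_tweet_id(sample))  # 1234567890
```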
Answer by Someone
I don't have enough reputation to add an actual comment, so sadly this is the way to go:
I found a bug and a strange thing in chrisinmtown's answer:
Every 100th tweet ID will be skipped due to the bug. Here is a simple solution:
if len(tweet_ids) < 100:
    tweet_ids.append(tweet_id)
else:
    tweet_ids.append(tweet_id)
    get_tweet_list(twapi, tweet_ids)
    tweet_ids = list()
Using the call below is better, since the script keeps working even after it hits the rate limit:

api = tweepy.API(auth_handler=auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)