Python requests arguments/dealing with api pagination

Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you reuse or share it, you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/17777845/

python, api, http, pagination, python-requests

Asked by crock1255

I'm playing around with the Angel List (AL) API and want to pull all jobs in San Francisco. Since I couldn't find an active Python wrapper for the API (if I make any headway, I think I'd like to make my own), I'm using the requests library.

The AL API's results are paginated, and I can't figure out how to move beyond the first page of the results.

Here is my code:

import requests
r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs").json()
r_sanfran.keys()
# returns [u'per_page', u'last_page', u'total', u'jobs', u'page']
r_sanfran['last_page']
#returns 16
r_sanfran['page']
# returns 1

I tried adding arguments to requests.get, but that didn't work. I also tried something really dumb: changing the value of the 'page' key, as if that were magically going to paginate for me.

e.g. r_sanfran['page'] = 2

I'm guessing it's something relatively simple, but I can't seem to figure it out, so any help would be awesome.

Thanks as always.

Angel List API documentation, if it's helpful.

Accepted answer by alecxe

Read last_page and make a GET request for each page in the range:

import requests

r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs").json()
num_pages = r_sanfran['last_page']

for page in range(2, num_pages + 1):
    r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs", params={'page': page}).json()
    print(r_sanfran['page'])
    # TODO: extract the data
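
As a small extension of the accepted answer, here is a sketch that gathers every listing into one list, assuming each page carries its results under the 'jobs' key shown in the question:

import requests

url = "https://api.angel.co/1/tags/1664/jobs"
all_jobs = []

# Fetch the first page to learn how many pages there are.
first_page = requests.get(url).json()
all_jobs.extend(first_page['jobs'])

# Fetch the remaining pages by passing the page number as a query parameter.
for page in range(2, first_page['last_page'] + 1):
    resp = requests.get(url, params={'page': page}).json()
    all_jobs.extend(resp['jobs'])

print(len(all_jobs))  # should roughly match the API's reported 'total'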

Answer by dh762

Improving on @alecxe's answer: if you use a Python generator and a requests HTTP session, you can improve performance and resource usage when querying lots of pages or very large pages.

import requests

session = requests.Session()

def get_jobs():
    url = "https://api.angel.co/1/tags/1664/jobs" 
    first_page = session.get(url).json()
    yield first_page
    num_pages = first_page['last_page']

    for page in range(2, num_pages + 1):
        next_page = session.get(url, params={'page': page}).json()
        yield next_page

for page in get_jobs():
    # TODO: process the page, e.g. page['jobs']
    pass
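
One way to fill in that loop, assuming each page stores its results under a 'jobs' key as in the question, is to chain the pages into a single flat iterator of job entries:

import itertools

# Reuses get_jobs() from the snippet above; each yielded page is a dict.
jobs = itertools.chain.from_iterable(page['jobs'] for page in get_jobs())

for job in jobs:
    print(job)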

Answer by joshlsullivan

I came across a scenario where the API didn't return page numbers but rather a min/max value. I created this, and I think it will work for both situations. It automatically increases the increment until it reaches the end, and then stops the while loop.

import requests

# url and headers are assumed to be defined elsewhere for the target API.
max_version = [1]
while len(max_version) > 0:
    r = requests.get(url, headers=headers, params={"page": max_version[0]}).json()
    next_page = r['page']
    if next_page is not None:
        max_version[0] = next_page
        # ... process the data here ...
    else:
        max_version.clear()  # stop the while loop
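
A slightly simpler sketch of the same stop-when-exhausted idea uses a plain variable as the cursor. The endpoint and headers below are placeholders, and whether the API's 'page' field really points at the next page (or the next min/max value) depends on the service, as in the snippet above:

import requests

# Hypothetical endpoint and headers, standing in for the real ones.
url = "https://api.example.com/jobs"
headers = {}

page = 1
while page is not None:
    r = requests.get(url, headers=headers, params={"page": page}).json()
    # ... process the payload here ...
    page = r.get('page')  # loop ends when the API stops reporting a next page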