Python requests arguments/dealing with api pagination

Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you reuse or share it, you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/17777845/

python, api, http, pagination, python-requests

Asked by crock1255

I'm playing around with the Angel List (AL) API and want to pull all jobs in San Francisco. Since I couldn't find an active Python wrapper for the API (if I make any headway, I think I'd like to make my own), I'm using the requests library.

The AL API's results are paginated, and I can't figure out how to move beyond the first page of the results.

Here is my code:

import requests
r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs").json()
r_sanfran.keys()
# returns [u'per_page', u'last_page', u'total', u'jobs', u'page']
r_sanfran['last_page']
#returns 16
r_sanfran['page']
# returns 1

I tried adding arguments to requests.get, but that didn't work. I also tried something really dumb: changing the value of the 'page' key, as if that were magically going to paginate for me.

e.g. r_sanfran['page'] = 2

I'm guessing it's something relatively simple, but I can't seem to figure it out, so any help would be awesome.

Thanks as always.

Angel List API documentation, if it's helpful.

Accepted answer by alecxe

Read last_page and make a GET request for each page in the range:

import requests

r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs").json()
num_pages = r_sanfran['last_page']

for page in range(2, num_pages + 1):
    r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs", params={'page': page}).json()
    print(r_sanfran['page'])
    # TODO: extract the data
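
As a small extension of the accepted answer, here is a sketch that gathers every listing into one list, assuming each page carries its results under the 'jobs' key shown in the question:

import requests

url = "https://api.angel.co/1/tags/1664/jobs"
all_jobs = []

# Fetch the first page to learn how many pages there are.
first_page = requests.get(url).json()
all_jobs.extend(first_page['jobs'])

# Fetch the remaining pages by passing the page number as a query parameter.
for page in range(2, first_page['last_page'] + 1):
    resp = requests.get(url, params={'page': page}).json()
    all_jobs.extend(resp['jobs'])

print(len(all_jobs))  # should roughly match the API's reported 'total'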

Answer by dh762

Improving on @alecxe's answer: if you use a Python generator and a requests HTTP session, you can improve performance and resource usage when querying lots of pages or very large pages.

import requests

session = requests.Session()

def get_jobs():
    url = "https://api.angel.co/1/tags/1664/jobs" 
    first_page = session.get(url).json()
    yield first_page
    num_pages = first_page['last_page']

    for page in range(2, num_pages + 1):
        next_page = session.get(url, params={'page': page}).json()
        yield next_page

for page in get_jobs():
    # TODO: process the page, e.g. page['jobs']
    pass
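
One way to fill in that loop, assuming each page stores its results under a 'jobs' key as in the question, is to chain the pages into a single flat iterator of job entries:

import itertools

# Reuses get_jobs() from the snippet above; each yielded page is a dict.
jobs = itertools.chain.from_iterable(page['jobs'] for page in get_jobs())

for job in jobs:
    print(job)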

Answer by joshlsullivan

I came across a scenario where the API didn't return page numbers but rather a min/max value. I created this, and I think it will work for both situations. It automatically increases the increment until it reaches the end, and then stops the while loop.

import requests

# url and headers are assumed to be defined elsewhere for the target API.
max_version = [1]
while len(max_version) > 0:
    r = requests.get(url, headers=headers, params={"page": max_version[0]}).json()
    next_page = r['page']
    if next_page is not None:
        max_version[0] = next_page
        # ... process the data here ...
    else:
        max_version.clear()  # stop the while loop
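
A slightly simpler sketch of the same stop-when-exhausted idea uses a plain variable as the cursor. The endpoint and headers below are placeholders, and whether the API's 'page' field really points at the next page (or the next min/max value) depends on the service, as in the snippet above:

import requests

# Hypothetical endpoint and headers, standing in for the real ones.
url = "https://api.example.com/jobs"
headers = {}

page = 1
while page is not None:
    r = requests.get(url, headers=headers, params={"page": page}).json()
    # ... process the payload here ...
    page = r.get('page')  # loop ends when the API stops reporting a next page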