How to Speed Up Python's urllib2 when doing multiple requests

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/2009243/

Posted: 2020-11-03 23:33:01  Source: igfitidea

How to Speed Up Python's urllib2 when doing multiple requests

Tags: python, http, urllib2

Asked by speedplane

I am making several HTTP requests to a particular host using Python's urllib2 library. Each time a request is made, a new TCP and HTTP connection is created, which takes a noticeable amount of time. Is there any way to keep the TCP/HTTP connection alive using urllib2?


Answer by Corey Goldberg

If you switch to httplib, you will have finer control over the underlying connection.


For example:


import httplib

# HTTPConnection takes a bare host (and optional port), not a full URL
conn = httplib.HTTPConnection('example.com')

conn.request('GET', '/foo')
r1 = conn.getresponse()
r1.read()  # read the full response before issuing the next request

conn.request('GET', '/bar')
r2 = conn.getresponse()
r2.read()

conn.close()

This would send 2 HTTP GETs on the same underlying TCP connection.

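In Python 3 the httplib module was renamed to http.client; a minimal sketch of the same pattern (the host and paths below are just placeholders):

from http.client import HTTPConnection

conn = HTTPConnection('example.com')

conn.request('GET', '/foo')
r1 = conn.getresponse()
r1.read()  # drain the response before reusing the connection

conn.request('GET', '/bar')
r2 = conn.getresponse()
r2.read()

conn.close()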

Answer by Greg Haskins

I've used the third-party urllib3 library to good effect in the past. It's designed to complement urllib2 by pooling connections for reuse.


Modified example from the wiki:


>>> from urllib3 import HTTPConnectionPool
>>> # Create a connection pool for a specific host
... http_pool = HTTPConnectionPool('www.google.com')
>>> # simple GET request, for example
... r = http_pool.urlopen('GET', '/')
>>> print r.status, len(r.data)
200 28050
>>> r = http_pool.urlopen('GET', '/search?q=hello+world')
>>> print r.status, len(r.data)
200 79124
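
The session above uses an older urllib3 interface. Recent urllib3 releases expose the same per-host connection pooling through PoolManager; a rough sketch (the host and query are illustrative only):

import urllib3

http = urllib3.PoolManager()

# Both requests go to the same host, so they reuse one pooled connection.
r = http.request('GET', 'http://www.google.com/')
print(r.status, len(r.data))

r = http.request('GET', 'http://www.google.com/search', fields={'q': 'hello world'})
print(r.status, len(r.data))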

Answer by Collin Anderson

If you need something more automatic than plain httplib, this might help, though it's not thread-safe.


try:
    from http.client import HTTPConnection, HTTPSConnection  # Python 3
except ImportError:
    from httplib import HTTPConnection, HTTPSConnection  # Python 2
import select

# Cache of open connections, keyed by (scheme, host).
connections = {}


def request(method, url, body=None, headers={}, **kwargs):
    scheme, _, host, path = url.split('/', 3)
    h = connections.get((scheme, host))
    # A readable socket with no request outstanding means the server has
    # closed the connection (or sent unexpected data), so discard it.
    if h and select.select([h.sock], [], [], 0)[0]:
        h.close()
        h = None
    if not h:
        Connection = HTTPConnection if scheme == 'http:' else HTTPSConnection
        h = connections[(scheme, host)] = Connection(host, **kwargs)
    h.request(method, '/' + path, body, headers)
    return h.getresponse()


def urlopen(url, data=None, *args, **kwargs):
    resp = request('POST' if data else 'GET', url, data, *args, **kwargs)
    assert resp.status < 400, (resp.status, resp.reason, resp.read())
    return resp
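
A hypothetical usage of these helpers (the URLs are placeholders); repeated calls to the same host reuse the connection cached in the connections dict:

# Both calls hit the same host, so the second one reuses the
# HTTPConnection created by the first.
resp = urlopen('http://example.com/foo')
body = resp.read()

resp = urlopen('http://example.com/bar')
body = resp.read()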