Why doesn't requests.get() return? What is the default timeout that requests.get() uses?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must likewise follow the CC BY-SA license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/17782142/
Asked by Nawaz
In my script, requests.get never returns:
import requests

print("requesting..")

# This call never returns!
r = requests.get(
    "http://www.some-site.com",
    proxies={'http': '222.255.169.74:8080'},
)
print(r.ok)
What could be the possible reason(s)? Any remedy? What is the default timeout that get uses?
Accepted answer by ron rothman
What is the default timeout that get uses?
The default timeout is None, which means it'll wait (hang) until the connection is closed.
What happens when you pass in a timeout value?
r = requests.get(
    'http://www.justdial.com',
    proxies={'http': '222.255.169.74:8080'},
    timeout=5,
)
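If the server fails to respond within those 5 seconds, requests raises requests.exceptions.Timeout, which you can catch. A minimal sketch, reusing the URL and proxy from the snippet above purely as placeholders:

import requests

try:
    r = requests.get(
        'http://www.justdial.com',
        proxies={'http': '222.255.169.74:8080'},
        timeout=5,
    )
    print(r.ok)
except requests.exceptions.Timeout:
    # Raised when no connection is made or no data arrives within 5 seconds.
    print('request timed out after 5 seconds')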
Answered by Hieu
From the requests documentation:
You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter:
>>> requests.get('http://github.com', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)
Note:
timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds).
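Because of those semantics, a slow server that keeps trickling bytes can hold a request open far longer than the timeout value. A hedged workaround sketch, assuming you want a hard 10-second budget on the whole download (the URL is a placeholder): stream the body and enforce the deadline yourself.

import time
import requests

deadline = time.monotonic() + 10  # overall budget for the entire download
r = requests.get('http://www.some-site.com', timeout=5, stream=True)
body = b''
for chunk in r.iter_content(chunk_size=8192):
    body += chunk
    if time.monotonic() > deadline:
        r.close()
        raise TimeoutError('download exceeded the 10-second budget')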
It happens a lot to me that requests.get() takes a very long time to return even if the timeout is 1 second. There are a few ways to overcome this problem:
1. Use the TimeoutSauce internal class
From: https://github.com/kennethreitz/requests/issues/1928#issuecomment-35811896
import requests
from requests.adapters import TimeoutSauce

class MyTimeout(TimeoutSauce):
    def __init__(self, *args, **kwargs):
        if kwargs['connect'] is None:
            kwargs['connect'] = 5
        if kwargs['read'] is None:
            kwargs['read'] = 5
        super(MyTimeout, self).__init__(*args, **kwargs)

requests.adapters.TimeoutSauce = MyTimeout
This code should cause us to set the read timeout as equal to the connect timeout, which is the timeout value you pass on your Session.get() call. (Note that I haven't actually tested this code, so it may need some quick debugging, I just wrote it straight into the GitHub window.)
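With that patch installed, a call that passes no timeout at all should get a 5-second connect and read timeout instead of hanging; a quick sketch, untested just like the original snippet (placeholder URL):

# After requests.adapters.TimeoutSauce = MyTimeout, a plain call like this
# should fail after roughly 5 seconds per phase instead of hanging forever.
r = requests.get('http://www.some-site.com')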
2. Use a fork of requests from kevinburke: https://github.com/kevinburke/requests/tree/connect-timeout
From its documentation: https://github.com/kevinburke/requests/blob/connect-timeout/docs/user/advanced.rst
If you specify a single value for the timeout, like this:
r = requests.get('https://github.com', timeout=5)
The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:
r = requests.get('https://github.com', timeout=(3.05, 27))
NOTE: The change has since been merged to the main Requests project.
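Since the merge, mainline requests also raises distinct exceptions for the two phases; a small sketch of handling them separately, using the exception classes from requests.exceptions:

import requests

try:
    r = requests.get('https://github.com', timeout=(3.05, 27))
except requests.exceptions.ConnectTimeout:
    print('could not establish a connection within 3.05 seconds')
except requests.exceptions.ReadTimeout:
    print('server went silent for more than 27 seconds')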
3. Use eventlet or signal, as already mentioned in the similar question: Timeout for python requests.get entire response
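For reference, a minimal sketch of the signal-based approach from that question (Unix only, and it must run in the main thread); the 10-second cap and the URL are placeholder assumptions:

import signal
import requests

class TotalTimeout(Exception):
    pass

def _raise_total_timeout(signum, frame):
    raise TotalTimeout('request exceeded the overall time limit')

signal.signal(signal.SIGALRM, _raise_total_timeout)
signal.alarm(10)  # deliver SIGALRM after 10 seconds
try:
    r = requests.get('http://www.some-site.com')
finally:
    signal.alarm(0)  # always cancel the pending alarm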
Answered by Alex Polekha
I reviewed all the answers and came to the conclusion that the problem still exists. On some sites requests may hang indefinitely, and using multiprocessing seems to be overkill. Here's my approach (Python 3.5+):
import asyncio
import aiohttp

async def get_http(url):
    async with aiohttp.ClientSession(conn_timeout=1, read_timeout=3) as client:
        try:
            async with client.get(url) as response:
                content = await response.text()
                return content, response.status
        except Exception:
            # Swallow errors; the caller checks for a None result.
            pass

loop = asyncio.get_event_loop()
task = loop.create_task(get_http('http://example.com'))
loop.run_until_complete(task)
result = task.result()
if result is not None:
    content, status = result
    if status == 200:
        print(content)
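On Python 3.7+ the manual event-loop handling above can be replaced with asyncio.run; a sketch of the equivalent driver code:

# Uses the get_http coroutine and the asyncio import from the block above.
result = asyncio.run(get_http('http://example.com'))
if result is not None:
    content, status = result
    if status == 200:
        print(content)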
UPDATE
If you receive a deprecation warning about using conn_timeout and read_timeout, check near the bottom of THIS reference for how to use the ClientTimeout data structure. One simple way to apply this data structure, per the linked reference, to the original code above would be:
async def get_http(url):
    timeout = aiohttp.ClientTimeout(total=60)
    async with aiohttp.ClientSession(timeout=timeout) as client:
        try:
            # etc.
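A hedged completion of that sketch, using the ClientTimeout fields that most closely mirror the old conn_timeout/read_timeout pair (connect and sock_read):

import asyncio
import aiohttp

async def get_http(url):
    # connect replaces the old conn_timeout; sock_read replaces read_timeout
    timeout = aiohttp.ClientTimeout(total=60, connect=1, sock_read=3)
    async with aiohttp.ClientSession(timeout=timeout) as client:
        try:
            async with client.get(url) as response:
                return await response.text(), response.status
        except Exception:
            return None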
Answered by Tim Richardson
I wanted a default timeout easily added to a bunch of code (assuming that timeout solves your problem).
This is the solution I picked up from a ticket submitted to the repository for Requests.
credit: https://github.com/kennethreitz/requests/issues/2011#issuecomment-477784399
The solution is the last couple of lines here, but I show more code for better context. I like to use a session for retry behaviour.
import requests
import functools
from requests.adapters import HTTPAdapter, Retry

def requests_retry_session(
    retries=10,
    backoff_factor=2,
    status_forcelist=(500, 502, 503, 504),
    session=None,
) -> requests.Session:
    session = session or requests.Session()
    retry = Retry(
        total=retries,
        read=retries,
        connect=retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    # set default timeout
    for method in ('get', 'options', 'head', 'post', 'put', 'patch', 'delete'):
        setattr(session, method, functools.partial(getattr(session, method), timeout=30))
    return session
Then you can do something like this:
requests_session = requests_retry_session()
r = requests_session.get(url=url,...
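A hedged, fuller version of that call with a placeholder URL; note that because functools.partial keywords are overridden by call-time keywords, an explicit timeout still wins over the 30-second default:

requests_session = requests_retry_session(retries=5, backoff_factor=1)
r = requests_session.get('http://www.some-site.com')             # 30 s default applies
r = requests_session.get('http://www.some-site.com', timeout=5)  # explicit value wins
print(r.status_code)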
Answered by Fruit
In my case, the reason "requests.get never returns" is that requests.get() attempts to connect to the host's IPv6 address first. If something goes wrong connecting to that IPv6 address and the call gets stuck, it retries the IPv4 address only if I explicitly set timeout=<N seconds> and the timeout is hit.
My solution is monkey-patching the Python socket module to ignore IPv6 (or IPv4, if IPv4 is the one not working); either this answer or this answer works for me.
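A minimal sketch of one such monkey-patch, assuming the approach from those linked answers: wrap socket.getaddrinfo so name resolution only returns one address family (here, forcing IPv4; the inverse works for IPv6):

import socket

_orig_getaddrinfo = socket.getaddrinfo

def _ipv4_only_getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
    # Force AF_INET so requests never tries an IPv6 address first.
    return _orig_getaddrinfo(host, port, socket.AF_INET, type, proto, flags)

socket.getaddrinfo = _ipv4_only_getaddrinfo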
You might wonder why the curl command works: curl connects over IPv4 without waiting for IPv6 to complete. You can trace the socket syscalls with the command strace -ff -e network -s 10000 -- curl -vLk '<your url>'. For Python, the command strace -ff -e network -s 10000 -- python3 <your python script> can be used.
Answered by Erik Aronesty
Patching the documented "send" function will fix this for all requests - even in many dependent libraries and SDKs. When patching libraries, be sure to patch supported/documented functions, not TimeoutSauce - otherwise you may wind up silently losing the effect of your patch.
import requests

DEFAULT_TIMEOUT = 180

old_send = requests.Session.send

def new_send(*args, **kwargs):
    if kwargs.get("timeout", None) is None:
        kwargs["timeout"] = DEFAULT_TIMEOUT
    return old_send(*args, **kwargs)

requests.Session.send = new_send
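After the patch, every call that goes through requests (including top-level requests.get, which routes through Session.send) picks up the 180-second default, while explicit timeouts are left untouched; a quick sketch with a placeholder URL:

r = requests.get('http://www.some-site.com')             # uses DEFAULT_TIMEOUT (180 s)
r = requests.get('http://www.some-site.com', timeout=5)  # explicit timeout wins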
The effects of not having any timeout are quite severe, and the use of a default timeout can almost never break anything - because TCP itself has default timeouts as well.