Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/21965484/
Timeout for python requests.get entire response
Asked by Kiarash
I'm gathering statistics on a list of websites and I'm using requests for it for simplicity. Here is my code:
import requests

data = []
websites = ['http://google.com', 'http://bbc.co.uk']
for w in websites:
    r = requests.get(w, verify=False)
    data.append((r.url, len(r.content), r.elapsed.total_seconds(),
                 str([(l.status_code, l.url) for l in r.history]),
                 str(r.headers.items()), str(r.cookies.items())))
Now, I want requests.get to time out after 10 seconds so the loop doesn't get stuck.
This question has been of interest before too, but none of the answers are clean. I will be putting some bounty on this to get a nice answer.
I hear that maybe not using requests is a good idea, but then how should I get the nice things requests offers (the ones in the tuple)?
Accepted answer by Alvaro
What about using eventlet? If you want to time out the request after 10 seconds, even if data is being received, this snippet will work for you:
import requests
import eventlet
eventlet.monkey_patch()

with eventlet.Timeout(10):
    requests.get("http://ipv4.download.thinkbroadband.com/1GB.zip", verify=False)
Answer by Lukasa
Set the timeout parameter:
r = requests.get(w, verify=False, timeout=10) # 10 seconds
As long as you don't set stream=True on that request, this will cause the call to requests.get() to time out if the connection takes more than ten seconds, or if the server doesn't send data for more than ten seconds.
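The timeout parameter limits each wait on the socket rather than the whole download, so a server that keeps trickling bytes can hold the call open for much longer than ten seconds. One way to bound the total time, sketched here under assumptions, is to stream the body and check a deadline between chunks. The names read_with_deadline and DeadlineExceeded are hypothetical and not part of requests; the helper accepts any iterable of byte chunks, and with requests it is assumed you would feed it resp.iter_content(...) from a request made with stream=True.

```python
import time


class DeadlineExceeded(Exception):
    """Raised when the body is not fully read before the deadline (hypothetical name)."""


def read_with_deadline(chunks, total_seconds):
    """Join byte chunks from `chunks`, giving up once `total_seconds` have elapsed.

    The clock is only checked when a chunk arrives, so a single slow chunk
    can still overrun the deadline by up to the per-read timeout.
    """
    deadline = time.monotonic() + total_seconds
    parts = []
    for chunk in chunks:
        if time.monotonic() > deadline:
            raise DeadlineExceeded("body not finished after %s seconds" % total_seconds)
        parts.append(chunk)
    return b"".join(parts)


# Assumed usage with requests (not executed here):
#   resp = requests.get(w, verify=False, stream=True, timeout=10)
#   body = read_with_deadline(resp.iter_content(8192), total_seconds=10)
```

Passing timeout=10 as well keeps any single read from blocking indefinitely, which is what lets the deadline check run at all.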
Answer by Dima Tisnek
If it comes to that, create a watchdog thread that messes up requests' internal state after 10 seconds, e.g.:
- closes the underlying socket, and ideally
- triggers an exception if requests retries the operation
Note that, depending on the system libraries, you may be unable to set a deadline on DNS resolution.
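The watchdog idea can be illustrated with a plain socket standing in for the connection that requests holds internally. This is only a sketch: recv_with_watchdog is a hypothetical helper, reaching the socket inside a requests response is not shown, and whether a shutdown from another thread wakes a blocked recv() depends on the platform (it is assumed here to behave as it does on Linux).

```python
import socket
import threading


def recv_with_watchdog(sock, limit_seconds, bufsize=4096):
    """Call sock.recv() while a timer thread shuts the socket down at the deadline."""
    fired = threading.Event()

    def _kill():
        fired.set()
        try:
            # Assumption: shutting the socket down wakes a recv() that is
            # blocked in another thread (true on Linux; platform dependent).
            sock.shutdown(socket.SHUT_RDWR)
        except OSError:
            pass  # the socket was already closed or disconnected

    watchdog = threading.Timer(limit_seconds, _kill)
    watchdog.daemon = True
    watchdog.start()
    try:
        data = sock.recv(bufsize)
    except OSError:
        data = b""
    finally:
        watchdog.cancel()  # no-op if the timer has already fired
    if fired.is_set():
        raise TimeoutError("no data within %s seconds" % limit_seconds)
    return data
```

As the answer notes, this does nothing for time spent in DNS resolution, which happens before any socket exists.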
Answer by Chris Johnson
This may be overkill, but the Celery distributed task queue has good support for timeouts.
In particular, you can define a soft time limit that just raises an exception in your process (so you can clean up) and/or a hard time limit that terminates the task when the time limit has been exceeded.
Under the covers, this uses the same signals approach as referenced in your "before" post, but in a more usable and manageable way. And if the list of web sites you are monitoring is long, you might benefit from its primary feature -- all kinds of ways to manage the execution of a large number of tasks.
Answer by totokaka
To create a timeout you can use signals.
The best way to solve this case is probably to
- Set a function that raises an exception as the handler for the alarm signal
- Schedule the alarm signal with a ten-second delay
- Call the function inside a try-except-finally block.
- The except block is reached if the function timed out.
- In the finally block you cancel the alarm, so it's not signaled later.
Here is some example code:
import signal
from time import sleep

class TimeoutException(Exception):
    """ Simple Exception to be called on timeouts. """
    pass

def _timeout(signum, frame):
    """ Raise a TimeoutException.

    This is intended for use as a signal handler.
    The signum and frame arguments passed to this are ignored.
    """
    # Raise TimeoutException with system default timeout message
    raise TimeoutException()

# Set the handler for the SIGALRM signal:
signal.signal(signal.SIGALRM, _timeout)
# Send the SIGALRM signal in 10 seconds:
signal.alarm(10)

try:
    # Do our code:
    print('This will take 11 seconds...')
    sleep(11)
    print('done!')
except TimeoutException:
    print('It timed out!')
finally:
    # Abort the sending of the SIGALRM signal:
    signal.alarm(0)
There are some caveats to this:
- It is not threadsafe: signals are always delivered to the main thread, so you can't put this in any other thread.
- There is a slight delay between the scheduling of the signal and the execution of the actual code. This means that the example would time out even if it only slept for ten seconds.
But it's all in the standard Python library! Except for the sleep function import, it's only one import. If you are going to use timeouts in many places, you can easily put the TimeoutException, _timeout and the signaling in a function and just call that. Or you can make a decorator and put it on functions; see the answer linked below.
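A decorator along those lines might look like the following sketch. It is not the code from the linked answer: the decorator name timeout is hypothetical, it uses signal.setitimer so that fractional seconds work, and it restores whatever SIGALRM handler was installed before. The same caveats apply (Unix only, main thread only).

```python
import functools
import signal
import time


class TimeoutException(Exception):
    """Raised when the decorated function runs past its limit."""


def timeout(seconds):
    """Decorator (hypothetical name): abort the call with TimeoutException after `seconds`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            def _handler(signum, frame):
                raise TimeoutException()

            # Install our handler and remember the previous one so it can be restored.
            previous = signal.signal(signal.SIGALRM, _handler)
            # setitimer accepts fractional seconds, unlike signal.alarm.
            signal.setitimer(signal.ITIMER_REAL, seconds)
            try:
                return func(*args, **kwargs)
            finally:
                signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the pending alarm
                signal.signal(signal.SIGALRM, previous)
        return wrapper
    return decorator


@timeout(0.2)
def slow_call():
    # Stand-in for a long-running request.
    time.sleep(1)
    return "finished"
```

Calling slow_call() raises TimeoutException after roughly 0.2 seconds instead of returning.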
You can also set this up as a "context manager" so you can use it with the with statement:
import signal

class Timeout():
    """ Timeout for use with the `with` statement. """

    class TimeoutException(Exception):
        """ Simple Exception to be called on timeouts. """
        pass

    @staticmethod
    def _timeout(signum, frame):
        """ Raise a TimeoutException.

        This is intended for use as a signal handler.
        The signum and frame arguments passed to this are ignored.
        """
        raise Timeout.TimeoutException()

    def __init__(self, timeout=10):
        self.timeout = timeout
        signal.signal(signal.SIGALRM, Timeout._timeout)

    def __enter__(self):
        signal.alarm(self.timeout)

    def __exit__(self, exc_type, exc_value, traceback):
        signal.alarm(0)
        # Returning True swallows the TimeoutException raised by the handler.
        return exc_type is Timeout.TimeoutException

# Demonstration:
from time import sleep

print('This is going to take maximum 10 seconds...')
with Timeout(10):
    sleep(15)
    print('No timeout?')
print('Done')
One possible downside of this context manager approach is that you can't know whether the code actually timed out or not.
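One way around that limitation, sketched here as a variation rather than as the author's code, is to have the context manager record the outcome on itself. The timed_out attribute and the use of signal.setitimer are assumptions added for illustration.

```python
import signal
import time


class Timeout:
    """Variant of the context manager above that records whether the limit was hit."""

    class TimeoutException(Exception):
        pass

    def __init__(self, seconds=10):
        self.seconds = seconds
        self.timed_out = False  # hypothetical attribute, set in __exit__

    def _handler(self, signum, frame):
        raise Timeout.TimeoutException()

    def __enter__(self):
        # Remember the previous handler so it can be restored on exit.
        self._previous = signal.signal(signal.SIGALRM, self._handler)
        signal.setitimer(signal.ITIMER_REAL, self.seconds)
        return self  # lets the caller write `with Timeout(...) as t:`

    def __exit__(self, exc_type, exc_value, traceback):
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the pending alarm
        signal.signal(signal.SIGALRM, self._previous)
        self.timed_out = exc_type is Timeout.TimeoutException
        return self.timed_out  # swallow only our own exception
```

After `with Timeout(10) as t:` finishes, t.timed_out tells the caller which way the block ended.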
Sources and recommended reading:
来源和推荐阅读:
- The documentation on signals
- This answer on timeouts by @David Narayan. He has organized the above code as a decorator.
Answer by Hieu
UPDATE: http://docs.python-requests.org/en/master/user/advanced/#timeouts
In newer versions of requests:
If you specify a single value for the timeout, like this:
r = requests.get('https://github.com', timeout=5)
The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:
r = requests.get('https://github.com', timeout=(3.05, 27))
If the remote server is very slow, you can tell Requests to wait forever for a response, by passing None as a timeout value and then retrieving a cup of coffee.
r = requests.get('https://github.com', timeout=None)
My old (probably outdated) answer (which was posted a long time ago):
There are other ways to overcome this problem:
1. Use the TimeoutSauce internal class
From: https://github.com/kennethreitz/requests/issues/1928#issuecomment-35811896
import requests
from requests.adapters import TimeoutSauce

class MyTimeout(TimeoutSauce):
    def __init__(self, *args, **kwargs):
        connect = kwargs.get('connect', 5)
        read = kwargs.get('read', connect)
        super(MyTimeout, self).__init__(connect=connect, read=read)

requests.adapters.TimeoutSauce = MyTimeout
This code should cause us to set the read timeout as equal to the connect timeout, which is the timeout value you pass on your Session.get() call. (Note that I haven't actually tested this code, so it may need some quick debugging, I just wrote it straight into the GitHub window.)
2. Use a fork of requests from kevinburke: https://github.com/kevinburke/requests/tree/connect-timeout
From its documentation: https://github.com/kevinburke/requests/blob/connect-timeout/docs/user/advanced.rst
If you specify a single value for the timeout, like this:
r = requests.get('https://github.com', timeout=5)
The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:
r = requests.get('https://github.com', timeout=(3.05, 27))
kevinburke has requested it to be merged into the main requests project, but it hasn't been accepted yet.
Answer by Jorge Leitao
I believe you can use multiprocessing and not depend on a 3rd party package:
import multiprocessing
import requests

def _worker(return_dict, func, args, kwargs):
    # Runs in the child process and stores the result in the shared dict.
    return_dict['value'] = func(*args, **kwargs)

def call_with_timeout(func, args, kwargs, timeout):
    manager = multiprocessing.Manager()
    return_dict = manager.dict()

    # A module-level target (rather than a nested function) can also be
    # pickled on platforms that spawn child processes.
    p = multiprocessing.Process(target=_worker,
                                args=(return_dict, func, args, kwargs))
    p.start()

    # Force a max. `timeout` or wait for the process to finish
    p.join(timeout)

    # If the process is still alive, it didn't finish: raise TimeoutError
    if p.is_alive():
        p.terminate()
        p.join()
        raise TimeoutError
    else:
        return return_dict['value']

if __name__ == '__main__':
    url = 'http://google.com'  # example URL, taken from the question
    call_with_timeout(requests.get, args=(url,), kwargs={'timeout': 10}, timeout=60)
The timeout passed in kwargs is the timeout to get any response from the server; the timeout argument is the timeout to get the complete response.
Answer by Realistic
I came up with a more direct solution that is admittedly ugly but fixes the real problem. It goes a bit like this:
resp = requests.get(some_url, stream=True)
resp.raw._fp.fp._sock.settimeout(read_timeout)
# This will load the entire response even though stream is set
content = resp.content
You can read the full explanation here
Answer by ACEE
This code works for socket errors 11004 and 10060:
# -*- encoding:UTF-8 -*-
__author__ = 'ACE'
import requests
from PyQt4.QtCore import *
from PyQt4.QtGui import *


class TimeOutModel(QThread):
    Existed = pyqtSignal(bool)
    TimeOut = pyqtSignal()

    def __init__(self, fun, timeout=500, parent=None):
        """
        @param fun: function or lambda
        @param timeout: ms
        """
        super(TimeOutModel, self).__init__(parent)
        self.fun = fun

        self.timeer = QTimer(self)
        self.timeer.setInterval(timeout)
        self.timeer.timeout.connect(self.time_timeout)
        self.Existed.connect(self.timeer.stop)
        self.timeer.start()

        self.setTerminationEnabled(True)

    def time_timeout(self):
        # The timer fired before `fun` returned: report it and kill the thread.
        self.timeer.stop()
        self.TimeOut.emit()
        self.quit()
        self.terminate()

    def run(self):
        self.fun()


def on_timeout():
    print('timeout')


bb = lambda: requests.get("http://ipv4.download.thinkbroadband.com/1GB.zip")

a = QApplication([])

z = TimeOutModel(bb, 500)
z.TimeOut.connect(on_timeout)  # report the timeout when the signal fires
z.start()                      # run `bb` in the worker thread
a.exec_()
Answer by John Smith
Despite the question being about requests, I find this very easy to do with pycurl's CURLOPT_TIMEOUT or CURLOPT_TIMEOUT_MS.
No threading or signaling required:
import traceback
from io import BytesIO

import pycurl

url = 'http://www.example.com/example.zip'
timeout_ms = 1000
raw = BytesIO()  # collects the response body as bytes
c = pycurl.Curl()
c.setopt(pycurl.TIMEOUT_MS, timeout_ms)  # total timeout in milliseconds
c.setopt(pycurl.WRITEFUNCTION, raw.write)
c.setopt(pycurl.NOSIGNAL, 1)
c.setopt(pycurl.URL, url)
c.setopt(pycurl.HTTPGET, 1)
try:
    c.perform()
except pycurl.error:
    traceback.print_exc()  # error generated on timeout
    # or just `pass` here if you don't want to print the error

