What's the best way to download a file using urllib3 in Python?
Disclaimer: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/17285464/
Asked by running.t
I would like to download a file over the HTTP protocol using urllib3. I have managed to do this using the following code:
import urllib3
url = 'http://url_to_a_file'
connection_pool = urllib3.PoolManager()
resp = connection_pool.request('GET', url)
f = open(filename, 'wb')
f.write(resp.data)
f.close()
resp.release_conn()
But I was wondering what the proper way of doing this is. For example, will it work well for big files, and if not, what can be done to make this code more fault tolerant and scalable?
Note: it is important to me to use the urllib3 library rather than, for example, urllib2, because I want my code to be thread safe.
Accepted answer by shazow
Your code snippet is close. Two things worth noting:
1. If you're using resp.data, it will consume the entire response and return the connection (you don't need to call resp.release_conn() manually). This is fine if you're cool with holding the data in memory.
2. You could use resp.read(amt), which will stream the response, but the connection will need to be returned via resp.release_conn().
This would look something like...
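import urllib3
http = urllib3.PoolManager()
# preload_content=False defers reading the body so it can be streamed
r = http.request('GET', url, preload_content=False)
with open(path, 'wb') as out:
    while True:
        data = r.read(chunk_size)  # read at most chunk_size bytes
        if not data:
            break
        out.write(data)
# return the connection to the pool once the body has been read
r.release_conn()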
The documentation might be a bit lacking on this scenario. If anyone is interested in making a pull request to improve the urllib3 documentation, that would be greatly appreciated. :)
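On the question's concern about fault tolerance, one option is to wrap the streaming loop in a helper that always returns the connection to the pool, even if writing the file fails. The following is only a sketch under stated assumptions: the function name download, its parameters, and the default chunk size are hypothetical, and it uses the response's stream() generator in place of calling read(amt) in a loop.

```python
import urllib3


def download(http, url, path, chunk_size=64 * 1024):
    """Stream url to path in chunks (hypothetical helper, not part of urllib3)."""
    # preload_content=False keeps the body unread until we ask for it
    r = http.request('GET', url, preload_content=False)
    try:
        if r.status != 200:
            # avoid silently saving an error page as the downloaded file
            raise IOError('unexpected HTTP status %d' % r.status)
        with open(path, 'wb') as out:
            # stream() yields chunks of up to chunk_size bytes until the body ends
            for chunk in r.stream(chunk_size):
                out.write(chunk)
    finally:
        # return the connection to the pool whether or not the copy succeeded
        r.release_conn()
    return path
```

It would be called with a shared pool manager, for example download(urllib3.PoolManager(), url, path).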
Answer by Alecz
The most correct way to do this is probably to get a file-like object that represents the HTTP response and copy it to a real file using shutil.copyfileobj as below:
import shutil
import urllib3
url = 'http://url_to_a_file'
c = urllib3.PoolManager()
with c.request('GET', url, preload_content=False) as resp, open(filename, 'wb') as out_file:
    shutil.copyfileobj(resp, out_file)
resp.release_conn()  # not 100% sure this is required though
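shutil.copyfileobj works with any object that exposes read(), which is why it can take the urllib3 response directly. As a minimal sketch of that behavior, the example below uses an in-memory io.BytesIO as a stand-in for the HTTP response (an assumption made purely so it runs without a network); the helper name save_stream and the temporary file path are hypothetical.

```python
import io
import os
import shutil
import tempfile


def save_stream(source, path, chunk_size=16 * 1024):
    """Copy a file-like object to path in chunks (hypothetical helper)."""
    with open(path, 'wb') as out_file:
        # copyfileobj keeps calling source.read(chunk_size) until it returns
        # an empty result, so only one chunk is held in memory at a time
        shutil.copyfileobj(source, out_file, chunk_size)


# io.BytesIO stands in for a response opened with preload_content=False
fake_response = io.BytesIO(b'x' * 100000)
target = os.path.join(tempfile.mkdtemp(), 'copy_demo.bin')
save_stream(fake_response, target)
```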
Answer by Gray
The easiest way with urllib3 is to use shutil, which handles the chunked copying for you.
import urllib3
import shutil
http = urllib3.PoolManager()
with open(filename, 'wb') as out:
    r = http.request('GET', url, preload_content=False)
    shutil.copyfileobj(r, out)
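Since the question stresses thread safety, here is a rough sketch of how the same pattern might be shared across threads: one PoolManager is created up front and reused by every worker. The names pool_manager, fetch and fetch_all, the (url, filename) job format, and the worker count are all assumptions for illustration, and the explicit release_conn() follows the accepted answer's advice for streamed responses.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor

import urllib3

# one manager shared by all threads; maxsize lets each host keep up to 4 connections
pool_manager = urllib3.PoolManager(maxsize=4)


def fetch(job):
    """Download one (url, filename) pair (hypothetical helper)."""
    url, filename = job
    r = pool_manager.request('GET', url, preload_content=False)
    try:
        with open(filename, 'wb') as out:
            shutil.copyfileobj(r, out)
    finally:
        # hand the connection back so other threads can reuse it
        r.release_conn()
    return filename


def fetch_all(jobs, workers=4):
    """Run fetch() over (url, filename) pairs on a small thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(fetch, jobs))
```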

