I am downloading a file using Python urllib2. How do I check the file size?
Original question: http://stackoverflow.com/questions/1636637/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by TIMEX
And if it is too large, how do I stop the download? I don't want to download files larger than 12 MB.
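import random
import urllib2
request = urllib2.Request(ep_url)  # ep_url: the URL being downloaded
request.add_header('User-Agent', random.choice(agents))  # agents: a list of User-Agent strings
thefile = urllib2.urlopen(request).read()  # read() pulls in the whole body, however large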
Answer by Andrew Dalke
There's no need to drop down to httplib as bobince did. You can do all of that with urllib2 directly:
>>> import urllib2
>>> f = urllib2.urlopen("http://dalkescientific.com")
>>> f.headers.items()
[('content-length', '7535'), ('accept-ranges', 'bytes'), ('server', 'Apache/2.2.14'),
('last-modified', 'Sun, 09 Mar 2008 00:27:43 GMT'), ('connection', 'close'),
('etag', '"19fa87-1d6f-447f627da7dc0"'), ('date', 'Wed, 28 Oct 2009 19:59:10 GMT'),
('content-type', 'text/html')]
>>> f.headers["Content-Length"]
'7535'
>>>
If you use httplib then you may have to implement redirect handling, proxy support, and the other nice things that urllib2 does for you.
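The Content-Length value shown above can be turned into a go/no-go decision before read() is ever called. The sketch below assumes only that the headers object has a .get() method. That holds for f.headers and f.info() from urllib2.urlopen(), and also for a plain dict. The names declared_size, too_large and MAX_BYTES are hypothetical and not part of urllib2.

```python
MAX_BYTES = 12 * 1024 * 1024  # hypothetical constant: the 12 MB limit from the question


def declared_size(headers):
    """Return Content-Length as an int, or None if it is missing or unusable.

    headers: anything with a .get() method, e.g. f.headers from urllib2.urlopen().
    """
    value = headers.get("Content-Length")
    try:
        size = int(value)
    except (TypeError, ValueError):
        return None  # header absent (None) or not a number
    return size if size >= 0 else None


def too_large(headers, limit=MAX_BYTES):
    """True only when the server *declares* a size above limit.

    A missing header gives False, so a capped read is still needed afterwards.
    """
    size = declared_size(headers)
    return size is not None and size > limit
```

With the session above, this would be used roughly as f = urllib2.urlopen(url), followed by f.close() whenever too_large(f.headers) is true. urlopen() returns once the status line and headers have arrived, so closing at that point skips reading the body.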
Answer by bobince
You could say:
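class ThrowToysOutOfPramException(Exception): pass  # placeholder error type

maxlength = 12*1024*1024  # 12 MB
# request is the urllib2.Request built in the question
thefile = urllib2.urlopen(request).read(maxlength+1)  # never read more than one byte past the limit
if len(thefile) == maxlength+1:
    raise ThrowToysOutOfPramException()  # the file is larger than 12 MB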
But then, of course, you've still read 12 MB of unwanted data. If you want to minimise the risk of this happening, you can check the HTTP Content-Length header, if present (it might not be). But to do that you need to drop down to httplib instead of the more general urllib2.
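The read(maxlength+1) trick pulls everything into memory with a single call. A variant of the same idea reads in chunks and gives up as soon as the limit is passed. That makes it easy to, for example, write each chunk to disk instead of holding the whole file. This sketch assumes only that the response is a file-like object with read(n), as the result of urlopen() is. The names read_capped and FileTooLarge and the 64 KB chunk size are illustrative choices, not anything urllib2 provides.

```python
class FileTooLarge(Exception):
    """Hypothetical error raised when more than the allowed bytes arrive."""


def read_capped(fileobj, maxlength, chunk_size=64 * 1024):
    """Read fileobj to EOF, raising FileTooLarge once maxlength is exceeded.

    Never requests more than maxlength + 1 bytes in total, which is just
    enough to tell "exactly at the limit" apart from "over the limit".
    """
    chunks = []
    total = 0
    while True:
        # Ask only for what could still fit, plus the one byte that proves overflow.
        chunk = fileobj.read(min(chunk_size, maxlength + 1 - total))
        if not chunk:
            break  # EOF reached within the limit
        total += len(chunk)
        if total > maxlength:
            raise FileTooLarge("more than %d bytes" % maxlength)
        chunks.append(chunk)
    return b"".join(chunks)
```

Usage would look like thefile = read_capped(urllib2.urlopen(request), 12*1024*1024). Like the original snippet, this still reads up to the limit when the server is silent about the size. It does not replace the Content-Length check below.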
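import httplib
import urlparse

class IAmCrossException(Exception): pass  # placeholder: declared size is too big
class IAmStillCrossException(Exception): pass  # placeholder: actual body is too big

u = urlparse.urlparse(ep_url)
path = u.path or '/'
if u.query:
    path += '?' + u.query  # u.path alone would drop the query string
cn = httplib.HTTPConnection(u.netloc)  # plain http:// only
cn.request('GET', path, headers={'User-Agent': ua})  # ua: a User-Agent string
r = cn.getresponse()
try:
    length = int(r.getheader('Content-Length', '0'))
except ValueError:
    length = 0  # malformed header: rely on the capped read below
if length > maxlength:
    raise IAmCrossException()
thefile = r.read(maxlength+1)  # the header may be absent or wrong, so still cap the read
if len(thefile) == maxlength+1:
    raise IAmStillCrossException()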
You can check the length before asking to get the file too, if you prefer. This is basically the same as above, except using the method 'HEAD' instead of 'GET'.
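Here is a minimal sketch of that HEAD variant. It uses httplib and urlparse, falling back to their Python 3 names (http.client and urllib.parse) so the same code runs on either version. head_content_length and its default User-Agent and timeout are hypothetical. The function returns None when the server omits the header or sends something non-numeric. Like the GET code above, it handles only plain http:// URLs and does not follow redirects.

```python
try:
    import httplib  # Python 2
    import urlparse
except ImportError:  # Python 3 renamed both modules
    import http.client as httplib
    import urllib.parse as urlparse


def head_content_length(url, user_agent="Mozilla/5.0", timeout=10):
    """Send a HEAD request and return the declared Content-Length, or None."""
    u = urlparse.urlparse(url)
    path = u.path or "/"
    if u.query:
        path += "?" + u.query
    conn = httplib.HTTPConnection(u.netloc, timeout=timeout)
    try:
        conn.request("HEAD", path, headers={"User-Agent": user_agent})
        response = conn.getresponse()
        value = response.getheader("Content-Length")  # None if not sent
    finally:
        conn.close()
    try:
        return int(value)
    except (TypeError, ValueError):
        return None
```

A server is not required to report the same Content-Length for HEAD as for GET. Treat the result as a hint and keep the capped read for the real download.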
Answer by SeriousCallersOnly
You can check the Content-Length in a HEAD request first, but be warned: this header doesn't have to be set. See "How do you send a HEAD HTTP request in Python 2?"
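To keep urllib2's redirect and proxy handling, one common way to send a HEAD request is to subclass Request and override get_method(). The sketch below assumes that approach and falls back to urllib.request on Python 3. HeadRequest and remote_size are hypothetical names, and the 10-second timeout is an arbitrary choice.

```python
try:
    from urllib2 import Request, urlopen  # Python 2
except ImportError:
    from urllib.request import Request, urlopen  # Python 3


class HeadRequest(Request):
    """A Request that is sent with the HEAD method instead of GET."""

    def get_method(self):
        return "HEAD"


def remote_size(url, timeout=10):
    """Ask the server for url's size via HEAD; None if no usable Content-Length."""
    response = urlopen(HeadRequest(url), timeout=timeout)
    try:
        value = response.info().get("Content-Length")
    finally:
        response.close()  # nothing to read: a HEAD response has no body
    try:
        return int(value)
    except (TypeError, ValueError):
        return None  # the server is not obliged to send the header
```

On Python 3 alone, Request(url, method="HEAD") has the same effect without a subclass. On Python 2, a redirected request may be reissued as a plain GET, so the header would then come from a GET response.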
Answer by Gourneau
This will work if the Content-Length header is set
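import urllib2
req = urllib2.urlopen("http://example.com/file.zip")
size_header = req.info().getheader('Content-Length')  # None if the server didn't send it
total_size = int(size_header) if size_header is not None else None  # avoid int(None) crashing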