I'm downloading a file with Python urllib2. How do I check how large the file is?

Question

Asked by TIMEX

And if it is large, how do I stop the download? I don't want to download files that are larger than 12MB.

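import random
import urllib2
# ep_url and agents (a list of User-Agent strings) are defined elsewhere.
request = urllib2.Request(ep_url)
request.add_header('User-Agent', random.choice(agents))
thefile = urllib2.urlopen(request).read()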

Answer 1

Answered by Andrew Dalke

There's no need to drop down to httplib as bobince did. You can do all of that with urllib2 directly:

>>> import urllib2
>>> f = urllib2.urlopen("http://dalkescientific.com")
>>> f.headers.items()
[('content-length', '7535'), ('accept-ranges', 'bytes'), ('server', 'Apache/2.2.14'),
 ('last-modified', 'Sun, 09 Mar 2008 00:27:43 GMT'), ('connection', 'close'),
 ('etag', '"19fa87-1d6f-447f627da7dc0"'), ('date', 'Wed, 28 Oct 2009 19:59:10 GMT'),
 ('content-type', 'text/html')]
>>> f.headers["Content-Length"]
'7535'
>>>

If you use httplib then you may have to implement redirect handling, proxy support, and the other nice things that urllib2 does for you.


Answer 2

Answered by bobince

You could say:


maxlength = 12 * 1024 * 1024
thefile = urllib2.urlopen(request).read(maxlength + 1)
if len(thefile) == maxlength + 1:
    raise ThrowToysOutOfPramException()

but then of course you've still read 12MB of unwanted data. If you want to minimise the risk of this happening, you can check the HTTP Content-Length header, if it is present (it might not be). But to do that you need to drop down to httplib instead of the more general urllib.

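import httplib
import urlparse

# ua is the User-Agent string to send; maxlength is defined above.
u = urlparse.urlparse(ep_url)
cn = httplib.HTTPConnection(u.netloc)
cn.request('GET', u.path, headers={'User-Agent': ua})
r = cn.getresponse()

# A missing or malformed Content-Length is treated as 0 (unknown).
try:
    length = int(r.getheader('Content-Length', '0'))
except ValueError:
    length = 0
if length > maxlength:
    raise IAmCrossException()

# The header may be absent or wrong, so still cap the read itself.
thefile = r.read(maxlength + 1)
if len(thefile) == maxlength + 1:
    raise IAmStillCrossException()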
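
The capped read above fetches up to maxlength+1 bytes in a single call. As a rough sketch of a variation, the same limit can be enforced while reading in smaller chunks, which makes it easy to stop at a chunk boundary or to write to disk as you go. The names read_capped, TooLargeError and chunk_size are made up for this illustration, and an in-memory io.BytesIO stands in for the HTTP response so that the snippet runs without a network connection:

```python
import io


class TooLargeError(Exception):
    """Raised when the body exceeds the allowed size (hypothetical name)."""


def read_capped(stream, max_bytes, chunk_size=64 * 1024):
    """Read a file-like object fully, giving up once max_bytes is exceeded."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        total += len(chunk)
        if total > max_bytes:
            raise TooLargeError("body exceeds %d bytes" % max_bytes)
        chunks.append(chunk)
    return b"".join(chunks)


# Stand-in for an HTTP response; a real one would come from urlopen(request).
small = read_capped(io.BytesIO(b"x" * 100), max_bytes=1024)
```

This still transfers data up to the limit before giving up, so it complements the Content-Length check and does not replace it.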

You can check the length before asking to get the file too, if you prefer. This is basically the same as above, except that it uses the method 'HEAD' instead of 'GET'.
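
One possible way to issue a HEAD request without leaving urllib2 is to subclass Request and override get_method. This is only a sketch: HeadRequest is a name invented here, the URL is a placeholder, and the import fallback is an assumption added so that the snippet also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    from urllib2 import Request  # Python 2, as used in this thread
except ImportError:
    from urllib.request import Request  # Python 3 moved it here


class HeadRequest(Request):
    """A Request whose HTTP method is HEAD instead of GET."""

    def get_method(self):
        return "HEAD"


# The URL is a placeholder; building the request does not touch the network.
head = HeadRequest("http://example.com/file.zip")
print(head.get_method())
```

Passing head to urlopen should return a response with headers but no body, so Content-Length can be inspected before deciding whether to issue the real GET. Some servers reject HEAD or omit the header, so a capped read is still worth keeping as a fallback.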

Answer 3

Answered by SeriousCallersOnly

You can check the Content-Length in a HEAD request first, but be warned that this header doesn't have to be set. See "How do you send a HEAD HTTP request in Python 2?"

Answer 4

Answered by Gourneau

This will work if the Content-Length header is set:

import urllib2          
req = urllib2.urlopen("http://example.com/file.zip")
total_size = int(req.info().getheader('Content-Length'))
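
When the header is missing, getheader returns None and int() fails. A defensive variant might look like the sketch below. The helper names declared_size and exceeds_limit are invented for this example, and the headers argument is assumed to be any mapping with a get method, such as the response headers object or, as here, a plain dict:

```python
MAX_BYTES = 12 * 1024 * 1024  # the 12MB limit from the question


def declared_size(headers):
    """Return Content-Length as an int, or None if absent or malformed."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        size = int(value)
    except ValueError:
        return None
    return size if size >= 0 else None


def exceeds_limit(headers, limit=MAX_BYTES):
    """True only when the server declares a size above the limit."""
    size = declared_size(headers)
    return size is not None and size > limit


# Plain dicts stand in for real response headers in this illustration.
print(exceeds_limit({"Content-Length": "7535"}))
print(exceeds_limit({}))
```

An unknown size is treated as "not over the limit" here, so the caller would still need to cap the actual read.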
