Python: download a file using partial download (HTTP)
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/1798879/
Download file using partial download (HTTP)
Asked by Konstantin
Is there a way to download huge and still growing file over HTTP using the partial-download feature?
It seems that this code downloads the file from scratch every time it is executed:
import urllib
urllib.urlretrieve("http://www.example.com/huge-growing-file", "huge-growing-file")
I'd like:
- To fetch just the newly-written data
- Download from scratch only if the source file becomes smaller (for example it has been rotated).
Answered by Nadia Alramli
It is possible to do a partial download using the Range header; the following will request a selected range of bytes:
import urllib2

req = urllib2.Request('http://www.python.org/')
req.headers['Range'] = 'bytes=%s-%s' % (start, end)
f = urllib2.urlopen(req)
For example:
>>> req = urllib2.Request('http://www.python.org/')
>>> req.headers['Range'] = 'bytes=%s-%s' % (100, 150)
>>> f = urllib2.urlopen(req)
>>> f.read()
'l1-transitional.dtd">\n\n\n<html xmlns="http://www.w3.'
Using this header you can resume partial downloads. In your case, all you have to do is keep track of the already-downloaded size and request a new range.
Keep in mind that the server needs to accept this header for this to work.
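A minimal sketch of that bookkeeping, assuming a Python 2 environment to match the answer's urllib2 code (the URL, filename, and chunk size here are illustrative, not from the original answer):

import os
import urllib2

url = 'http://www.example.com/huge-growing-file'   # illustrative URL
local = 'huge-growing-file'

# Bytes we already have on disk.
offset = os.path.getsize(local) if os.path.exists(local) else 0

req = urllib2.Request(url)
req.headers['Range'] = 'bytes=%s-' % offset   # open-ended: from offset to EOF

try:
    f = urllib2.urlopen(req)
except urllib2.HTTPError as e:
    if e.code == 416:
        # Range not satisfiable: the remote file is smaller than our copy
        # (e.g. it was rotated), so start again from scratch.
        offset = 0
        f = urllib2.urlopen(url)
    else:
        raise

# 206 means the server honored the Range header; a plain 200 means it
# ignored the header and is sending the whole file from the beginning.
mode = 'ab' if f.getcode() == 206 else 'wb'
with open(local, mode) as out:
    while True:
        chunk = f.read(8192)
        if not chunk:
            break
        out.write(chunk)

The open-ended range (bytes=N-) asks for everything from byte N onward, which covers "fetch just the newly-written data", and the 416 branch covers "download from scratch only if the source file becomes smaller".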
Answered by Conrad Meyer
This is quite easy to do using TCP sockets and raw HTTP. The relevant request header is "Range".
An example request might look like:
import socket

# Open a TCP connection and send a raw HTTP request by hand.
mysock = socket.create_connection(("www.example.com", 80))
mysock.sendall(
    "GET /huge-growing-file HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Range: bytes=XXXX-\r\n"
    "Connection: close\r\n\r\n")
Where XXXX represents the number of bytes you've already retrieved. Then you can read the response headers and any content from the server. If the server returns a header like:
Content-Length: 0
You know you've got the entire file.
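As a rough continuation of the sketch above (not part of the original answer), reading the response over the same socket might look like this, with deliberately naive header handling:

# Assumes `mysock` is the socket from the request above.
resp = ''
while True:
    data = mysock.recv(4096)
    if not data:  # server closed the connection ("Connection: close")
        break
    resp += data
mysock.close()

# Naive split of the headers from the body; a real client would parse
# the status line (expect "206 Partial Content") and headers properly.
headers, _, body = resp.partition('\r\n\r\n')
if 'Content-Length: 0' in headers:
    pass  # nothing new: we already have the entire file
else:
    with open('huge-growing-file', 'ab') as out:  # append the new bytes
        out.write(body)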
If you want to be particularly nice as an HTTP client you can look into "Connection: keep-alive". Perhaps there is a python library that does everything I have described (perhaps even urllib2 does it!) but I'm not familiar with one.
Answered by mpez0
If I understand your question correctly, the file is not changing during download, but is updated regularly. If that is the question, rsync is the answer.
If the file is being updated continually, including during download, you'll need to modify rsync or a bittorrent program. They split files into separate chunks and download or update the chunks independently. When you get to the end of the file from the first iteration, repeat to get the appended chunk; continue as necessary. With less efficiency, one could simply rsync repeatedly.
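A hedged sketch of that "repeatedly rsync" approach (the remote host, path, and poll interval are placeholders; rsync's --append option transfers only the bytes past the end of a shorter local copy):

import subprocess
import time

# Placeholder remote path; adjust to the real host and file.
SRC = 'user@example.com:/var/log/huge-growing-file'

while True:
    # --append sends only the data past the end of the local copy,
    # so each pass fetches just the newly written bytes.
    subprocess.call(['rsync', '--append', SRC, '.'])
    time.sleep(60)  # poll interval; tune to how fast the file grows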