Python read website data line by line when available
Note: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original source, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/16870648/
Asked by sarbjit
I am using urllib2 to read data from a URL. Below is the code snippet:
import urllib2

data = urllib2.urlopen(urllink)
# readlines() returns only after the whole response has been received
for lines in data.readlines():
    print lines
The URL that I am opening is actually a CGI script that does some processing and prints data in parallel with that processing. The CGI script takes around 30 minutes to complete, so with the above code I see the output only after 30 minutes, once the CGI script has finished executing.
How can I read the data from the URL as soon as it is available and print it?
Accepted answer by Martijn Pieters
Just loop directly over the file object:
for line in data:
    print line
This reads the incoming data stream line by line (internally, the socket file object calls .readline() every time you iterate). This does assume that your server is sending data as soon as possible.
Calling .readlines() (plural) guarantees that you read the whole response before you start looping; don't do that.
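For readers on Python 3, where urllib2 was folded into urllib.request, the same idea can be sketched roughly as follows. This is only an illustration under assumptions: the helper name stream_lines and the UTF-8 default are hypothetical choices, not part of the original answer.

```python
import urllib.request


def stream_lines(url, encoding="utf-8"):
    """Yield decoded lines from `url` one at a time.

    `stream_lines` is a hypothetical helper name used for illustration.
    """
    with urllib.request.urlopen(url) as response:
        # Iterating over the response reads one line per step,
        # instead of collecting the whole body first as readlines() does.
        for raw_line in response:
            yield raw_line.decode(encoding, errors="replace").rstrip("\r\n")
```

A caller would then write something like `for line in stream_lines(urllink): print(line)`. How promptly each line shows up still depends on the server sending it without buffering.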
Alternatively, use the requests library, which has more explicit support for streaming responses:
import requests

r = requests.get(url, stream=True)
for line in r.iter_lines():
    # skip empty lines
    if line:
        print line
Note that this will only work if your server starts streaming data immediately. If your CGI doesn't produce data until the process is complete, there is no point in trying to stream the data.
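On the server side, one common reason output arrives only at the end is that the script's standard output is buffered. The sketch below assumes the CGI script is written in Python 3, which the question does not state. The names emit and run_job are hypothetical, and the sleep stands in for the real processing. It shows one way to flush after every line.

```python
import sys
import time


def emit(line, out=sys.stdout):
    """Write one line and flush so it leaves the process right away."""
    out.write(line + "\n")
    out.flush()


def run_job(steps, out=sys.stdout, delay=0.0):
    """Hypothetical long-running CGI job that reports progress per step."""
    # CGI header block, terminated by a blank line
    out.write("Content-Type: text/plain\r\n\r\n")
    out.flush()
    for i in range(steps):
        time.sleep(delay)  # stand-in for the real processing
        emit("step %d done" % (i + 1), out)
```

Calling `run_job(5, delay=1.0)` from the script would print one progress line per second. The web server or a proxy in front of it may still buffer the output, so flushing in the script is necessary but might not be sufficient.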

