How to handle IncompleteRead: in python
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): Stack Overflow.
Original question: http://stackoverflow.com/questions/14442222/
Asked by
I am trying to fetch some data from a website, but it returns an IncompleteRead error. The data I am trying to get is a huge set of nested links. I did some research online and found that this might be due to a server error (a chunked transfer encoding finishing before reaching the expected size). I also found a workaround for this on this link.
However, I am not sure how to apply it to my case. The following is the code I am working on:
# imports implied by the snippet (BeautifulSoup version assumed)
import mechanize
import urllib2
import urlparse
from BeautifulSoup import BeautifulSoup  # bs4 also works: from bs4 import BeautifulSoup

br = mechanize.Browser()
br.addheaders = [('User-agent', 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)')]
urls = "http://shop.o2.co.uk/mobile_phones/Pay_Monthly/smartphone/all_brands"
page = urllib2.urlopen(urls).read()
soup = BeautifulSoup(page)
links = soup.findAll('img', url=True)
for tag in links:
    name = tag['alt']
    tag['url'] = urlparse.urljoin(urls, tag['url'])
    r = br.open(tag['url'])
    page_child = br.response().read()
    soup_child = BeautifulSoup(page_child)
    contracts = [tag_c['value'] for tag_c in soup_child.findAll('input', {"name": "tariff-duration"})]
    data_usage = [tag_c['value'] for tag_c in soup_child.findAll('input', {"name": "allowance"})]
    print contracts
    print data_usage
Please help me with this. Thanks.
Accepted answer by Kyle
The link you included in your question is simply a wrapper that executes urllib's read() function and catches any incomplete-read exceptions for you. If you don't want to implement the entire patch, you can always just throw in a try/except block where you read your links. For example:
import urllib2
import httplib

try:
    page = urllib2.urlopen(urls).read()
except httplib.IncompleteRead, e:
    page = e.partial
For Python 3:
from urllib import request
import http.client

try:
    page = request.urlopen(urls).read()
except http.client.IncompleteRead as e:
    page = e.partial
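For reference, the patch the answer alludes to amounts to a read wrapper. A minimal sketch of the idea (the helper name is illustrative, not taken from the linked patch):

import http.client

def read_all(response):
    # Read the full body, but keep whatever arrived if the
    # connection is cut short mid-transfer.
    try:
        return response.read()
    except http.client.IncompleteRead as e:
        return e.partial

This returns the partial body instead of propagating the exception, which is usually good enough when the server closes a chunked response early.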
Answer by Sérgio
I found in my case that sending an HTTP/1.0 request fixed the problem. Adding this:
import httplib

# force httplib to speak HTTP/1.0 instead of the default HTTP/1.1
httplib.HTTPConnection._http_vsn = 10
httplib.HTTPConnection._http_vsn_str = 'HTTP/1.0'
Then I make the request:
req = urllib2.Request(url, post, headers)  # post and headers are defined elsewhere in my code
filedescriptor = urllib2.urlopen(req)
img = filedescriptor.read()
Afterwards, I switch back to HTTP 1.1 (for connections that support 1.1):
httplib.HTTPConnection._http_vsn = 11
httplib.HTTPConnection._http_vsn_str = 'HTTP/1.1'
The trick is to use HTTP 1.0 instead of the default HTTP/1.1. HTTP 1.1 can handle chunked transfers, but for some reason the web server doesn't, so we make the request with HTTP 1.0.
In Python 3, this will tell you:
ModuleNotFoundError: No module named 'httplib'
Then use the http.client module instead; it will solve the problem:
import http.client as http
http.HTTPConnection._http_vsn = 10
http.HTTPConnection._http_vsn_str = 'HTTP/1.0'
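Since these are process-wide class attributes, a small convenience sketch (not part of the answer itself) is to wrap the switch in try/finally so HTTP/1.1 is always restored, even if the request raises:

import http.client as http
from urllib import request

def fetch_with_http10(url):
    # temporarily force HTTP/1.0 for this request
    http.HTTPConnection._http_vsn = 10
    http.HTTPConnection._http_vsn_str = 'HTTP/1.0'
    try:
        return request.urlopen(url).read()
    finally:
        # restore the default HTTP/1.1
        http.HTTPConnection._http_vsn = 11
        http.HTTPConnection._http_vsn_str = 'HTTP/1.1'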
Answer by gDexter42
What worked for me was catching IncompleteRead as an exception and harvesting the data you managed to read in each iteration by putting this into a loop like the one below. (Note: I am using Python 3.4.1, and the urllib library changed between 2.7 and 3.4.)
import urllib.request
import http.client
import json

def rest_call(url, data):  # wrapper function added so the original 'return' is valid
    try:
        requestObj = urllib.request.urlopen(url, data)
        responseJSON = ""
        while True:
            try:
                responseJSONpart = requestObj.read()
            except http.client.IncompleteRead as icread:
                # keep the partial data and try to read the rest
                responseJSON = responseJSON + icread.partial.decode('utf-8')
                continue
            else:
                responseJSON = responseJSON + responseJSONpart.decode('utf-8')
                break
        return json.loads(responseJSON)
    except Exception as RESTex:
        print("Exception occurred making REST call: " + RESTex.__str__())
Answer by nigel76
I found that my virus scanner/firewall was causing this problem: the "Online Shield" component of AVG.
Answer by Aminah Nuraini
You can use requests instead of urllib2. requests is based on urllib3, so it rarely has any problems. Put it in a loop that tries 3 times, and it will be much more robust. You can use it this way:
import sys
import time
import inspect
import requests

# self.crawling holds the URL to fetch; this snippet lives inside a class method
msg = None
for i in [1, 2, 3]:
    try:
        r = requests.get(self.crawling, timeout=30)
        msg = r.text
        if msg: break
    except Exception as e:
        sys.stderr.write('Got error when requesting URL "' + self.crawling + '": ' + str(e) + '\n')
        if i == 3:
            sys.stderr.write('{0.filename}@{0.lineno}: Failed requesting from URL "{1}" ==> {2}\n'.format(inspect.getframeinfo(inspect.currentframe()), self.crawling, e))
            raise e
        time.sleep(10 * (i - 1))
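Alternatively, requests can delegate retries to its underlying urllib3. A sketch under the assumption that retrying the connection is enough (the retry parameters here are illustrative, not from the answer):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=3, backoff_factor=1)  # up to 3 retries with increasing backoff
adapter = HTTPAdapter(max_retries=retries)
session.mount('http://', adapter)
session.mount('https://', adapter)
r = session.get('http://shop.o2.co.uk/mobile_phones/Pay_Monthly/smartphone/all_brands', timeout=30)

Note this retries failures that happen before a response starts; an IncompleteRead that occurs mid-body still surfaces as an exception, which is why the manual loop above can be the safer option.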
Answer by Brian
I tried all of these solutions and none of them worked for me. What actually did work was to use http.client (Python 3) instead of urllib:
import http.client

conn = http.client.HTTPConnection('www.google.com')
conn.request('GET', '/')
r1 = conn.getresponse()
page = r1.read().decode('utf-8')
This works perfectly every time, whereas urllib was returning an IncompleteRead exception every time.
Answer by KJoker
I just added one more exception to get past this problem, like this:
import logging
import requests

try:
    r = requests.get(url, timeout=timeout)
except (requests.exceptions.ChunkedEncodingError, requests.ConnectionError) as e:
    logging.error("There is an error: %s" % e)
Answer by Sain Wu
Python 3, FYI:
from urllib import request
import http.client

def fetch_and_save(url, file_path):  # wrapper function added so the original 'return' is valid
    try:
        response = request.urlopen(url)
        file = response.read()
    except http.client.IncompleteRead as e:
        file = e.partial  # keep the partial body
    except Exception as result:
        print("Unknown error: " + str(result))
        return
    # save file
    with open(file_path, 'wb') as f:
        print("save -> %s " % file_path)
        f.write(file)
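A hypothetical call, using the URL from the original snippet (the destination filename is an assumption):

# 'all_brand.html' is an illustrative output path
fetch_and_save('http://shop.o2.co.uk/mobile_phones/Pay_Monthly/smartphone/all_brand', 'all_brand.html')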

