urllib.urlopen works but urllib2.urlopen doesn't
Original URL: http://stackoverflow.com/questions/201515/
Note: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Asked by Eli Courtwright
I have a simple website I'm testing. It's running on localhost and I can access it in my web browser. The index page is simply the word "running". urllib.urlopen will successfully read the page but urllib2.urlopen will not. Here's a script which demonstrates the problem (this is the actual script and not a simplification of a different test script):
import urllib, urllib2
print urllib.urlopen("http://127.0.0.1").read() # prints "running"
print urllib2.urlopen("http://127.0.0.1").read() # throws an exception
Here's the stack trace:
Traceback (most recent call last):
  File "urltest.py", line 5, in <module>
    print urllib2.urlopen("http://127.0.0.1").read()
  File "C:\Python25\lib\urllib2.py", line 121, in urlopen
    return _opener.open(url, data)
  File "C:\Python25\lib\urllib2.py", line 380, in open
    response = meth(req, response)
  File "C:\Python25\lib\urllib2.py", line 491, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python25\lib\urllib2.py", line 412, in error
    result = self._call_chain(*args)
  File "C:\Python25\lib\urllib2.py", line 353, in _call_chain
    result = func(*args)
  File "C:\Python25\lib\urllib2.py", line 575, in http_error_302
    return self.parent.open(new)
  File "C:\Python25\lib\urllib2.py", line 380, in open
    response = meth(req, response)
  File "C:\Python25\lib\urllib2.py", line 491, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python25\lib\urllib2.py", line 418, in error
    return self._call_chain(*args)
  File "C:\Python25\lib\urllib2.py", line 353, in _call_chain
    result = func(*args)
  File "C:\Python25\lib\urllib2.py", line 499, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 504: Gateway Timeout
Any ideas? I might end up needing some of the more advanced features of urllib2, so I don't want to just resort to using urllib, plus I want to understand this problem.
Answered by John Millikin
Sounds like you have proxy settings defined that urllib2 is picking up on. When it tries to proxy "127.0.0.1/", the proxy gives up and returns a 504 error.
From Obscure python urllib2 proxy gotcha:
proxy_support = urllib2.ProxyHandler({})
opener = urllib2.build_opener(proxy_support)
print opener.open("http://127.0.0.1").read()
# Optional - makes this opener default for urlopen etc.
urllib2.install_opener(opener)
print urllib2.urlopen("http://127.0.0.1").read()
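For readers on Python 3, where urllib2 was folded into urllib.request, the same idea can be sketched roughly as below. This is not part of the original answer: build_direct_opener, RunningHandler and fetch_from_local_server are hypothetical names, and the throwaway local server only stands in for the asker's site so that the example is self-contained.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_direct_opener():
    """Return an opener that ignores any configured proxy settings."""
    # An empty mapping means "use no proxies at all", rather than
    # reading them from the environment or the system settings.
    no_proxy = urllib.request.ProxyHandler({})
    return urllib.request.build_opener(no_proxy)


class RunningHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the asker's site: always replies 'running'."""

    def do_GET(self):
        body = b"running"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_from_local_server():
    """Start a throwaway server on a free port and fetch its index page directly."""
    server = HTTPServer(("127.0.0.1", 0), RunningHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        with build_direct_opener().open(url, timeout=5) as response:
            return response.read()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_from_local_server())
```

As with urllib2, urllib.request.install_opener(build_direct_opener()) should make the proxy-free opener the default for later urllib.request.urlopen calls.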
Answered by Sijin
Does calling urllib2.urlopen first, followed by urllib.urlopen, give the same results? Just wondering whether the first call is keeping the HTTP server busy and causing the timeout.
Answered by Alex Coventry
I don't know what's going on, but you may find this helpful in figuring it out:
>>> import urllib2
>>> urllib2.urlopen('http://mit.edu').read()[:10]
'<!DOCTYPE '
>>> urllib2._opener.handlers[1].set_http_debuglevel(100)
>>> urllib2.urlopen('http://mit.edu').read()[:10]
connect: (mit.edu, 80)
send: 'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: mit.edu\r\nConnection: close\r\nUser-Agent: Python-urllib/2.5\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Tue, 14 Oct 2008 15:52:03 GMT
header: Server: MIT Web Server Apache/1.3.26 Mark/1.5 (Unix) mod_ssl/2.8.9 OpenSSL/0.9.7c
header: Last-Modified: Tue, 14 Oct 2008 04:02:15 GMT
header: ETag: "71d3f96-2895-48f419c7"
header: Accept-Ranges: bytes
header: Content-Length: 10389
header: Connection: close
header: Content-Type: text/html
'<!DOCTYPE '
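The session above reaches into the private urllib2._opener attribute. A rough Python 3 sketch that gets similar debug output through the public HTTPHandler(debuglevel=...) argument might look like this; build_debug_opener is a hypothetical helper name, not something from the original answer.

```python
import urllib.request


def build_debug_opener(level=1):
    """Return an opener whose HTTP handler echoes the raw exchange to stdout."""
    # Any level above zero makes the underlying http.client connection print
    # the request it sends plus the status line and headers it receives.
    debug_handler = urllib.request.HTTPHandler(debuglevel=level)
    return urllib.request.build_opener(debug_handler)
```

Opening a URL with the returned opener, for example build_debug_opener().open("http://127.0.0.1").read(), should then print send:, reply: and header: lines much like the ones shown above.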
Answered by Deestan
urllib.urlopen() throws the following request at the server:
GET / HTTP/1.0
Host: 127.0.0.1
User-Agent: Python-urllib/1.17
while urllib2.urlopen() throws this:
GET / HTTP/1.1
Accept-Encoding: identity
Host: 127.0.0.1
Connection: close
User-Agent: Python-urllib/2.5
So, your server either doesn't understand HTTP/1.1 or the extra header fields.
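One way to see for yourself what a client sends is to point it at a tiny server that records the incoming request. The sketch below does this in Python 3 with urllib.request; capture_request and EchoHandler are hypothetical names, and the local server is an assumption made only to keep the example self-contained.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def capture_request():
    """Fetch a page from a throwaway local server and return what the client sent."""
    seen = {}

    class EchoHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Record the request line and headers as the server received them.
            seen["request_line"] = self.requestline
            seen["headers"] = {k.lower(): v for k, v in self.headers.items()}
            body = b"running"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the demo output quiet

    server = HTTPServer(("127.0.0.1", 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Bypass any proxy so the request reaches the local server directly.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        with opener.open(url, timeout=5) as response:
            response.read()
    finally:
        server.shutdown()
        server.server_close()
    return seen


if __name__ == "__main__":
    captured = capture_request()
    print(captured["request_line"])
    for name, value in sorted(captured["headers"].items()):
        print("%s: %s" % (name, value))
```

Comparing the captured request line and headers against what the server is known to accept can help narrow down which part of the request it objects to.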