Python 3.4 urllib.request error (HTTP 403)
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/28396036/
Asked by Belial
I'm trying to open and parse an HTML page. In Python 2.7.8 I have no problem:
import urllib
url = "https://ipdb.at/ip/66.196.116.112"
html = urllib.urlopen(url).read()
and everything is fine. However, I want to move to Python 3.4, and there I get HTTP error 403 (Forbidden). My code:
import urllib.request
html = urllib.request.urlopen(url) # same URL as before
  File "C:\Python34\lib\urllib\request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 461, in open
    response = meth(req, response)
  File "C:\Python34\lib\urllib\request.py", line 574, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python34\lib\urllib\request.py", line 499, in error
    return self._call_chain(*args)
  File "C:\Python34\lib\urllib\request.py", line 433, in _call_chain
    result = func(*args)
  File "C:\Python34\lib\urllib\request.py", line 582, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
It works for other URLs that don't use HTTPS. For example:
url = 'http://www.stopforumspam.com/ipcheck/212.91.188.166'
is OK.
Accepted answer by falsetru
It seems like the site does not like the user agent of Python 3.x.
Specifying a User-Agent header will solve your problem:
import urllib.request
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
html = urllib.request.urlopen(req).read()
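To see what changes between the failing and the working request, the two User-Agent values can be inspected without touching the network. This is only a sketch: the example.com URL is a placeholder used to build the request object, and the exact default string depends on the Python version in use.

```python
import urllib.request

# User-Agent that a default opener sends when the request sets none.
default_opener = urllib.request.build_opener()
default_user_agent = dict(default_opener.addheaders)["User-agent"]
print(default_user_agent)  # something like "Python-urllib/3.4"

# Placeholder URL (an assumption); nothing is actually fetched here.
req = urllib.request.Request(
    "https://www.example.com/",
    headers={"User-Agent": "Mozilla/5.0"},
)

# Request stores header names capitalized, hence the "User-agent" key.
print(req.get_header("User-agent"))  # Mozilla/5.0
```

A header set on the Request takes precedence over the opener's default, which is why the accepted fix works.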
NOTE: The Python 2.x urllib version also receives a 403 status, but unlike Python 2.x urllib2 and Python 3.x urllib, it does not raise an exception.
You can confirm that with the following code (run under Python 2.x):
print(urllib.urlopen(url).getcode()) # => 403
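In Python 3 the status code is still available, but for error responses it arrives on the exception rather than on the returned object. A minimal sketch, assuming a hypothetical helper name `get_status` that is not part of urllib:

```python
import urllib.error
import urllib.request


def get_status(url, headers=None):
    """Return the HTTP status code for url, even for error responses.

    get_status is a hypothetical helper name used only for illustration.
    """
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(req, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the code is on the exception.
        return err.code


if __name__ == "__main__":
    # URL taken from the question; the result depends on the live site.
    print(get_status("https://ipdb.at/ip/66.196.116.112"))
```

Called with `headers={"User-Agent": "Mozilla/5.0"}`, the same helper can be used to check whether the header changes the response.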
Answer by falsetru
Here are some notes I gathered on urllib when I was studying Python 3. I kept them in case they might come in handy or help someone else out.
How to import urllib.request and urllib.parse:
import urllib.request as urlRequest
import urllib.parse as urlParse
How to make a GET request:
url = "http://www.example.net"
# open the url
x = urlRequest.urlopen(url)
# get the source code
sourceCode = x.read()
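Note that in Python 3 `read()` returns bytes, not str, so the result usually needs decoding before it can be parsed as text. The sketch below uses a `data:` URL as a stand-in for a real page so that it runs offline; the fallback to UTF-8 when no charset is declared is an assumption, not something urllib does for you.

```python
import urllib.request

# A data: URL stands in for a real page so the sketch needs no network.
url = "data:text/plain;charset=utf-8,hello"

with urllib.request.urlopen(url) as response:
    raw = response.read()  # bytes in Python 3
    # Use the charset declared in Content-Type, falling back to UTF-8.
    charset = response.headers.get_content_charset() or "utf-8"

text = raw.decode(charset)
print(type(raw).__name__, type(text).__name__)
print(text)
```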
How to make a POST request:
url = "https://www.example.com"
values = {"q": "python if"}
# encode values for the url
values = urlParse.urlencode(values)
# encode the values in UTF-8 format
values = values.encode("UTF-8")
# create the request object (passing data makes it a POST)
targetUrl = urlRequest.Request(url, values)
# open the url
x = urlRequest.urlopen(targetUrl)
# get the source code
sourceCode = x.read()
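What the encoding steps produce, and why the request becomes a POST, can be checked without sending anything. A small sketch using the same values as above; the example.com URL is only a placeholder:

```python
import urllib.parse
import urllib.request

values = {"q": "python if"}

# urlencode builds a form-encoded string; encode() turns it into bytes,
# which is what Request expects for its data argument.
body = urllib.parse.urlencode(values).encode("utf-8")
print(body)  # b'q=python+if'

# Supplying data switches the request method from GET to POST.
post_req = urllib.request.Request("https://www.example.com", data=body)
get_req = urllib.request.Request("https://www.example.com")
print(post_req.get_method())  # POST
print(get_req.get_method())   # GET
```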
How to make a POST request when the server answers with 403 Forbidden:
url = "https://www.example.com"
values = {"q": "python urllib"}
# pretend to be a chrome 47 browser on a windows 10 machine
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}
# encode values for the url
values = urlParse.urlencode(values)
# encode the values in UTF-8 format
values = values.encode("UTF-8")
# create the request object with the form data and the custom headers
targetUrl = urlRequest.Request(url = url, data = values, headers = headers)
# open the url
x = urlRequest.urlopen(targetUrl)
# get the source code
sourceCode = x.read()
How to make a GET request when the server answers with 403 Forbidden:
url = "https://www.example.com"
# pretend to be a chrome 47 browser on a windows 10 machine
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}
req = urlRequest.Request(url, headers = headers)
# open the url
x = urlRequest.urlopen(req)
# get the source code
sourceCode = x.read()
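If many requests need the same header, one option is to set it once on an opener instead of on every Request. This is a sketch rather than the answer author's method; `make_opener` and `USER_AGENT` are hypothetical names introduced here.

```python
import urllib.request

# Assumed value; any browser-like string could be used instead.
USER_AGENT = "Mozilla/5.0"


def make_opener(user_agent=USER_AGENT):
    """Build an opener that sends user_agent with every request.

    make_opener is a hypothetical helper, not part of urllib.
    """
    opener = urllib.request.build_opener()
    # Replace the default "Python-urllib/x.y" header.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


opener = make_opener()
print(opener.addheaders)

# opener.open(url) then sends the header. To make plain
# urllib.request.urlopen() use it as well, install it globally:
# urllib.request.install_opener(opener)
```

Headers set on an individual Request still win over the opener's defaults, so the per-request form shown above keeps working alongside this.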