Python 3.4 urllib.request error (HTTP 403)

Question

Asked by Belial

I'm trying to open and parse an HTML page. In Python 2.7.8 I have no problem:

import urllib
url = "https://ipdb.at/ip/66.196.116.112"
html = urllib.urlopen(url).read()

and everything is fine. However, I want to move to Python 3.4, and there I get HTTP error 403 (Forbidden). My code:

import urllib.request
html = urllib.request.urlopen(url) # same URL as before

  File "C:\Python34\lib\urllib\request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 461, in open
    response = meth(req, response)
  File "C:\Python34\lib\urllib\request.py", line 574, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python34\lib\urllib\request.py", line 499, in error
    return self._call_chain(*args)
  File "C:\Python34\lib\urllib\request.py", line 433, in _call_chain
    result = func(*args)
  File "C:\Python34\lib\urllib\request.py", line 582, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

It works for other URLs that don't use HTTPS. For example, this one is OK:

url = 'http://www.stopforumspam.com/ipcheck/212.91.188.166'

Answer 1

Accepted answer by falsetru

It seems the site does not like the default user agent that Python 3.x urllib sends (a string of the form Python-urllib/3.x).
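To see what the site is reacting to, you can inspect the header urllib adds by default. This is a minimal sketch that needs no network access; the variable names are my own, and the exact version suffix depends on your interpreter:

```python
import urllib.request

# build_opener() returns an OpenerDirector whose addheaders list holds
# the headers it adds to every request it sends.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)
default_user_agent = default_headers["User-agent"]

# Prints a string such as "Python-urllib/3.4", which some sites block.
print(default_user_agent)
```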

Specifying a User-Agent header will solve your problem:

import urllib.request
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
html = urllib.request.urlopen(req).read()

NOTE: The Python 2.x urllib version also receives the 403 status, but unlike Python 2.x urllib2 and Python 3.x urllib.request, it does not raise an exception.

You can confirm that with the following code (Python 2.x):

print(urllib.urlopen(url).getcode())  # => 403
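In Python 3 the same check needs a try/except, because urllib.request raises HTTPError for 4xx and 5xx responses. A hedged sketch; `get_status` is a hypothetical helper name of my own, not part of urllib:

```python
import urllib.error
import urllib.request


def get_status(url, user_agent=None):
    """Return the HTTP status code for url without raising on 4xx/5xx."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req) as response:
            return response.getcode()
    except urllib.error.HTTPError as err:
        # HTTPError carries the status code the server sent back.
        return err.code
```

Assuming the site still behaves as described in the question, calling `get_status(url)` would be expected to give 403, and `get_status(url, "Mozilla/5.0")` would be expected to give 200.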

Answer 2

Answer by falsetru

Here are some notes on urllib that I gathered while studying Python 3. I kept them in case they come in handy or help someone else out.

How to import `urllib.request` and `urllib.parse`:

import urllib.request as urlRequest
import urllib.parse as urlParse

How to make a GET request:


url = "http://www.example.net"
# open the url
x = urlRequest.urlopen(url)
# get the source code
sourceCode = x.read()

How to make a POST request:


url = "https://www.example.com"
values = {"q": "python if"}
# encode values for the url
values = urlParse.urlencode(values)
# encode the values in UTF-8 format
values = values.encode("UTF-8")
# create the request (passing data makes it a POST)
targetUrl = urlRequest.Request(url, values)
# open the url
x = urlRequest.urlopen(targetUrl)
# get the source code
sourceCode = x.read()
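To see what the encoding steps above produce without sending anything over the network, you can inspect the intermediate values and the request object. A small sketch using the same example values (the variable names `query`, `body`, and `req` are my own):

```python
import urllib.parse
import urllib.request

values = {"q": "python if"}
# urlencode() builds a form-encoded string; spaces become "+".
query = urllib.parse.urlencode(values)
# urlopen() requires the POST body as bytes, not str.
body = query.encode("UTF-8")

req = urllib.request.Request("https://www.example.com", data=body)

print(query)             # q=python+if
print(req.get_method())  # POST, because data was supplied
```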

How to make a POST request (when you get `403 Forbidden` responses):

url = "https://www.example.com"
values = {"q": "python urllib"}
# pretend to be a chrome 47 browser on a windows 10 machine
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}
# encode values for the url
values = urlParse.urlencode(values)
# encode the values in UTF-8 format
values = values.encode("UTF-8")
# create the request with the POST data and the custom headers
targetUrl = urlRequest.Request(url=url, data=values, headers=headers)
# open the url
x = urlRequest.urlopen(targetUrl)
# get the source code
sourceCode = x.read()

How to make a GET request (when you get `403 Forbidden` responses):

url = "https://www.example.com"
# pretend to be a chrome 47 browser on a windows 10 machine
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}
req = urlRequest.Request(url, headers = headers)
# open the url
x = urlRequest.urlopen(req)
# get the source code
sourceCode = x.read()
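Note that `read()` returns bytes. If you need text for parsing, decode it with the charset the server declares. A minimal sketch; `fetch_text` is a hypothetical helper of my own, and falling back to UTF-8 when no charset is declared is my assumption, not something these notes specify:

```python
import urllib.request


def fetch_text(url, user_agent="Mozilla/5.0"):
    """Fetch url with a custom User-Agent and return the body as str."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as response:
        raw = response.read()
        # Use the charset from the Content-Type header, if one is declared.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset)
```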
