Python urllib2.HTTPError: HTTP Error 503: Service Unavailable on valid website
Disclaimer: This page is a Chinese-English translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must do so under the same CC BY-SA terms, cite the original URL, and attribute it to the original authors (not me): StackOverFlow
Original question: http://stackoverflow.com/questions/25936072/
Asked by user2548635
I have been using Amazon's Product Advertising API to generate urls that contains prices for a given book. One url that I have generated is the following:
When I click on the link or paste the link on the address bar, the web page loads fine. However, when I execute the following code I get an error:
import urllib2

url = "http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327"
html_contents = urllib2.urlopen(url)
The error is urllib2.HTTPError: HTTP Error 503: Service Unavailable. First of all, I don't understand why I even get this error since the web page successfully loads.
Also, another weird behavior I have noticed is that the following code sometimes raises the stated error and sometimes does not:
html_contents = urllib2.urlopen("http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327")
I am totally lost on how this behavior occurs. Is there any fix or work around to this? My goal is to read the html contents of the url.
EDIT
I don't know why Stack Overflow is changing the Amazon link I listed in my code above to rads.stackoverflow. Anyway, ignore the rads.stackoverflow link and use my link above between the quotes.
Accepted answer by Ben
It's because Amazon don't allow automated access to their data, so they're rejecting your request because it didn't come from a proper browser. If you look at the content of the 503 response, it says:
To discuss automated access to Amazon data please contact [email protected]. For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.com/ref=rm_5_sv, or our Product Advertising API at https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html/ref=rm_5_ac for advertising use cases.
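That body can be read in code by catching the exception, since the error object doubles as a response. A sketch of the idea, written in Python 3 syntax (where urllib2's pieces moved into urllib.request and urllib.error); the fetch_body helper is my own illustration, not part of the original answer:

```python
import io
import urllib.error
import urllib.request

def fetch_body(url):
    """Return (status, body), even when the server answers with an
    HTTP error; the error body often explains the rejection."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.getcode(), resp.read()
    except urllib.error.HTTPError as e:
        # HTTPError is itself a file-like response: .code and .read()
        # expose the status and the body the server sent back.
        return e.code, e.read()

# Offline illustration of the mechanism: an HTTPError carries its
# response body with it (here a synthetic one standing in for Amazon's).
err = urllib.error.HTTPError("http://example.com", 503,
                             "Service Unavailable", {},
                             io.BytesIO(b"To discuss automated access..."))
```

Reading `e.read()` inside the `except` clause is what reveals the message quoted above.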
This is because the User-Agent for Python's urllib is so obviously not a browser. You could always fake the User-Agent, but that's not really good (or moral) practice.
As a side note, as mentioned in another answer, the requests library is really good for HTTP access in Python.
Answered by Spade
Amazon is rejecting the default User-Agent for urllib2. One workaround is to use the requests module:
import requests
page = requests.get("http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327")
html_contents = page.text
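Note that requests identifies itself as "python-requests/x.y" by default, which some sites may also reject. A hedged sketch of supplying a browser-like User-Agent with requests (the header string below is an arbitrary example, not a requirement):

```python
import requests

# A browser-like User-Agent string; any realistic value can be used here.
browser_headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

# Per request:   requests.get(url, headers=browser_headers)
# Or set it once per session so every request carries it:
session = requests.Session()
session.headers.update(browser_headers)
# page = session.get(url)  # performs the network call
```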
If you insist on using urllib2, this is how a header can be faked:
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
response = opener.open('http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327')
html_contents = response.read()
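For reference, urllib2 exists only in Python 2; under Python 3 the same request can be made with urllib.request. A sketch (not part of the original answer), using the URL from the question:

```python
import urllib.request

url = ("http://www.amazon.com/gp/offer-listing/0415376327"
       "%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20"
       "%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001"
       "%26creativeASIN%3D0415376327")

# Attach the faked User-Agent directly to the Request object.
req = urllib.request.Request(url, headers={"User-agent": "Mozilla/5.0"})
# html_contents = urllib.request.urlopen(req).read()  # performs the network call
```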
Don't worry about Stack Overflow editing the URL. They explain that they do this here.

