Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must likewise follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/395451/

Date: 2020-11-03 20:00:46  Source: igfitidea

How to download a file over http with authorization in python 3.0, working around bugs?

Tags: python, python-3.x, urllib

Asked by Lasse V. Karlsen

I have a script that I'd like to keep using, but it looks like I either have to find a workaround for a bug in Python 3, or downgrade back to 2.6 and thus downgrade other scripts as well...

Hopefully someone here has already managed to find a workaround.

The problem is that, with the changes in Python 3.0 regarding bytes and strings, apparently not all of the library code has been tested.

I have a script that downloads a page from a web server. In Python 2.6 this script passed a username and password as part of the URL, but in Python 3.0 this no longer works.

For instance, this:

import urllib.request
url = "http://username:password@server/file"
urllib.request.urlretrieve(url, "temp.dat")

fails with this exception:

Traceback (most recent call last):
  File "C:\Temp\test.py", line 5, in <module>
    urllib.request.urlretrieve(url, "test.html");
  File "C:\Python30\lib\urllib\request.py", line 134, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python30\lib\urllib\request.py", line 1476, in retrieve
    fp = self.open(url, data)
  File "C:\Python30\lib\urllib\request.py", line 1444, in open
    return getattr(self, name)(url)
  File "C:\Python30\lib\urllib\request.py", line 1618, in open_http
    return self._open_generic_http(http.client.HTTPConnection, url, data)
  File "C:\Python30\lib\urllib\request.py", line 1576, in _open_generic_http
    auth = base64.b64encode(user_passwd).strip()
  File "C:\Python30\lib\base64.py", line 56, in b64encode
    raise TypeError("expected bytes, not %s" % s.__class__.__name__)
TypeError: expected bytes, not str

Apparently, base64 encoding now requires bytes as input (and also returns bytes), so urlretrieve (or some code inside it), which builds a username:password string and tries to base64-encode it for Basic authentication, fails.
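
For illustration, here is a minimal sketch of building the Basic authentication header by hand in Python 3: the user:password string is encoded to bytes before calling base64.b64encode(), and the bytes it returns are decoded back to str for the header. The function name basic_auth_header is hypothetical, and encoding the credentials as UTF-8 is an assumption (ASCII-only credentials come out the same either way).

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic 'Authorization' header.

    base64.b64encode() only accepts bytes and also returns bytes in
    Python 3, so the text is encoded first and the result decoded after.
    """
    # Assumption: credentials are encoded as UTF-8 (identical to ASCII
    # for plain ASCII usernames and passwords).
    credentials = "{}:{}".format(user, password).encode("utf-8")  # str -> bytes
    token = base64.b64encode(credentials).decode("ascii")  # bytes -> str
    return "Basic " + token


if __name__ == "__main__":
    # Passing a str, as the library code in the traceback does, raises TypeError.
    try:
        base64.b64encode("username:password")
    except TypeError as exc:
        print("TypeError:", exc)

    print(basic_auth_header("username", "password"))
```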

If I instead try to use urlopen, like this:

import urllib.request
url = "http://username:password@server/file"
f = urllib.request.urlopen(url)
contents = f.read()

Then it fails with this exception:

Traceback (most recent call last):
  File "C:\Temp\test.py", line 5, in <module>
    f = urllib.request.urlopen(url);
  File "C:\Python30\lib\urllib\request.py", line 122, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python30\lib\urllib\request.py", line 359, in open
    response = self._open(req, data)
  File "C:\Python30\lib\urllib\request.py", line 377, in _open
    '_open', req)
  File "C:\Python30\lib\urllib\request.py", line 337, in _call_chain
    result = func(*args)
  File "C:\Python30\lib\urllib\request.py", line 1082, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Python30\lib\urllib\request.py", line 1051, in do_open
    h = http_class(host, timeout=req.timeout) # will parse host:port
  File "C:\Python30\lib\http\client.py", line 620, in __init__
    self._set_hostport(host, port)
  File "C:\Python30\lib\http\client.py", line 632, in _set_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
http.client.InvalidURL: nonnumeric port: 'password@server'

Apparently the URL parsing in this "next-gen URL retrieval library" doesn't know what to do with a username and password in the URL.
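
One possible workaround, sketched here as an assumption rather than something the standard library does for you, is to split the credentials out of the URL before handing it to urllib: urllib.parse.urlsplit() exposes them as username and password, the remaining URL no longer contains password@server, and the credentials travel in an Authorization header instead. The helper name request_from_credential_url is hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def request_from_credential_url(url: str) -> urllib.request.Request:
    """Turn 'http://user:password@host[:port]/path' into a urllib Request.

    The user:password part is removed from the URL, so http.client no longer
    tries to read 'password@server' as a port, and is sent instead as an
    HTTP Basic 'Authorization' header.
    """
    parts = urllib.parse.urlsplit(url)
    # Keep only what follows the last '@' in the network location (host[:port]).
    host_port = parts.netloc.rpartition("@")[2]
    request = urllib.request.Request(parts._replace(netloc=host_port).geturl())
    if parts.username is not None:
        # Characters such as '/' or '#' in the credentials must be
        # percent-encoded in the URL; unquote() decodes them again.
        user = urllib.parse.unquote(parts.username)
        password = urllib.parse.unquote(parts.password or "")
        credentials = "{}:{}".format(user, password).encode("utf-8")
        token = base64.b64encode(credentials).decode("ascii")
        request.add_header("Authorization", "Basic " + token)
    return request


# Usage (requires a reachable server):
# request = request_from_credential_url("http://username:password@server/file")
# with urllib.request.urlopen(request) as f, open("temp.dat", "wb") as out:
#     out.write(f.read())
```

Because the header is attached up front, it goes out with the first request instead of waiting for the server's 401 challenge, which is what HTTPBasicAuthHandler responds to.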

What other choices do I have?

Answer by jb.

Directly from the Py3k docs: http://docs.python.org/dev/py3k/library/urllib.request.html#examples

import urllib.request
# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
                          uri='https://mahler:8092/site-updates.py',
                          user='klem',
                          passwd='kadidd!ehopper')
opener = urllib.request.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
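urllib.request.install_opener(opener)
# Note: credentials are only sent for URLs under the uri given to add_password,
# and only when the server's realm matches the realm given there.
urllib.request.urlopen('http://www.example.com/login.html')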
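
The documentation example registers the password for one specific realm ('PDQ Application') and URI. When the realm string is not known in advance, a common variant, sketched below as one way you might adapt it, uses HTTPPasswordMgrWithDefaultRealm with realm=None and then saves the response to a file as a stand-in for urlretrieve(). The helper names build_basic_auth_opener and download, and the http://server/ URL, are hypothetical.

```python
import shutil
import urllib.request


def build_basic_auth_opener(top_level_url: str, user: str,
                            password: str) -> urllib.request.OpenerDirector:
    """Return an opener that answers HTTP Basic challenges under top_level_url.

    With HTTPPasswordMgrWithDefaultRealm, realm=None matches any realm, so
    the server's realm string does not need to be known beforehand.
    """
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, top_level_url, user, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


def download(opener: urllib.request.OpenerDirector, url: str,
             filename: str) -> None:
    """Save the body of url to filename, much like urlretrieve() does."""
    with opener.open(url) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)


# Usage (requires a reachable server that asks for Basic authentication):
# opener = build_basic_auth_opener("http://server/", "username", "password")
# download(opener, "http://server/file", "temp.dat")
```

Any URL below http://server/ then matches the stored credentials, so the same opener can fetch several files from that server.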

Answer by Ali Afshar

My advice would be to maintain your 2.* branch as your production branch until you can get the 3.0 stuff sorted.

I am going to wait a while before moving over to Python 3.0. There seem to be a lot of people in a rush, but I want everything sorted out first, and a decent selection of third-party libraries available. This may take a year, it may take 18 months, but the pressure to "upgrade" is really low for me.
