Python UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

Notice: this page is a translation of a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/28544686/

Date: 2020-08-19 03:25:25  Source: igfitidea

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

Tags: python, python-2.7, utf-8, decode

Asked by Serhii Matrunchyk

I'm simply trying to decode a \uXXXX\uXXXX\uXXXX-like string. But I get an error:

$ python
Python 2.7.6 (default, Sep  9 2014, 15:04:36) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print u'\u041e\u043b\u044c\u0433\u0430'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

I'm a Python newbie. What's the problem? Thanks!

Accepted answer by Martijn Pieters

Python is trying to be helpful. You cannot decode Unicode data; it is already decoded. So Python will first encode the data (using the ASCII codec) to get bytes to decode. It is this implicit encoding that fails.
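
The two-step behaviour described here can be spelled out by hand. The following is only a sketch, written for Python 3, where `str` has no `.decode()` method and nothing is converted implicitly; the helper name `py2_style_decode` is hypothetical and simply mimics what Python 2 does when `.decode()` is called on a Unicode value.

```python
def py2_style_decode(text, encoding="utf-8"):
    """Mimic Python 2's handling of unicode_value.decode(encoding)."""
    # Step 1 (implicit in Python 2): encode the text to bytes with ASCII.
    raw = text.encode("ascii")
    # Step 2: decode those bytes with the codec that was actually requested.
    return raw.decode(encoding)


name = "\u041e\u043b\u044c\u0433\u0430"  # the five Cyrillic letters from the question

try:
    py2_style_decode(name)
    error = None
except UnicodeEncodeError as exc:
    error = exc  # step 1 fails before the UTF-8 codec is ever used

print(error)
```

For pure-ASCII text the first step succeeds, which is why the same call appears to work on English-only data and only breaks once non-ASCII characters show up.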

If you have Unicode data, it only makes sense to encode it to UTF-8, not decode it:

>>> print u'\u041e\u043b\u044c\u0433\u0430'
Ольга
>>> u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8')
'\xd0\x9e\xd0\xbb\xd1\x8c\xd0\xb3\xd0\xb0'
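
As a rough illustration of the direction each call goes, here is the same value taken through a full round trip. This is a sketch in Python 3 syntax (an assumption, since the session above is Python 2.7), where text is `str` and encoded data is `bytes`.

```python
name = "\u041e\u043b\u044c\u0433\u0430"  # text (Unicode code points)

encoded = name.encode("utf-8")     # text -> bytes
decoded = encoded.decode("utf-8")  # bytes -> text

print(encoded)
print(len(name), len(encoded))
print(decoded == name)
```

Each of the five Cyrillic letters takes two bytes in UTF-8, so the encoded form is ten bytes long, and decoding it gives back the original text.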

If you wanted a Unicode value, then using a Unicode literal (u'...') is all you needed to do. No further decoding is necessary.

The same implicit conversion takes place in the other direction; if you tried to encode a bytestring you'd trigger an implicit decoding:

>>> u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8').encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
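
The reverse case can be sketched the same way. This is a Python 3 illustration under the assumption that Python 2 first decodes the byte string with ASCII before applying the requested codec; the helper name `py2_style_encode` is hypothetical.

```python
def py2_style_encode(data, encoding="utf-8"):
    """Mimic Python 2's handling of byte_string.encode(encoding)."""
    # Step 1 (implicit in Python 2): decode the bytes to text with ASCII.
    text = data.decode("ascii")
    # Step 2: encode that text with the codec that was actually requested.
    return text.encode(encoding)


utf8_bytes = "\u041e\u043b\u044c\u0433\u0430".encode("utf-8")

try:
    py2_style_encode(utf8_bytes)
    error = None
except UnicodeDecodeError as exc:
    error = exc  # the first byte, 0xd0, is outside the ASCII range

print(error)
```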

Answer by Ranvijay Sachan

You can set the default encoding to UTF-8:

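import sys
reload(sys)  # Python 2 only: restores sys.setdefaultencoding, which site.py removes at startup
sys.setdefaultencoding('utf-8')  # implicit str/unicode conversions now use UTF-8 instead of ASCII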