Python UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128)
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/24475393/
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128)
Asked by Capens
When I try to concatenate this, I get the UnicodeDecodeError when a field contains 'ñ' or '′'. If the field that contains the 'ñ' or '′' is the last one, I get no error.
#...
nombre = fabrica
nombre = nombre.encode("utf-8") + '-' + sector.encode("utf-8")
nombre = nombre.encode("utf-8") + '-' + unidad.encode("utf-8")
#...
return nombre
any idea? Many thanks!
Accepted answer by Martijn Pieters
You are encoding to UTF-8, then re-encoding to UTF-8. Python 2 can only do this if it first decodes the byte string again to Unicode, and for that implicit step it has to use the default ASCII codec:
>>> u'ñ'
u'\xf1'
>>> u'ñ'.encode('utf8')
'\xc3\xb1'
>>> u'ñ'.encode('utf8').encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
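The hidden step can also be written out explicitly. The following is a minimal sketch, not part of the original answer: it performs by hand the ASCII decode that Python 2 does behind the scenes when .encode() is called on a byte string. The variable names are illustrative only. (Python 3 has no such implicit step, because bytes objects have no .encode() method.)

```python
# Explicit version of what Python 2 does implicitly when .encode('utf8')
# is called on a byte string: it first decodes with the ASCII codec.
encoded = u'\xf1'.encode('utf8')   # UTF-8 bytes for U+00F1 (n with tilde)

try:
    encoded.decode('ascii')        # the hidden step; 0xc3 is not valid ASCII
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)
```

The exception is raised by the decode, which is why the message names the 'ascii' codec even though the original code only ever asks for UTF-8.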
Don't keep encoding; leave encoding to UTF-8 to the last possible moment. Concatenate Unicode values instead.
You can use str.join() (or, rather, unicode.join()) here to concatenate the three values with dashes in between:
# join() takes a single iterable, so the values go in a list
nombre = u'-'.join([fabrica, sector, unidad])
return nombre.encode('utf-8')
but even encoding here might be too early.
Rule of thumb: decode the moment you receive the value (if not Unicode values supplied by an API already), encode only when you have to (if the destination API does not handle Unicode values directly).
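The rule of thumb can be sketched as follows. This is an illustration under stated assumptions, not the asker's actual code: the helper name build_nombre, the sample byte values, and the choice of UTF-8 for the incoming data are all hypothetical.

```python
def build_nombre(fabrica, sector, unidad):
    """Join three Unicode values with dashes (hypothetical helper)."""
    return u'-'.join([fabrica, sector, unidad])

# Decode at the input boundary. The byte values and their UTF-8
# encoding are assumptions made for this example.
raw_fields = [b'Espa\xc3\xb1a', b'norte', b'uno']
fabrica, sector, unidad = [raw.decode('utf-8') for raw in raw_fields]

# All work in the middle happens on Unicode values.
nombre = build_nombre(fabrica, sector, unidad)

# Encode once, at the output boundary, and only if the consumer needs bytes.
payload = nombre.encode('utf-8')
```

Because every value is decoded exactly once and encoded exactly once, the position of the non-ASCII field no longer matters.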
Answer by Serge Ballesta
When you get a UnicodeDecodeError, it means that somewhere in your code a byte string is being converted to a unicode one without an explicit encoding. By default Python 2 uses the ascii encoding for this, and Python 3 defaults to utf8 when decoding (both may fail, because not every byte sequence is valid in either encoding).
To avoid that, you must use explicit decoding.
If your input may use one of two different encodings, and one of them accepts any byte sequence (say UTF8 and Latin1), you can try decoding with the stricter one first and fall back to the other if a UnicodeDecodeError occurs.
def robust_decode(bs):
    '''Takes a byte string as param and converts it into a unicode one.
    First tries UTF8, and falls back to Latin1 if that fails.'''
    cr = None
    try:
        cr = bs.decode('utf8')
    except UnicodeDecodeError:
        # Latin1 maps every byte to a character, so this cannot fail
        cr = bs.decode('latin1')
    return cr
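The same idea extends to more than two candidate encodings. The sketch below is a hypothetical generalization, not part of the original answer: the name decode_with_fallbacks and its encodings parameter are made up for illustration.

```python
def decode_with_fallbacks(bs, encodings=('utf-8', 'latin-1')):
    """Try each encoding in order and return the first successful decode.

    Hypothetical helper: put strict encodings first and a catch-all
    such as Latin-1 last, because Latin-1 accepts any byte sequence.
    """
    last_error = None
    for encoding in encodings:
        try:
            return bs.decode(encoding)
        except UnicodeDecodeError as exc:
            last_error = exc
    if last_error is None:
        raise ValueError('no encodings given')
    raise last_error

from_utf8 = decode_with_fallbacks(b'\xc3\xb1')   # valid UTF-8
from_latin1 = decode_with_fallbacks(b'\xf1')     # invalid UTF-8, valid Latin-1
```

Order matters: if Latin-1 were tried first it would always succeed, and UTF-8 input would silently come out as mojibake.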
If you do not know the original encoding and do not care about non-ASCII characters, you can set the optional errors parameter of the decode method to replace. Any offending byte will be replaced (from the standard library documentation):
Replace with a suitable replacement character; Python will use the official U+FFFD REPLACEMENT CHARACTER for the built-in Unicode codecs on decoding and '?' on encoding.
bs.decode(errors='replace')
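A short sketch of both directions described in that quotation. The sample bytes are an assumption for the example, and the encoding is passed explicitly here so that the result does not depend on the interpreter's default.

```python
# Latin-1 bytes for "a", n-with-tilde, "o"; the lone 0xf1 is not valid UTF-8.
data = b'a\xf1o'

# Decoding: the offending byte becomes U+FFFD REPLACEMENT CHARACTER.
text = data.decode('utf-8', errors='replace')

# Encoding: a character the target codec cannot represent becomes '?'.
ascii_bytes = u'a\xf1o'.encode('ascii', errors='replace')
```

Note that the replacement is lossy: once a byte has been turned into U+FFFD, the original character cannot be recovered.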
Answer by Jose Kj
I was getting this error when executing in python3. I got the same program working by simply executing it in python2.