Python UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128)

Question

Asked by Capens

When I try to concatenate this, I get the UnicodeDecodeError when the field contains 'ñ' or '′'. If the field that contains the 'ñ' or '′' is the last one, I get no error.

#...

nombre = fabrica
nombre = nombre.encode("utf-8") + '-' + sector.encode("utf-8")
nombre = nombre.encode("utf-8") + '-' + unidad.encode("utf-8")

#...

return nombre

Any idea? Many thanks!

Answer 1

Accepted answer by Martijn Pieters

You are encoding to UTF-8, then re-encoding to UTF-8. Python can only do this if it first decodes the bytes back to Unicode, and for that implicit step it has to use the default ASCII codec:

>>> u'ñ'
u'\xf1'
>>> u'ñ'.encode('utf8')
'\xc3\xb1'
>>> u'ñ'.encode('utf8').encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
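
To make that hidden step visible, here is a minimal Python 3 sketch. Python 3 no longer performs this implicit decode (bytes objects have no .encode() method), so the ASCII decode is spelled out by hand; the helper name implicit_reencode is hypothetical.

```python
def implicit_reencode(data: bytes) -> bytes:
    """Spell out Python 2's hidden step: decode with ASCII, then encode."""
    return data.decode('ascii').encode('utf-8')


encoded = 'ñ'.encode('utf-8')
print(encoded)  # b'\xc3\xb1'

try:
    implicit_reencode(encoded)
except UnicodeDecodeError as exc:
    # 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
    print(exc)
```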

Don't keep encoding; leave encoding to UTF-8 until the last possible moment, and concatenate Unicode values instead.

You can use str.join() (or, rather, unicode.join()) here to concatenate the three values with dashes in between:

# join() takes a single iterable, so the three values are passed as a tuple
nombre = u'-'.join((fabrica, sector, unidad))
return nombre.encode('utf-8')

But even encoding here might be too early.
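
For comparison, a minimal Python 3 sketch of the same approach, where str is already Unicode, so nothing needs decoding before the join. The function name build_nombre and the sample values are hypothetical.

```python
def build_nombre(fabrica: str, sector: str, unidad: str) -> str:
    """Join the three text fields with dashes; no encoding happens here."""
    return '-'.join((fabrica, sector, unidad))


nombre = build_nombre('Fábrica', 'Señal', 'Unidad 1')
print(nombre)  # Fábrica-Señal-Unidad 1

# Encode only at the boundary, e.g. just before writing bytes out
payload = nombre.encode('utf-8')
```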

Rule of thumb: decode the moment you receive the value (unless an API already supplies Unicode values), and encode only when you have to (when the destination API does not handle Unicode values directly).
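
A minimal Python 3 sketch of that rule, assuming the input and output are UTF-8 byte streams. Here io.BytesIO stands in for a real file or socket, and the process function with its .upper() step is a hypothetical placeholder for real work.

```python
import io
from typing import BinaryIO


def process(raw_in: BinaryIO, raw_out: BinaryIO) -> None:
    # Decode at the input boundary (assumed to be UTF-8)
    text = raw_in.read().decode('utf-8')
    # Work only with str in the middle; .upper() is just a placeholder
    result = text.upper()
    # Encode at the output boundary, as late as possible
    raw_out.write(result.encode('utf-8'))


src = io.BytesIO('año'.encode('utf-8'))
dst = io.BytesIO()
process(src, dst)
print(dst.getvalue().decode('utf-8'))  # AÑO
```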

Answer 2

Answer by Serge Ballesta

When you get a UnicodeDecodeError, it means that somewhere in your code you are converting a byte string directly into a unicode string. Python 2 uses the ASCII codec for this by default, while Python 3's bytes.decode() defaults to UTF-8 (both can fail, because not every byte sequence is valid in either encoding).

To avoid that, you must use explicit decoding.
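
A short Python 3 sketch of explicit decoding, assuming the bytes are known to be UTF-8. It also shows that Python 3 raises TypeError instead of silently decoding when bytes and str are mixed.

```python
raw = b'Espa\xc3\xb1a'  # UTF-8 bytes for 'España'

# Explicit decoding: name the codec the bytes were actually written in
text = raw.decode('utf-8')
print(text)  # España

# Python 3 never decodes implicitly when bytes and str are mixed
try:
    raw + '-Norte'
except TypeError as exc:
    print('TypeError:', exc)

print(text + '-Norte')  # España-Norte
```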

If your input file may contain 2 different encodings, one of which accepts any byte (say UTF-8 and Latin-1), you can first try to decode a string with the first one and fall back to the second if a UnicodeDecodeError occurs.

def robust_decode(bs):
    '''Take a byte string and convert it into a unicode string.
    First try UTF-8, and fall back to Latin-1 if that fails.'''
    try:
        return bs.decode('utf8')
    except UnicodeDecodeError:
        # Latin-1 maps every byte value to a character, so this cannot fail
        return bs.decode('latin1')
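
The same fallback idea generalized to an ordered list of codecs, as a Python 3 sketch (the name decode_first_match is hypothetical). Order matters: Latin-1 accepts every byte, so it has to come last, otherwise valid UTF-8 input would be silently misread.

```python
def decode_first_match(bs: bytes, encodings=('utf-8', 'latin-1')) -> str:
    """Try each codec in order; the last one should accept any byte."""
    for encoding in encodings[:-1]:
        try:
            return bs.decode(encoding)
        except UnicodeDecodeError:
            continue  # not valid in this codec, try the next one
    return bs.decode(encodings[-1])


print(decode_first_match(b'Espa\xc3\xb1a'))  # España (valid UTF-8)
print(decode_first_match(b'Espa\xf1a'))       # España (UTF-8 fails, Latin-1 used)
print(b'Espa\xc3\xb1a'.decode('latin-1'))     # EspaÃ±a (what the wrong order gives)
```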

If you do not know the original encoding and do not care about non-ASCII characters, you can set the optional errors parameter of the decode method to 'replace'. Any offending byte will then be replaced (from the standard library documentation):

Replace with a suitable replacement character; Python will use the official U+FFFD REPLACEMENT CHARACTER for the built-in Unicode codecs on decoding and '?' on encoding.

bs.decode(errors='replace')
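
A minimal Python 3 illustration of errors='replace' in both directions. Note that Python 3's bytes.decode() defaults to UTF-8 rather than ASCII, so the codec is named explicitly here; the sample bytes are assumed to be Latin-1.

```python
raw = b'caf\xe9'  # Latin-1 bytes for 'café', not valid UTF-8

decoded = raw.decode('utf-8', errors='replace')
# The invalid byte becomes U+FFFD REPLACEMENT CHARACTER
print(decoded == 'caf\ufffd')  # True

encoded = 'café'.encode('ascii', errors='replace')
print(encoded)  # b'caf?'
```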

Answer 3

Answer by Jose Kj

I was getting this error when running under Python 3; the same program worked simply by running it under Python 2.
