Python ascii codec can't decode byte 0xe9

Question

Asked by iqueqiorio

I have done some research and seen solutions but none have worked for me.

Python - 'ascii' codec can't decode byte

This didn't work for me. I know 0xe9 is the é character (in Latin-1), but I still can't figure out how to get this working. Here is my code:

output_lines = ['<menu>', '<day name="monday">', '<meal name="BREAKFAST">', '<counter name="Entreé">', '<dish>', '<name icon1="Vegan" icon2="Mindful Item">', 'Cream of Wheat (Farina)','</name>', '</dish>', '</counter >', '</meal >', '</day >', '</menu >']
output_string = '\n'.join([line.encode("utf-8") for line in output_lines])

And this gives me the error: 'ascii' codec can't decode byte 0xe9

I have tried decoding, and I have tried replacing the "é", but I can't get either approach to work.

Answer 1

Accepted answer by Martijn Pieters

You are trying to encode bytestrings:

>>> '<counter name="Entreé">'.encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 20: ordinal not in range(128)

Python is trying to be helpful: you can only encode a Unicode string to bytes, so to encode a byte string Python first implicitly decodes it using the default encoding (ASCII in Python 2).
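A minimal Python 3 sketch that spells out that hidden step. In Python 3, bytes and str are separate types and bytes has no encode() method, so the implicit ASCII decode has to be written explicitly. The byte string assumes the source file was saved as UTF-8, which matches the 0xc3 byte in the traceback above:

```python
# Python 3 sketch: make Python 2's implicit ASCII decode explicit.
raw = '<counter name="Entreé">'.encode('utf-8')  # bytes, as a UTF-8 source file would hold them

error = None
try:
    # Python 2's str.encode('utf8') first did the equivalent of this decode.
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; 'exc' is cleared when the except block ends

print(error)
```

The printed message reports byte 0xc3 at position 20, the same failure as the Python 2 traceback.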

The solution is not to encode data that is already encoded or, if the data was encoded with a different codec than the one you need, to first decode it with a suitable codec before encoding it again.
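The decode-then-encode route can be sketched in Python 3 as follows. transcode is a hypothetical helper name, and it assumes you already know which codec the input bytes were written in; it cannot work that out for you:

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one codec to another by decoding first.

    Hypothetical helper: 'source' must be the codec the bytes were
    actually written in.
    """
    text = data.decode(source)   # bytes -> str (Unicode)
    return text.encode(target)   # str -> bytes in the codec you need


latin1_bytes = b'Entre\xe9'      # 'Entreé' encoded as Latin-1
utf8_bytes = transcode(latin1_bytes, 'latin-1', 'utf-8')
print(utf8_bytes)                # b'Entre\xc3\xa9'
```

Data that is already in the target codec needs no transcoding at all.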

If you have a mix of unicode and bytestring values, decode just the bytestrings or encode just the unicode values; try to avoid mixing the types. The following decodes byte strings to unicode first:

def ensure_unicode(v):
    if isinstance(v, str):
        v = v.decode('utf8')
    return unicode(v)  # convert anything not a string to unicode too

output_string = u'\n'.join([ensure_unicode(line) for line in output_lines])
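For comparison, here is a sketch of the same idea in Python 3, where the byte type is bytes and the text type is str. ensure_text is a hypothetical name, and UTF-8 is an assumed default for the incoming bytes:

```python
def ensure_text(v, encoding='utf-8'):
    """Return v as str: decode bytes, convert anything else with str()."""
    if isinstance(v, bytes):
        return v.decode(encoding)  # assumes the bytes really are in 'encoding'
    return str(v)


# Assumed sample input mixing bytes, str and a non-string value.
mixed = [b'<counter name="Entre\xc3\xa9">', '<dish>', 42]
output_string = '\n'.join(ensure_text(line) for line in mixed)
print(output_string)
```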

Answer 2

Answer by Joran Beasley

encode = turn a unicode string into a bytestring

decode = turn a bytestring into unicode
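The two directions can be seen in a short round trip. This is a Python 3 sketch, where str plays the role of Python 2's unicode and bytes the role of its bytestring:

```python
text = 'é'                     # str (Unicode): one code point, U+00E9
data = text.encode('utf-8')    # encode: Unicode -> bytes
print(data)                    # b'\xc3\xa9' (two bytes in UTF-8)
print(text.encode('latin-1'))  # b'\xe9' (one byte in Latin-1, the 0xe9 from the question)
back = data.decode('utf-8')    # decode: bytes -> Unicode
print(back == text)            # True
```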

Since you already have a bytestring, you need to decode it to get a unicode instance (assuming that is actually what you are trying to do).

output_string = '\n'.join(output_lines)
print output_string.decode("latin1")  # decode() returns a unicode object
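A Python 3 sketch of the same approach, assuming the lines are Latin-1 byte strings (consistent with the 0xe9 byte in the question). In Python 3, bytes must be joined with a bytes separator, and because Latin-1 maps every byte to some character, a wrong codec guess shows up as garbled text rather than an exception:

```python
# Assumed sample data: byte strings in Latin-1, where é is the single byte 0xe9.
output_lines = [b'<counter name="Entre\xe9">', b'<dish>', b'</dish>', b'</counter>']

output_bytes = b'\n'.join(output_lines)       # join bytes with a bytes separator
output_text = output_bytes.decode('latin-1')  # str (Unicode) from here on
print(output_text)

# Latin-1 decoding never raises, so UTF-8 bytes decoded as Latin-1
# silently become mojibake instead of failing.
print(b'Entre\xc3\xa9'.decode('latin-1'))     # EntreÃ©
```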

Answer 3

Answer by Kasramvd

What you should do depends on what you want to do with your lines. If you just want to print them to the console, you don't need to encode anything yourself: consoles normally use UTF-8, and your strings are already byte strings, not unicode, so they are printed as-is:

>>> output_string = '\n'.join(output_lines)
>>> print output_string
<menu>
<day name="monday">
<meal name="BREAKFAST">
<counter name="Entreé">
<dish>
<name icon1="Vegan" icon2="Mindful Item">
Cream of Wheat (Farina)
</name>
</dish>
</counter >
</meal >
</day >
</menu >

But if you want to write to a file, you can use the codecs module:

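import codecs
# The returned file object encodes unicode values to UTF-8 on write,
# so pass it unicode, not byte strings.
f = codecs.open('out_file', 'w', encoding='utf8')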
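In Python 3 the built-in open() accepts an encoding argument directly, so it is the usual choice for this (codecs.open() still exists). A minimal sketch that writes to a temporary directory; the file name out_file comes from the answer, and the sample lines are assumed:

```python
import os
import tempfile

# Assumed sample lines; these are str (Unicode) values, not bytes.
lines = ['<counter name="Entreé">', '<dish>', '</dish>', '</counter>']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'out_file')

    # Text-mode file with an explicit encoding: str is encoded to UTF-8 on write.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('\n'.join(lines))

    # Read the raw bytes back to see what actually landed on disk.
    with open(path, 'rb') as f:
        raw = f.read()

print(raw[:24])  # b'<counter name="Entre\xc3\xa9">'
```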

Answer 4

Answer by tdelaney

A simple example of the problem is:

>>> '\xe9'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

\xe9 isn't an ASCII character, which means that your string is already encoded. You need to decode it into Python's unicode type and then encode it again in the serialization format you want.

Since I don't know where your string came from, I just looked through the list of Python codecs, picked a Western European one and gave it a go:

>>> '\xe9'.decode('cp1252')
u'\xe9'
>>> u'\xe9'.encode('utf-8')
'\xc3\xa9'
>>>

You'll have the best luck if you know exactly which encoding the file came from.
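When the source encoding really is unknown, one pragmatic heuristic (not true detection) is to try a short list of candidate codecs in order. decode_with_fallback and its default candidates are hypothetical choices for this Python 3 sketch:

```python
def decode_with_fallback(data, candidates=('utf-8', 'cp1252')):
    """Return (text, codec) for the first candidate codec that decodes data.

    Order matters: UTF-8 is strict and rejects most non-UTF-8 input, while
    single-byte codecs such as cp1252 accept far more byte sequences, so
    they belong at the end as a fallback.
    """
    for codec in candidates:
        try:
            return data.decode(codec), codec
        except UnicodeDecodeError:
            continue
    raise ValueError('none of %r could decode the data' % (candidates,))


print(decode_with_fallback(b'Entre\xe9'))      # ('Entreé', 'cp1252')
print(decode_with_fallback(b'Entre\xc3\xa9'))  # ('Entreé', 'utf-8')
```

A successful decode only shows that the bytes are valid in that codec, not that the guess is correct, which is why knowing the real encoding remains the better option.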
