Python: 'ascii' codec can't decode byte 0xe9
Original question: http://stackoverflow.com/questions/28947607/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not to this page): Stack Overflow
'ascii' codec can't decode byte 0xe9
Asked by iqueqiorio
I have done some research and seen solutions but none have worked for me.
Python - 'ascii' codec can't decode byte
This didn't work for me. I know that 0xe9 is the é character, but I still can't figure out how to get this working. Here is my code:
output_lines = ['<menu>', '<day name="monday">', '<meal name="BREAKFAST">', '<counter name="Entreé">', '<dish>', '<name icon1="Vegan" icon2="Mindful Item">', 'Cream of Wheat (Farina)','</name>', '</dish>', '</counter >', '</meal >', '</day >', '</menu >']
output_string = '\n'.join([line.encode("utf-8") for line in output_lines])
And this gives me the error: 'ascii' codec can't decode byte 0xe9
And I have tried decoding, I have tried to replace the "é" but can't seem to get that to work either.
Accepted answer by Martijn Pieters
You are trying to encode bytestrings:
>>> '<counter name="Entreé">'.encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 20: ordinal not in range(128)
Python is trying to be helpful: you can only encode a Unicode string to bytes, so to encode a byte string, Python first implicitly decodes it using the default encoding (ASCII).
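The implicit step described above can be spelled out explicitly. This is only a sketch of the behaviour, not Python's internal code: it assumes the source string was saved as UTF-8 (which the byte 0xc3 in the traceback suggests), and the names `raw`, `failed_position` and `failed_byte` are made up for illustration.

```python
# UTF-8 bytes for the string from the question (the "é" is the two bytes C3 A9).
raw = b'<counter name="Entre\xc3\xa9">'

failed_position = None
failed_byte = None
try:
    # Explicit version of the implicit step: decode as ASCII first, then encode.
    raw.decode('ascii').encode('utf8')
except UnicodeDecodeError as exc:
    failed_position = exc.start            # index of the first non-ASCII byte
    failed_byte = raw[exc.start:exc.end]   # the byte the ASCII codec rejected

print(failed_position)
```

The failure comes from the decode half of the call, which is why an `encode` call ends up reporting a `UnicodeDecodeError`.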
The solution is to not encode data that is already encoded. If the data was encoded with a different codec than the one you need, first decode it using a suitable codec and then encode it again.
If you have a mix of unicode and bytestring values, decode just the bytestrings or encode just the unicode values; try to avoid mixing the types. The following decodes byte strings to unicode first:
def ensure_unicode(v):
    if isinstance(v, str):
        v = v.decode('utf8')
    return unicode(v)  # convert anything not a string to unicode too
output_string = u'\n'.join([ensure_unicode(line) for line in output_lines])
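For readers who want a variant that also runs on Python 3, here is a minimal sketch of the same idea: decode byte strings on the way in, work with text, and encode once on the way out. The helper name `ensure_text` and the sample list `mixed_lines` are hypothetical, and UTF-8 is assumed as the input encoding. Unlike the function above, this sketch does not convert non-string values.

```python
def ensure_text(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# A mix of a UTF-8 byte string and a text string (hypothetical sample data).
mixed_lines = [b'<counter name="Entre\xc3\xa9">', u'<dish>']

# Join as text, then encode exactly once when bytes are needed for output.
output_string = u'\n'.join(ensure_text(line) for line in mixed_lines)
output_bytes = output_string.encode('utf-8')
```

Keeping everything as text internally means the encode step happens in one place only, so nothing gets encoded twice.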
Answer by Joran Beasley
encode = turn a unicode string into a bytestring
decode = turn a bytestring into unicode
Since you already have a bytestring, you need decode to make it a unicode instance (assuming that is actually what you are trying to do):
output_string = '\n'.join(output_lines)
print output_string.decode("latin1") #now this returns unicode
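The two definitions can be checked with a short round trip. This is a sketch with made-up variable names. It also shows why the codec passed to `decode` has to match the one the bytes were written with: if the data is really UTF-8, decoding it as latin1 succeeds but yields the wrong characters.

```python
text = u'Entre\xe9'                  # 6 characters, ending in "é"

data = text.encode('utf-8')          # encode: text -> bytes
round_trip = data.decode('utf-8')    # decode: bytes -> text, same codec

# Decoding UTF-8 bytes as latin-1 does not fail, but "é" becomes two characters.
wrong = data.decode('latin-1')

print(len(text), len(data), len(wrong))
```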
Answer by Kasramvd
Depending on what you want to do with your lines, there are different options. If you just want to print to the console, you don't need to encode anything yourself: your string is already a byte string (not unicode), and consoles normally use UTF-8:
>>> output_string = '\n'.join(output_lines)
>>> print output_string
<menu>
<day name="monday">
<meal name="BREAKFAST">
<counter name="Entreé">
<dish>
<name icon1="Vegan" icon2="Mindful Item">
Cream of Wheat (Farina)
</name>
</dish>
</counter >
</meal >
</day >
</menu >
But if you want to write to a file, you can use the codecs module:
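import codecs
# codecs.open encodes unicode on write, so decode the byte string first
with codecs.open('out_file', 'w', encoding='utf8') as f:
    f.write(output_string.decode('utf8'))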
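To see which bytes actually land in the file, the write can be followed by a binary read. The sketch below swaps in `io.open` for `codecs.open` (it takes the same `encoding` argument and behaves the same on Python 2 and 3); the temporary path and the sample text are assumptions made for the example.

```python
import io
import os
import tempfile

text = u'<counter name="Entre\xe9">'   # text, not bytes

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'out_file')

# The file object encodes text to UTF-8 as it is written.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Read the file back in binary mode to inspect the stored bytes.
with io.open(path, 'rb') as f:
    raw = f.read()
```

The file object does the encoding, so the value passed to `write` should be text; passing an already encoded byte string is the same mistake as in the question.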
Answer by tdelaney
A simple example of the problem is:
>>> '\xe9'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
\xe9 isn't an ASCII character, which means that your string is already encoded. You need to decode it into Python's unicode and then encode it again in the serialization format you want.
Since I don't know where your string came from, I just peeked at the python codecs, picked something from Western Europe and gave it a go:
>>> '\xe9'.decode('cp1252')
u'\xe9'
>>> u'\xe9'.encode('utf-8')
'\xc3\xa9'
>>>
You'll have the best luck if you know exactly which encoding the file came from.
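Why the source encoding matters can be seen by decoding the same byte with several codecs. The list of candidates below is an arbitrary choice for illustration, not a recommended detection procedure.

```python
raw = b'\xe9'

decoded = {}
for codec in ('cp1252', 'latin-1', 'cp1251', 'utf-8'):
    try:
        decoded[codec] = raw.decode(codec)
    except UnicodeDecodeError:
        decoded[codec] = None   # this codec cannot interpret the byte at all
```

Here cp1252 and latin-1 both give "é", cp1251 gives the Cyrillic letter "й", and UTF-8 rejects the lone byte, so a guess that decodes without error is not necessarily the right one.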