Original source: http://stackoverflow.com/questions/15237702/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
Python Unicode Encode Error ordinal not in range<128> with Euro Sign
Asked by Joe
I have to read an XML file in Python and extract various values, and I ran into a frustrating UnicodeEncodeError that I couldn't figure out even after googling.
Here are snippets of my code:
#!/usr/bin/python
# coding: utf-8
from xml.dom.minidom import parseString
with open('data.txt','w') as fout:
    #do a lot of stuff
    nameObj = data.getElementsByTagName('name')[0]
    name = nameObj.childNodes[0].nodeValue
    #... do more stuff
    fout.write(','.join((name,bunch of other stuff)))
This crashes spectacularly when a name entry I am parsing contains a Euro sign. Here is the error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 60: ordinal not in range(128)
I understand why the Euro sign breaks it (because it's at 128, right?), but I thought adding # coding: utf-8 would fix that. I also tried adding .encode(utf-8) so that the assignment instead looks like:
name = nameObj.childNodes[0].nodeValue.encode(utf-8)
But that doesn't work either. What am I doing wrong? (I am using Python 2.7.3, if anyone wants to know.)
EDIT: Python crashes on the fout.write() line. It runs fine when the name field looks like:
<name>United States, USD</name>
But it fails on name fields like:
<name>France, €</name>
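The failure can be reproduced without any XML parsing. The sketch below is only an illustration with a made-up sample string: it shows that the Euro sign is code point U+20AC (8364), not 128 (128 is its byte value in the Windows-1252 encoding), and that encoding it to ASCII raises the same error that Python 2 hits when a unicode object is written to a plain file. The # coding: utf-8 line only tells Python how to decode the source file itself; it does not change how data is encoded on output.

```python
# -*- coding: utf-8 -*-
# Minimal reproduction sketch; the sample string is made up for illustration.
name = u"France, \u20ac"  # text ending with the Euro sign (U+20AC)

# The code point is 8364, far outside the ASCII range 0..127.
euro_code_point = ord(name[-1])

# In the Windows-1252 encoding the Euro sign is the single byte 0x80 (128),
# which is probably where the "128" association comes from.
cp1252_byte = name[-1].encode("cp1252")

# Encoding to ASCII fails in the same way as the implicit conversion
# that Python 2 performs when writing unicode to a plain file.
try:
    name.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding to UTF-8 succeeds; the Euro sign becomes three bytes.
utf8_bytes = name.encode("utf-8")
```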
Accepted answer by Fernando Freitas Alves
When you open a file in Python 2 with the built-in open function, the file object works with byte strings, so writing a unicode object to it triggers an implicit encode with the default ASCII codec. To read or write the file in another encoding, use codecs.open:
import codecs
fout = codecs.open('data.txt','w','utf-8')
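As an alternative sketch (not part of the original answer), io.open also accepts an encoding argument; it is available from Python 2.6 and is the same function as the built-in open in Python 3. The sample row and the temporary path below are assumptions for illustration; the question writes to data.txt.

```python
import io
import os
import tempfile

# Hypothetical sample row; in the question these values come from the XML parser.
row = [u"France, \u20ac", u"EUR"]

# Temporary path used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "data.txt")

# With an explicit encoding, the file object accepts unicode text directly
# and encodes it to UTF-8 on the way out.
with io.open(path, "w", encoding="utf-8") as fout:
    fout.write(u",".join(row))

# Reading with the same encoding gives the original text back.
with io.open(path, "r", encoding="utf-8") as fin:
    round_trip = fin.read()
```

With a file opened this way, the values should stay unicode until they are written; calling .encode() on them as well would hand bytes to a text-mode file.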
Answer by Blckknght
It looks like you're getting Unicode data from your XML parser but not encoding it before writing it out. You can explicitly encode the result before writing it to the file:
text = ",".join(stuff) # this will be unicode if any value in stuff is unicode
encoded = text.encode("utf-8") # or use whatever encoding you prefer
fout.write(encoded)
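A self-contained version of the same idea might look like the sketch below. The values in stuff and the temporary path are made up for illustration. The file is opened in binary mode ("wb") so that the encoded bytes are written unchanged; in Python 2 the question's "w" mode also accepts them, while Python 3 requires binary mode for bytes.

```python
import os
import tempfile

# Hypothetical values standing in for the data pulled from the XML.
stuff = [u"France, \u20ac", u"EUR"]

text = u",".join(stuff)         # unicode text
encoded = text.encode("utf-8")  # explicit conversion to bytes

# Temporary path used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "data.txt")

# Binary mode writes the already-encoded bytes as they are.
with open(path, "wb") as fout:
    fout.write(encoded)

# Read the raw bytes back to check what ended up on disk.
with open(path, "rb") as fin:
    raw = fin.read()
```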

