Python UnicodeEncodeError: ordinal not in range(128), with Euro sign

Question

Asked by Joe

I have to read an XML file in Python and grab various things, and I ran into a frustrating error with Unicode Encode Error that I couldn't figure out even with googling.

Here are snippets of my code:

#!/usr/bin/python
# coding: utf-8
from xml.dom.minidom import parseString
with open('data.txt','w') as fout:
   #do a lot of stuff
   nameObj = data.getElementsByTagName('name')[0]
   name = nameObj.childNodes[0].nodeValue
   #... do more stuff
   fout.write(','.join((name,bunch of other stuff)))

This spectacularly crashes when a name entry I am parsing contains a Euro sign. Here is the error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 60: ordinal not in range(128)

I understand why the Euro sign will screw it up (because it's at 128, right?), but I thought adding # coding: utf-8 would fix that. I also tried adding .encode(utf-8) so that the name assignment looks like this instead:

name = nameObj.childNodes[0].nodeValue.encode(utf-8)

But that doesn't work either. What am I doing wrong? (I am using Python 2.7.3 if anyone wants to know)

EDIT: Python crashes out on the fout.write() line -- it will go through fine where the name field is like:

<name>United States, USD</name>

But will crap out on name fields like:

<name>France, €</name>

Answer 1

Accepted answer by Fernando Freitas Alves

When you open a file in Python 2 with the built-in open function, the file object works with byte strings, so any unicode value you write to it is implicitly encoded with the default ascii codec. To write in another encoding, use codecs:

import codecs
fout = codecs.open('data.txt','w','utf-8')
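
A minimal sketch of the same idea, not the answerer's exact code. It assumes io.open, which takes an encoding argument much like codecs.open and exists in both Python 2.6+ and Python 3. The helper names write_row and read_bytes, the sample fields, and the temporary path are hypothetical.

```python
import io
import os
import tempfile


def write_row(path, fields):
    """Join unicode fields with commas and write them to path as UTF-8."""
    line = u",".join(fields)
    # The file object encodes unicode text to UTF-8 on write,
    # so no implicit ASCII conversion takes place.
    with io.open(path, "w", encoding="utf-8") as fout:
        fout.write(line)


def read_bytes(path):
    """Return the raw bytes stored on disk, for inspection."""
    with open(path, "rb") as fin:
        return fin.read()


# Hypothetical sample data: a name field containing the Euro sign (U+20AC).
path = os.path.join(tempfile.mkdtemp(), "data.txt")
write_row(path, [u"France", u"\u20ac"])
raw = read_bytes(path)
```

Here raw holds the UTF-8 bytes of the row, with the Euro sign stored as the three bytes E2 82 AC.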

Answer 2

Answer by Blckknght

It looks like you're getting Unicode data from your XML parser, but you're not encoding it before writing it out. You can explicitly encode the result before writing it out to the file:

text = ",".join(stuff) # this will be unicode if any value in stuff is unicode
encoded = text.encode("utf-8") # or use whatever encoding you prefer
fout.write(encoded)
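
The failure and the fix can be reproduced without a file, as a rough sketch. The sample values are made up and stand in for the parsed XML text. The Euro sign is code point U+20AC (8364), well above the ASCII limit of 127, which is why the ascii codec rejects it. The # coding: utf-8 line only declares the encoding of the source file itself and has no effect on how data is written.

```python
# Sample value standing in for the text of the parsed <name> element.
name = u"France, \u20ac"
# u"EUR" is a made-up second field; the join result is unicode.
text = u",".join([name, u"EUR"])

# The Euro sign's code point is far outside the 0-127 ASCII range.
euro_code_point = ord(u"\u20ac")

# Encoding to ASCII is what the implicit conversion attempts, and it fails.
ascii_error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc

# Encoding explicitly to UTF-8 succeeds and yields bytes safe to write.
encoded = text.encode("utf-8")
```

The encoded value is a byte string, so it can be passed to fout.write() on a file opened with the plain open function.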
