Python UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2

Question

Asked by kagat-kagat

I am creating an XML file in Python, and one field of the XML holds the contents of a text file. I do it like this:

import xml.etree.ElementTree as ET

f = open('myText.txt', "r")
data = f.read()
f.close()

root = ET.Element("add")
doc = ET.SubElement(root, "doc")

field = ET.SubElement(doc, "field")
field.set("name", "text")
field.text = data

tree = ET.ElementTree(root)
tree.write("output.xml")

And then I get the UnicodeDecodeError. I already tried putting the special comment # -*- coding: utf-8 -*- at the top of my script but still got the error. I also tried to enforce the encoding of my variable with data.encode('utf-8') but still got the error. I know this issue is very common, but none of the solutions I found in other questions worked for me.

UPDATE

Traceback: Using only the special comment on the first line of the script

Traceback (most recent call last):
  File "D:\Python\lse\createxml.py", line 151, in <module>
    tree.write("D:\python\lse\xmls\" + items[ctr][0] + ".xml")
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 820, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 937, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1073, in _escape_cdata
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 243: ordinal not in range(128)

Traceback: Using .encode('utf-8')

Traceback (most recent call last):
  File "D:\Python\lse\createxml.py", line 148, in <module>
    field.text = data.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 227: ordinal not in range(128)

I used .decode('utf-8') and the error message didn't appear, and it successfully created my XML file. But the problem is that the XML is not viewable in my browser.

Answer 1

Answered by uhbif19

You need to decode the data from the input byte string into Unicode before using it, to avoid encoding problems.

field.text = data.decode("utf8")
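
As a minimal sketch of what this decode step does, the snippet below uses a made-up byte string (an assumption standing in for the contents of myText.txt) that contains the bytes 0xC2 0xA0, the UTF-8 encoding of a non-breaking space. It is written so that it should behave the same on Python 2.7 and Python 3.

```python
# Hypothetical sample bytes; the real data would come from myText.txt.
raw = b"price:\xc2\xa0100"

# Decoding turns the UTF-8 byte string into a Unicode string.
text = raw.decode("utf-8")

# The two bytes 0xC2 0xA0 collapse into the single code point U+00A0,
# so the decoded text is one character shorter than the raw bytes.
raw_length = len(raw)
text_length = len(text)
print(repr(text))
```

The Unicode result is what gets assigned to field.text; the raw bytes are what triggered the error.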

Answer 2

Answered by kqw

I was running into a similar error in pywikipediabot. The .decode method is a step in the right direction, but for me it didn't work without adding 'ignore':

ignore_encoding = lambda s: s.decode('utf8', 'ignore')

Ignoring encoding errors can lead to data loss or produce incorrect output. But if you just want to get it done and the details aren't very important, this can be a good way to move faster.
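
To make the data-loss trade-off concrete, here is a small sketch comparing the 'strict' (default), 'ignore', and 'replace' error handlers. The sample bytes are invented for illustration; 0xFF is used because it is never valid in UTF-8.

```python
# Hypothetical input: valid UTF-8 for "café", then a stray invalid byte.
raw = b"caf\xc3\xa9 \xff ok"

# The default (strict) handler refuses to decode invalid input.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# 'ignore' silently drops the bad byte, so the loss is invisible.
ignored = raw.decode("utf-8", "ignore")

# 'replace' substitutes U+FFFD, which at least marks where data was lost.
replaced = raw.decode("utf-8", "replace")

print(strict_ok, repr(ignored), repr(replaced))
```

If some loss is acceptable, 'replace' may be the safer of the two, since the damaged spots stay visible in the output.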

Answer 3

Answered by Alastair McCormack

Python 2

The error occurs because ElementTree did not expect to find non-ASCII byte strings set in the XML when trying to write it out. You should use Unicode strings for non-ASCII text instead. Unicode strings can be made either by using the u prefix on string literals, i.e. u'', or by decoding a byte string with mystr.decode('utf-8'), using the appropriate encoding.
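
A sketch of why calling .encode() produces a *decode* error: in Python 2, encoding a byte string first decodes it implicitly with the ASCII codec. The snippet below performs that hidden step explicitly on invented sample bytes, so it can also be run on Python 3, where the implicit step no longer exists.

```python
# Hypothetical data: the UTF-8 bytes for "temp: 20°C".
data = b"temp: 20\xc2\xb0C"

# This is the step Python 2 performs implicitly before encoding a byte
# string; it fails on the first byte outside the ASCII range.
try:
    data.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)

# Decoding with the codec the bytes were actually written in works, and
# re-encoding the resulting Unicode string gives back the same bytes.
unicode_text = data.decode("utf-8")
encoded = unicode_text.encode("utf-8")
```

The printed message should name the 'ascii' codec and byte 0xc2, matching the tracebacks in the question.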

The best practice is to decode all text data as it's read, rather than decoding mid-program. The io module provides an open() function that decodes text data to Unicode strings as it's read.

ElementTree will be much happier with Unicode strings and will encode them correctly when you use the ElementTree.write() method.

Also, for best compatibility and readability, ensure that ET encodes to UTF-8 during write() and adds the relevant header (the XML declaration).

Presuming your input file is UTF-8 encoded (0xC2 is a common UTF-8 lead byte), putting everything together and using the with statement, your code should look like:

import io
import xml.etree.ElementTree as ET

with io.open('myText.txt', "r", encoding='utf-8') as f:
    data = f.read()

root = ET.Element("add")
doc = ET.SubElement(root, "doc")

field = ET.SubElement(doc, "field")
field.set("name", "text")
field.text = data

tree = ET.ElementTree(root)
tree.write("output.xml", encoding='utf-8', xml_declaration=True)

Output:

<?xml version='1.0' encoding='utf-8'?>
<add><doc><field name="text">data</field></doc></add>
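
One way to check that non-ASCII text survives the whole pipeline is to parse the written file back in. The sketch below is an illustration under assumptions: the sample text and the temporary directory are invented, and the file names simply mirror the ones used above.

```python
import io
import os
import shutil
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample containing U+00E9 and U+00A0 (a non-breaking space).
sample = u"caf\u00e9\u00a0au lait"

tmp_dir = tempfile.mkdtemp()
txt_path = os.path.join(tmp_dir, "myText.txt")
xml_path = os.path.join(tmp_dir, "output.xml")

# Create a UTF-8 input file to stand in for the real myText.txt.
with io.open(txt_path, "w", encoding="utf-8") as f:
    f.write(sample)

# Decode on read, exactly as in the code above.
with io.open(txt_path, "r", encoding="utf-8") as f:
    data = f.read()

root = ET.Element("add")
doc = ET.SubElement(root, "doc")
field = ET.SubElement(doc, "field")
field.set("name", "text")
field.text = data
ET.ElementTree(root).write(xml_path, encoding="utf-8", xml_declaration=True)

# Read the result back: the first line should be the XML declaration and
# the parsed field text should equal the original Unicode string.
with open(xml_path, "rb") as f:
    first_line = f.readline()
parsed_text = ET.parse(xml_path).getroot().find("doc/field").text

shutil.rmtree(tmp_dir)  # clean up the temporary files
print(first_line, repr(parsed_text))
```

If the parsed text matches the original, the file is well-formed UTF-8 XML, which is also what a browser needs in order to display it.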

Answer 4

Answered by Ankit Kumar Rathod

#!/usr/bin/python
# encoding=utf8

Try adding this at the start of your Python file.
