Python UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/16508539/

Time: 2020-08-18 22:51:17  Source: igfitidea

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2

python

Asked by kagat-kagat

I am creating an XML file in Python, and one field in the XML holds the contents of a text file. I do it like this:

import xml.etree.ElementTree as ET

f = open('myText.txt', "r")
data = f.read()
f.close()

root = ET.Element("add")
doc = ET.SubElement(root, "doc")

field = ET.SubElement(doc, "field")
field.set("name", "text")
field.text = data

tree = ET.ElementTree(root)
tree.write("output.xml")

And then I get the UnicodeDecodeError. I already tried putting the special comment # -*- coding: utf-8 -*- at the top of my script, but I still got the error. I also tried to enforce the encoding of my variable with data.encode('utf-8'), but still got the error. I know this issue is very common, but none of the solutions I found in other questions worked for me.

UPDATE


Traceback: Using only the special comment on the first line of the script


Traceback (most recent call last):
  File "D:\Python\lse\createxml.py", line 151, in <module>
    tree.write("D:\python\lse\xmls\" + items[ctr][0] + ".xml")
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 820, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 937, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1073, in _escape_cdata
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 243: ordinal not in range(128)

Traceback: Using .encode('utf-8')


Traceback (most recent call last):
  File "D:\Python\lse\createxml.py", line 148, in <module>
    field.text = data.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 227: ordinal not in range(128)

I used .decode('utf-8') and the error message didn't appear; it successfully created my XML file. But the problem is that the XML is not viewable in my browser.

Answered by uhbif19

You need to decode the data from the input byte string into Unicode before using it, to avoid encoding problems.

field.text = data.decode("utf8")
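As a rough illustration of the idea above, the sketch below decodes raw bytes once and only then assigns them to an element. The sample bytes and the variable names raw, text and xml_bytes are made up for the example, and it assumes the input really is UTF-8.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample bytes standing in for the file contents: "café",
# a non-breaking space, then "au lait", all encoded as UTF-8.
# Note the 0xc2 byte that introduces the non-breaking space.
raw = b"caf\xc3\xa9\xc2\xa0au lait"

# Decode once, before the data reaches ElementTree, so the element
# holds text rather than raw bytes.
text = raw.decode("utf-8")

field = ET.Element("field")
field.set("name", "text")
field.text = text

# Serialize back to UTF-8 bytes; ElementTree does the encoding itself.
xml_bytes = ET.tostring(field, encoding="utf-8")
```

The point of the design is that the program deals with bytes only at its edges (reading and writing) and with Unicode text everywhere in between.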

Answered by kqw

I was running into a similar error in pywikipediabot. The .decode method is a step in the right direction, but for me it didn't work without adding 'ignore':

ignore_encoding = lambda s: s.decode('utf8', 'ignore')

Ignoring encoding errors can lead to data loss or produce incorrect output. But if you just want to get it done and the details aren't very important this can be a good way to move faster.

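To make the data-loss trade-off concrete, here is a small sketch comparing the 'ignore' handler with 'replace' and the default 'strict'. The input bytes are hypothetical: valid UTF-8 followed by one stray byte that cannot occur in well-formed UTF-8.

```python
# Hypothetical input: "café" in UTF-8, then a stray 0xff byte, then "!".
raw = b"caf\xc3\xa9\xff!"

# 'ignore' silently drops the byte it cannot decode.
ignored = raw.decode("utf-8", "ignore")

# 'replace' keeps a visible marker (U+FFFD) where the bad byte was.
replaced = raw.decode("utf-8", "replace")

# The default 'strict' handler raises instead of guessing.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

If losing characters without a trace is a concern, 'replace' may be the safer of the two lenient handlers, since the damage stays visible in the output.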

Answered by Alastair McCormack

Python 2


The error occurs because ElementTree did not expect to find non-ASCII byte strings set in the XML when trying to write it out. You should use Unicode strings for non-ASCII text instead. Unicode strings can be made either by using the u prefix on string literals, i.e. u'', or by decoding a byte string with mystr.decode('utf-8'), using the appropriate encoding.
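In Python 2, calling .encode() on a byte string first decodes it implicitly with the ASCII codec, which is where the 'ascii' in the error message comes from. The sketch below performs that ASCII decode explicitly so the failure can be reproduced on any Python version; the sample bytes are hypothetical.

```python
# Hypothetical bytes as read from a UTF-8 file: a non-breaking space
# (U+00A0) is stored as the two bytes 0xc2 0xa0.
raw = b"price:\xc2\xa0100"

# Decoding as ASCII fails on the first byte above 0x7f.
try:
    raw.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

# Decoding with the encoding the data was actually written in succeeds.
decoded = raw.decode("utf-8")
```

Here message holds the same kind of text as the tracebacks in the question, with the position pointing at the 0xc2 byte.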

The best practice is to decode all text data as it's read, rather than decoding mid-program. The io module provides an open() function which decodes text data to Unicode strings as it's read.
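A minimal sketch of decoding at read time with io.open() follows. It writes a throwaway file first so that it is self-contained; the temporary path and the sample text are assumptions made for the example.

```python
import io
import os
import tempfile

# Create a throwaway UTF-8 file; the name and contents are hypothetical.
path = os.path.join(tempfile.mkdtemp(), "myText.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"na\u00efve caf\u00e9")

# Text mode with an explicit encoding: io.open decodes while reading,
# so data is already a Unicode string.
with io.open(path, "r", encoding="utf-8") as f:
    data = f.read()

# Binary mode returns the undecoded UTF-8 bytes of the same file.
with io.open(path, "rb") as f:
    raw = f.read()
```

Passing the encoding explicitly avoids relying on the platform default, which differs between systems.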

ElementTree will be much happier with Unicode strings and will encode them correctly when you use the tree.write() method.

Also, for best compatibility and readability, ensure that ET encodes to UTF-8 during write() and adds the relevant header (the XML declaration).

Presuming your input file is UTF-8 encoded (0xC2 is a common UTF-8 lead byte), putting everything together and using the with statement, your code should look like:

import io
import xml.etree.ElementTree as ET

with io.open('myText.txt', "r", encoding='utf-8') as f:
    data = f.read()

root = ET.Element("add")
doc = ET.SubElement(root, "doc")

field = ET.SubElement(doc, "field")
field.set("name", "text")
field.text = data

tree = ET.ElementTree(root)
tree.write("output.xml", encoding='utf-8', xml_declaration=True)

Output:


<?xml version='1.0' encoding='utf-8'?>
<add><doc><field name="text">data</field></doc></add>
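One way to check that the written XML is well formed and that non-ASCII text survives is to parse it back. The sketch below assumes hypothetical sample text and writes to an in-memory buffer rather than to output.xml.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical text standing in for the decoded file contents.
data = u"caf\u00e9\u00a0au lait"

root = ET.Element("add")
doc = ET.SubElement(root, "doc")
field = ET.SubElement(doc, "field")
field.set("name", "text")
field.text = data

# Write to an in-memory buffer so the sketch has no side effects on disk.
buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding="utf-8", xml_declaration=True)
xml_bytes = buf.getvalue()

# Parse the bytes back and read the field text again.
parsed = ET.fromstring(xml_bytes)
round_tripped = parsed.find("doc/field").text
```

If round_tripped equals data, the declaration and the byte encoding agree, which is what a browser or other XML consumer needs in order to display the file.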

Answered by Ankit Kumar Rathod

#!/usr/bin/python
# encoding=utf8

Try adding these lines at the start of your Python file.