Encoding issues with python's etree.tostring

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1428172/

Time: 2020-11-03 22:14:24  Source: igfitidea

Tags: python, xml, utf-8, tostring

Asked by smock

I'm using python 2.6.2's xml.etree.cElementTree to create an xml document:

import xml.etree.cElementTree as etree
elem = etree.Element('tag')
elem.text = (u"Würth Elektronik Midcom").encode('utf-8')
xml = etree.tostring(elem,encoding='UTF-8')

At the end of the day, xml looks like:

<?xml version='1.0' encoding='UTF-8'?>
<tag>W&#195;&#188;rth Elektronik Midcom</tag>

It looks like tostring ignored the encoding parameter and encoded 'ü' into some other character encoding ('ü' is a valid utf-8 encoding, I'm fairly sure).

Any advice as to what I'm doing wrong would be greatly appreciated.

Answered by John Millikin

You're encoding the text twice. Try this:

import xml.etree.cElementTree as etree
elem = etree.Element('tag')
elem.text = u"Würth Elektronik Midcom"
xml = etree.tostring(elem, encoding='UTF-8')
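
To see where the `&#195;&#188;` in the question comes from, here is a minimal Python 3 sketch. It is not the asker's Python 2.6 setup: it uses `xml.etree.ElementTree` instead of `cElementTree`, and it simulates the double encoding by decoding the UTF-8 bytes as Latin-1, which is an assumption made only for illustration. The names `misread`, `broken` and `fixed` are hypothetical.

```python
import xml.etree.ElementTree as ET

text = "Würth Elektronik Midcom"

# UTF-8 stores "ü" as two bytes, 0xC3 0xBC (decimal 195 and 188).
utf8_bytes = text.encode("utf-8")
u_umlaut_bytes = list("ü".encode("utf-8"))

# Simulate the mistake: each UTF-8 byte is treated as its own character.
misread = utf8_bytes.decode("latin-1")

# Serializing the misread text escapes both characters separately,
# which reproduces the character references seen in the question.
broken = ET.Element("tag")
broken.text = misread
broken_xml = ET.tostring(broken, encoding="us-ascii")

# Assigning the text as a real string and letting tostring encode it once
# keeps "ü" as a single character in the UTF-8 output.
fixed = ET.Element("tag")
fixed.text = text
fixed_xml = ET.tostring(fixed, encoding="utf-8")

print(u_umlaut_bytes)
print(broken_xml)
print(fixed_xml.decode("utf-8"))
```

The two numbers in the character references are simply the two UTF-8 bytes of 'ü', which is why leaving the text unencoded and letting tostring do the encoding avoids the problem.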

Answered by BaiJiFeiLong

etree.tostring(elem, encoding=str)

will return str, not binary (bytes), in Python 3

You can also serialise to a Unicode string without declaration by passing the unicode function as encoding (or str in Py3), or the name 'unicode'. This changes the return value from a byte string to an unencoded unicode string.
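
As a rough illustration of the difference in return types, the sketch below uses the standard library's `xml.etree.ElementTree` in Python 3 with the name 'unicode'. The wording quoted above appears to come from lxml's documentation, so whether passing the `str` function itself is accepted outside lxml is not assumed here; only the string name is used.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("tag")
elem.text = "Würth Elektronik Midcom"

# The name 'unicode' asks for a text (str) result with no XML declaration.
as_text = ET.tostring(elem, encoding="unicode")

# A real encoding name asks for an encoded bytes result instead.
as_bytes = ET.tostring(elem, encoding="utf-8")

print(type(as_text).__name__, as_text)
print(type(as_bytes).__name__)
```

The str form is convenient for printing or further string processing, while the bytes form is what you would normally write to a file or send over the network.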
