lxml.etree.XML ValueError with Python Unicode strings

Question

Asked by Papouche Guinslyzinho

I'm transforming an XML document with XSLT. When I run it with Python 3, I get the following error, but it works without any errors under Python 2.

-> % python3 cstm/artefact.py
Traceback (most recent call last):
  File "cstm/artefact.py", line 98, in <module>
    simplify_this_dataset('fisheries-service-des-peches.xml')
  File "cstm/artefact.py", line 85, in simplify_this_dataset
    xslt_root = etree.XML(xslt_content)
  File "lxml.etree.pyx", line 3012, in lxml.etree.XML (src/lxml/lxml.etree.c:67861)
  File "parser.pxi", line 1780, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:102420)
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

#!/usr/bin/env python3
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
# -*- coding: utf-8 -*-

import os

from lxml import etree

def simplify_this_dataset(dataset):
    """Create a simplified version of an XML file.
    It removes all the attributes and turns them into elements instead.
    """
    module_path = os.path.dirname(os.path.abspath(__file__))
    data = open(module_path+'/data/ex-fire.xslt')
    xslt_content = data.read()
    xslt_root = etree.XML(xslt_content)
    dom = etree.parse(module_path+'/../CanSTM_dataset/'+dataset)
    transform = etree.XSLT(xslt_root)
    result = transform(dom)
    f = open(module_path+ '/../CanSTM_dataset/otra.xml', 'w')
    f.write(str(result))
    f.close()
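
The error does not depend on the stylesheet itself. A minimal repro, assuming only that lxml is installed (the <root/> markup is a placeholder, not the real ex-fire.xslt), shows that a str is rejected only when it carries an encoding declaration:

```python
from lxml import etree

# Placeholder markup standing in for the contents of ex-fire.xslt.
with_declaration = '<?xml version="1.0" encoding="UTF-8"?><root/>'
without_declaration = '<root/>'

try:
    etree.XML(with_declaration)      # str that starts with an encoding declaration
    error_message = None
except ValueError as exc:
    error_message = str(exc)         # the same ValueError as in the traceback

fragment = etree.XML(without_declaration)  # str without a declaration parses fine
print(error_message)
print(fragment.tag)  # root
```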

Answer 1

Accepted answer by bobince

data = open(module_path+'/data/ex-fire.xslt')
xslt_content = data.read()

This implicitly decodes the bytes in the file to Unicode text, using the platform's default encoding. (This might give wrong results if the XML file isn't in that encoding.)
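
To see what "wrong results" looks like, here is a small sketch that writes UTF-8 bytes to a temporary file and reads them back in text mode twice. Latin-1 is only an assumed stand-in for a mismatched platform default; the real default depends on the OS and locale, and read_text is a hypothetical helper.

```python
import os
import tempfile

def read_text(path, encoding):
    """Read a file in text mode, like open(path).read(), but with an explicit codec."""
    with open(path, encoding=encoding) as f:
        return f.read()

# Bytes as an editor saving UTF-8 would store them on disk.
raw = '<title>Pêches</title>'.encode('utf-8')

with tempfile.NamedTemporaryFile(suffix='.xslt', delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    decoded_right = read_text(path, 'utf-8')    # matches the file's real encoding
    decoded_wrong = read_text(path, 'latin-1')  # assumed mismatched default encoding
finally:
    os.remove(path)

print(decoded_right)  # <title>Pêches</title>
print(decoded_wrong)  # <title>PÃªches</title>
```

No exception is raised in the second case; the text is just silently garbled.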

xslt_root = etree.XML(xslt_content)

XML has its own way of handling and signalling encodings: the <?xml encoding="..."?> prolog. If you pass a Unicode string starting with <?xml encoding="..."?> to a parser, the parser would want to reinterpret the rest of the byte string using that encoding... but it can't, because you've already decoded the byte input to a Unicode string.
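
When the parser does get bytes, it honours the prolog. In this sketch the same made-up document is stored in two different encodings, each with a matching declaration; the bytes differ, but lxml decodes both to the same text:

```python
from lxml import etree

text = 'Pêches'

# The same document in two encodings, each declaring its own encoding.
latin1_doc = f'<?xml version="1.0" encoding="ISO-8859-1"?><title>{text}</title>'.encode('iso-8859-1')
utf8_doc = f'<?xml version="1.0" encoding="UTF-8"?><title>{text}</title>'.encode('utf-8')

print(latin1_doc == utf8_doc)      # False: different bytes on disk
print(etree.XML(latin1_doc).text)  # Pêches: decoded as ISO-8859-1 per the prolog
print(etree.XML(utf8_doc).text)    # Pêches: decoded as UTF-8 per the prolog
```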

Instead, you should either pass the undecoded byte string to the parser:

data = open(module_path+'/data/ex-fire.xslt', 'rb')
xslt_content = data.read()
xslt_root = etree.XML(xslt_content)

or, better, just have the parser read straight from the file:

xslt_root = etree.parse(module_path+'/data/ex-fire.xslt')
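
Applied to the question's function, the "parse straight from the file" approach might look like the sketch below. The name simplify_file, its three path parameters, and the small attribute-to-element stylesheet are hypothetical stand-ins for the real ex-fire.xslt and dataset paths. Writing bytes(result) to a file opened in binary mode is a choice made here so that the output encoding follows the stylesheet's xsl:output declaration.

```python
import os
import tempfile

from lxml import etree

def simplify_file(xslt_path, xml_path, out_path):
    """Apply the stylesheet at xslt_path to xml_path and write the result to out_path."""
    # etree.parse reads raw bytes from disk, so each file's own
    # <?xml encoding="..."?> declaration is honoured.
    xslt_root = etree.parse(xslt_path)
    transform = etree.XSLT(xslt_root)
    result = transform(etree.parse(xml_path))
    # bytes(result) serialises using the stylesheet's <xsl:output> settings.
    with open(out_path, 'wb') as f:
        f.write(bytes(result))

# Hypothetical stylesheet: turn every attribute into a child element.
XSLT = '''<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8"/>
  <xsl:template match="*">
    <xsl:element name="{name()}">
      <xsl:for-each select="@*">
        <xsl:element name="{name()}"><xsl:value-of select="."/></xsl:element>
      </xsl:for-each>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
'''
# Hypothetical dataset, deliberately stored as ISO-8859-1.
DATA = '<?xml version="1.0" encoding="ISO-8859-1"?><river name="Pêches"/>'

with tempfile.TemporaryDirectory() as workdir:
    xslt_path = os.path.join(workdir, 'ex-fire.xslt')
    xml_path = os.path.join(workdir, 'dataset.xml')
    out_path = os.path.join(workdir, 'otra.xml')
    with open(xslt_path, 'wb') as f:
        f.write(XSLT.encode('utf-8'))
    with open(xml_path, 'wb') as f:
        f.write(DATA.encode('iso-8859-1'))

    simplify_file(xslt_path, xml_path, out_path)
    output = etree.parse(out_path).getroot()

print(output.find('name').text)  # Pêches
```

Neither input is ever decoded by Python, so the two files can use different encodings without any extra handling.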

Answer 2

Answer by Josh Allemon

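If you open the file in binary mode ('rb'), you can also decode the UTF-8 bytes and re-encode them as ASCII before passing them to etree.XML. This only works if the file contains no non-ASCII characters; otherwise encode('ascii') raises UnicodeEncodeError.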

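data = open(module_path+'/data/ex-fire.xslt', 'rb')  # binary mode: read() returns bytes
xslt_content = data.read()
# bytes -> str -> ASCII bytes; raises UnicodeEncodeError on non-ASCII input
xslt_content = xslt_content.decode('utf-8').encode('ascii')
xslt_root = etree.XML(xslt_content)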
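
The ASCII round-trip only succeeds on pure-ASCII input. The sketch below, using an invented snippet, shows the UnicodeEncodeError and, as an alternative not part of this answer, Python's standard 'xmlcharrefreplace' error handler, which replaces non-ASCII characters with numeric character references. Those references are only interpreted in text content and attribute values, so this would still break non-ASCII element names, comments or CDATA sections.

```python
from lxml import etree

raw = '<?xml version="1.0" encoding="UTF-8"?><title>Pêches</title>'.encode('utf-8')

try:
    raw.decode('utf-8').encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc.reason  # 'ordinal not in range(128)'

# Replace non-ASCII characters with &#NNN; references instead of failing.
escaped = raw.decode('utf-8').encode('ascii', 'xmlcharrefreplace')
print(escaped)                  # b'<?xml version="1.0" encoding="UTF-8"?><title>P&#234;ches</title>'
print(etree.XML(escaped).text)  # Pêches
```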

Answer 3

Answer by Loki

I made it work by simply re-encoding the string with the default options (str.encode() uses UTF-8 by default):

xslt_content = data.read().encode()
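
Because str.encode() always produces UTF-8 bytes, this works when the file declares UTF-8 or declares no encoding at all. If the declaration names a different encoding, the parser decodes the UTF-8 bytes with that encoding and silently garbles non-ASCII text. A minimal sketch with an invented ISO-8859-1 document:

```python
from lxml import etree

# A str as it might come out of a text-mode read that decoded the file correctly.
text = '<?xml version="1.0" encoding="ISO-8859-1"?><title>Pêches</title>'

as_utf8 = text.encode()                  # default codec is UTF-8
as_declared = text.encode('iso-8859-1')  # matches the prolog

print(etree.XML(as_utf8).text)      # PÃªches (UTF-8 bytes read as ISO-8859-1)
print(etree.XML(as_declared).text)  # Pêches
```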
