How to validate XML using Python without third-party libraries?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/13742538/

Date: 2020-08-18 09:31:11  Source: igfitidea

How to validate xml using python without third-party libs?

python

Asked by WoooHaaaa

I have some XML pieces like this:

<!DOCTYPE mensaje SYSTEM "record.dtd">
<record>
    <player_birthday>1979-09-23</player_birthday>
    <player_name>Orene Ai'i</player_name>
    <player_team>Blues</player_team>
    <player_id>453</player_id>
    <player_height>170</player_height>
    <player_position>F&W</player_position>   <---- a '&' here.
    <player_weight>75</player_weight>
</record>

Is there any way to check whether the XML pieces are well formed? Is there any way to validate the XML against a DTD or an XML Schema?

For various reasons, I can't use any third-party packages.

For example, the XML above is not correct because it contains an unescaped '&'. Note that the DOCTYPE declaration refers to a DTD.

Accepted answer by jsbueno

Just try to parse it with ElementTree (xml.etree.ElementTree.fromstring) - it will raise an error if the XML is not well formed.

>>> a = """<record>
...     <player_birthday>1979-09-23</player_birthday>
...     <player_name>Orene Ai'i</player_name>
...     <player_team>Blues</player_team>
...     <player_id>453</player_id>
...     <player_height>170</player_height>
...     <player_position>F&W</player_position>   <---- a '&' here.
...     <player_weight>75</player_weight>
... </record>"""
>>> 
>>> from xml.etree import ElementTree as ET
>>> x = ET.fromstring(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1282, in XML
    parser.feed(text)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1624, in feed
    self._raiseerror(v)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1488, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 7, column 24
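
The session above can be wrapped in a small helper. This is only a minimal sketch for Python 3: the function name check_well_formed and the two sample strings are made up for illustration and are not part of the original answer. It checks well-formedness only. The parser behind ElementTree (expat) is non-validating, so the DTD named in the DOCTYPE is not checked.

```python
from xml.etree import ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message).

    Hypothetical helper for illustration: it only tests well-formedness
    and does not validate against a DTD or an XML Schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # str(err) includes the line and column reported by the parser.
        return False, str(err)
    return True, None


# A bare '&' is not allowed in character data; it must be written as '&amp;'.
bad = "<record><player_position>F&W</player_position></record>"
good = "<record><player_position>F&amp;W</player_position></record>"

print(check_well_formed(bad))
print(check_well_formed(good))
```

Returning the message rather than re-raising lets the caller decide whether to log it, skip the record, or stop.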

Answer by Thomas Orozco

You can use Python's xml.dom.minidom XML parser (which is in the standard library, but isn't as powerful as alternatives such as lxml).

Just do:

import xml.dom.minidom
xml.dom.minidom.parseString('<My><XML><String/><XML/><My/>')

You will get an xml.parsers.expat.ExpatError if the XML is not well formed.
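
To handle that error instead of letting it propagate, the call can be wrapped in try/except. The following is a rough sketch, assuming Python 3; the helper name parse_or_none and the second sample string are invented for illustration. Like ElementTree, minidom relies on expat here, so this detects malformed XML but does not validate against a DTD.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def parse_or_none(xml_text):
    """Return a minidom Document, or None if xml_text is not well formed.

    Hypothetical helper for illustration only.
    """
    try:
        return xml.dom.minidom.parseString(xml_text)
    except ExpatError as err:
        # ExpatError carries the position of the problem in lineno/offset.
        print("not well formed: %s (line %d, column %d)"
              % (err, err.lineno, err.offset))
        return None


# The string from the answer: <My> and <XML> are opened but never closed.
broken = parse_or_none('<My><XML><String/><XML/><My/>')

# A corrected variant with proper closing tags (assumed for comparison).
fixed = parse_or_none('<My><XML><String/></XML></My>')
```

Here broken ends up as None, while fixed is a Document whose root element is My.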