XML parsing in Python
Original URL: http://stackoverflow.com/questions/1373707/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): StackOverflow
Question by Alex
I'd like to parse a simple, small XML file using Python; however, work on PyXML seems to have ceased. I'd like to use Python 2.6 if possible. Can anyone recommend an XML parser that works with 2.6?
Thanks
Answer by Eli Courtwright
If it's small and simple, then just use the standard library:
from xml.dom.minidom import parse
doc = parse("filename.xml")  # parse the file into an xml.dom.minidom Document
This returns a DOM tree that implements the standard Document Object Model (DOM) API.
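As a minimal sketch of what working with that tree looks like (Python 3; the sample document and its element names are invented for illustration, and parseString stands in for parse so the snippet needs no file on disk):

```python
from xml.dom.minidom import parseString

# Hypothetical sample document; in practice you would call parse("filename.xml").
XML = """<config>
  <server name="alpha" port="8080"/>
  <server name="beta" port="9090"/>
  <owner>Alex</owner>
</config>"""

doc = parseString(XML)

# getElementsByTagName searches the whole subtree and returns a NodeList.
servers = [
    (el.getAttribute("name"), int(el.getAttribute("port")))
    for el in doc.getElementsByTagName("server")
]

# Element text lives in child Text nodes, not on the element itself.
owner_el = doc.getElementsByTagName("owner")[0]
owner = "".join(
    node.data for node in owner_el.childNodes if node.nodeType == node.TEXT_NODE
)

print(servers)  # [('alpha', 8080), ('beta', 9090)]
print(owner)    # Alex

doc.unlink()  # break internal reference cycles so the tree can be freed promptly
```

Because minidom stores element text in child Text nodes, the owner's name is collected from childNodes rather than read directly off the element.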
If you later need to do more complex things, such as schema validation or XPath queries, then I recommend the third-party lxml module, which wraps the popular libxml2 C library.
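lxml itself is not part of the standard library. For simple path queries, the standard library's xml.etree.ElementTree (included since Python 2.5) supports a limited subset of XPath through find() and findall(). The sketch below assumes Python 3, because the [@attr='value'] predicate needs ElementTree 1.3 or later (Python 2.7 / 3.2+); the sample data is hypothetical. Full XPath 1.0 and schema validation still call for lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, used only for illustration.
XML = """<library>
  <book lang="en"><title>Intro</title></book>
  <book lang="fr"><title>Initiation</title></book>
  <book lang="en"><title>Recipes</title></book>
</library>"""

root = ET.fromstring(XML)

# Relative path with an attribute predicate (ElementTree 1.3+ syntax).
english_titles = [t.text for t in root.findall("./book[@lang='en']/title")]

# ".//title" matches <title> elements at any depth below root.
all_titles = [t.text for t in root.findall(".//title")]

print(english_titles)  # ['Intro', 'Recipes']
print(all_titles)      # ['Intro', 'Initiation', 'Recipes']
```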
Answer by Alex
For most of my tasks I have used minidom, the lightweight DOM implementation in the standard library. Here is the example from the official documentation:
from xml.dom.minidom import parse, parseString

dom1 = parse('c:\\temp\\mydata.xml')  # parse an XML file by name (backslashes escaped)

with open('c:\\temp\\mydata.xml') as datasource:
    dom2 = parse(datasource)  # parse an already-open file object

dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')  # parse XML held in a string
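To see what dom3 actually contains, a short sketch (Python 3) can walk the children of its document element: the mixed content is split into a Text node, an Element node and a second Text node.

```python
from xml.dom.minidom import parseString

dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')
root = dom3.documentElement  # the <myxml> element

children = []
for node in root.childNodes:
    if node.nodeType == node.TEXT_NODE:
        children.append(("text", node.data))
    elif node.nodeType == node.ELEMENT_NODE:
        children.append(("element", node.tagName))

print(children)
# [('text', 'Some data'), ('element', 'empty'), ('text', ' some more data')]

# toxml() serialises an element back to a string.
print(root.toxml())
```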
Answer by Andrei Vajna II
Answer by Il-Bhima
Answer by steveha
A few years ago, I wrote a library for working with structured XML. It makes XML simpler by making some limiting assumptions.
You could use XML for something like a word-processor document, where you have a complicated soup of text with XML tags embedded all over the place; my library would not be a good fit for that.
But if you are using XML for something like a config file, my library is rather convenient. You define classes that describe the structure of the XML you want; once the classes are done, a method slurps in the XML and parses it. The actual parsing is done by xml.dom.minidom, and then my library extracts the data and puts it into the classes.
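The library's own API is documented at the URL given below and is not reproduced here. As a hypothetical sketch of the general idea only (Config, its fields mapping, from_xml and first_text are invented names, not the xe API), a class can declare which elements it expects while minidom does the parsing:

```python
from xml.dom.minidom import parseString


def first_text(parent, tag):
    """Return the text of the first <tag> element below parent, or None."""
    found = parent.getElementsByTagName(tag)
    if not found:
        return None
    return "".join(n.data for n in found[0].childNodes
                   if n.nodeType == n.TEXT_NODE)


class Config(object):
    # Hypothetical declared structure: Python attribute name -> XML tag name.
    fields = {"host": "host", "port": "port", "user": "username"}

    @classmethod
    def from_xml(cls, text):
        root = parseString(text).documentElement  # minidom does the parsing
        obj = cls()
        for attr, tag in cls.fields.items():
            setattr(obj, attr, first_text(root, tag))  # copy data onto the object
        return obj


cfg = Config.from_xml(
    "<config><host>example.org</host><port>8080</port>"
    "<username>alex</username></config>"
)
print(cfg.host, cfg.port, cfg.user)  # example.org 8080 alex
```

Keeping the structure in a class-level mapping means adding a new config key is a one-line change, which is roughly the convenience the answer describes.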
The best part: you can declare a "Collection" type that will be a Python list with zero or more other XML elements inside it. This is great for things like Atom or RSS feeds (which was the original reason I designed the library).
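Likewise, a hypothetical sketch of the "zero or more elements become a Python list" idea, applied to a tiny invented RSS 2.0 snippet (plain minidom again; text_of and parse_items are illustrative names, not part of the xe library):

```python
from xml.dom.minidom import parseString

# Invented sample feed for illustration.
RSS = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>First post</title><link>http://example.com/1</link></item>
  <item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""


def text_of(el, tag):
    """Text of the first <tag> element inside el, or None if absent."""
    nodes = el.getElementsByTagName(tag)
    if not nodes:
        return None
    return "".join(n.data for n in nodes[0].childNodes
                   if n.nodeType == n.TEXT_NODE)


def parse_items(xml_text):
    channel = parseString(xml_text).getElementsByTagName("channel")[0]
    # Each repeated <item> becomes one dict; a feed with no items yields [].
    return [
        {"title": text_of(item, "title"), "link": text_of(item, "link")}
        for item in channel.getElementsByTagName("item")
    ]


items = parse_items(RSS)
print(items)
```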
Here's the URL: http://home.avvanta.com/~steveha/xe.html
I'd be happy to answer questions if you have any.