Original source: http://stackoverflow.com/questions/4803063/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
What's the difference between PHP's DOM and SimpleXML extensions?
Asked by Stann
I fail to comprehend why we need two XML parsers in PHP.
Can someone explain the difference between those two?
Answered by Gordon
In a nutshell:
SimpleXml
- is for simple XML and/or simple use cases
- has a limited API for working with nodes (e.g. you cannot program to an interface that much)
- all nodes are of the same kind (an element node is the same as an attribute node)
- nodes are magically accessible, e.g. $root->foo->bar['attribute']
DOM
- is for any XML use case you might have
- is an implementation of the W3C DOM API (found implemented in many languages)
- differentiates between various node types (more control)
- is much more verbose due to its explicit API (you can code to an interface)
- can parse broken HTML
- allows you to use PHP functions in XPath queries
Both of these are based on libxml and can be influenced to some extent by the libxml functions.
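To make the two lists above more concrete, here is a minimal sketch that reads the same attribute through both extensions and then calls a PHP function from an XPath query in DOM. The sample document and its element names are made up for illustration.

```php
<?php
// Hypothetical sample document; element and attribute names are invented.
$xml = '<root><foo><bar attribute="value">text</bar></foo></root>';

// SimpleXML: the XML structure is reached through property and array access.
$root = simplexml_load_string($xml);
$viaSimpleXml = (string) $root->foo->bar['attribute'];

// DOM: the same lookup through the explicit W3C-style API.
$doc = new DOMDocument();
$doc->loadXML($xml);
$bar = $doc->getElementsByTagName('bar')->item(0);
$viaDom = $bar->getAttribute('attribute');

// DOM only: expose a PHP function to XPath via the php: namespace.
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('strtoupper');
$shouted = $xpath->evaluate('php:functionString("strtoupper", string(//bar))');
```

The SimpleXML version is shorter, but it only works because the code mirrors the document layout; the DOM version spells out every step.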
Personally, I don't like SimpleXml too much. That's because I don't like the implicit access to the nodes, e.g. $foo->bar[1]->baz['attribute']. It ties the actual XML structure to the programming interface. The one-node-type-for-everything approach is also somewhat unintuitive because the behavior of the SimpleXmlElement magically changes depending on its contents.
For instance, when you have <foo bar="1"/>, the object dump of /foo/@bar will be identical to that of /foo, but doing an echo of them will print different results. Moreover, because both of them are SimpleXml elements, you can call the same methods on them, but they will only get applied when the SimpleXmlElement supports it, e.g. trying to do $el->addAttribute('foo', 'bar') on the first SimpleXmlElement will do nothing. Now of course it is correct that you cannot add an attribute to an attribute node, but the point is, an attribute node would not expose that method in the first place.
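The point about one class standing in for both node kinds can be sketched as follows. This is only an illustration of the behavior described above, using the same <foo bar="1"/> document; the variable names are mine.

```php
<?php
$foo = simplexml_load_string('<foo bar="1"/>');

$element   = $foo->xpath('/foo')[0];
$attribute = $foo->xpath('/foo/@bar')[0];

// Both results are instances of the same class, although one of them
// stands for an attribute.
$sameClass = get_class($element) === get_class($attribute);

// Casting to string (which is what echo does) gives different results.
$elementText   = (string) $element;   // the element has no text content
$attributeText = (string) $attribute; // the attribute's value

// DOM keeps the two apart: an attribute is a DOMAttr, and that class
// simply has no setAttribute() method to call.
$doc = new DOMDocument();
$doc->loadXML('<foo bar="1"/>');
$domAttr = (new DOMXPath($doc))->query('/foo/@bar')->item(0);
$attrCanSetAttribute = method_exists($domAttr, 'setAttribute');
```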
But that's just my 2c. Make up your own mind :)
On a side note, there are not just two parsers in PHP, but a couple more. SimpleXml and DOM are just the two that parse a document into a tree structure. The others are either pull-based or event-based parsers/readers/writers.
Also see my answer to
Answered by Josh Davis
I'm going to make the shortest answer possible so that beginners can take it away easily. I'm also slightly simplifying things for shortness' sake. Jump to the end of this answer for the overstated TL;DR version.
DOM and SimpleXML aren't actually two different parsers. The real parser is libxml2, which is used internally by DOM and SimpleXML. So DOM/SimpleXML are just two ways to use the same parser, and they provide ways to convert one object to another.
SimpleXML is intended to be very simple, so it has a small set of functions and is focused on reading and writing data. That is, you can easily read or write an XML file, you can update some values or remove some nodes (with some limitations!), and that's it. No fancy manipulation, and you don't have access to the less common node types. For instance, SimpleXML cannot create a CDATA section although it can read them.
DOM offers a full-fledged implementation of the DOM plus a couple of non-standard methods such as appendXML. If you're used to manipulating the DOM in JavaScript, you'll find exactly the same methods in PHP's DOM. There's basically no limitation on what you can do, and it even handles HTML. The flip side to this richness of features is that it is more complex and more verbose than SimpleXML.
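As a rough illustration of that richer feature set, the sketch below appends raw XML through appendXML, moves an existing node, and loads sloppy HTML. The documents are invented samples, and suppressing parser warnings with libxml_use_internal_errors() is my choice here, not something the answer prescribes.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<list><item>one</item></list>');

// appendXML() is the non-standard helper: it parses a raw XML string
// into a fragment that can then be inserted like any other node.
$fragment = $doc->createDocumentFragment();
$fragment->appendXML('<item>two</item><item>three</item>');
$doc->documentElement->appendChild($fragment);
$itemCount = $doc->getElementsByTagName('item')->length;

// Moving a node: appendChild() on a node that is already in the tree
// relocates it, here from the first to the last position.
$first = $doc->getElementsByTagName('item')->item(0);
$doc->documentElement->appendChild($first);
$lastText = $doc->documentElement->lastChild->textContent;

// HTML, even with unclosed tags, can be loaded as well.
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$html = new DOMDocument();
$html->loadHTML('<p>Unclosed paragraph<p>Another one');
libxml_clear_errors();
$paragraphs = $html->getElementsByTagName('p')->length;
```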
Side-note
People often wonder/ask which extension they should use to handle their XML or HTML content. Actually the choice is easy because there isn't much of a choice to begin with:
- if you need to deal with HTML, you don't really have a choice: you have to use DOM
- if you have to do anything fancy such as moving nodes or appending some raw XML, again you pretty much have to use DOM
- if all you need to do is read and/or write some basic XML (e.g. exchanging data with an XML service or reading an RSS feed) then you can use either. Or both.
- if your XML document is so big that it doesn't fit in memory, you can't use either and you have to use XMLReader, which is also based on libxml2 and is even more annoying to use, but still plays nice with the others
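The last bullet can be sketched like this: XMLReader walks the document node by node, and each small piece is handed to SimpleXML for convenient access. A short in-memory string stands in for the large file, and the feed/item/title names are hypothetical.

```php
<?php
// A small string stands in for a document too large to load at once;
// for a real file one would use $reader->open() with a path instead.
$xml = '<feed><item><title>A</title></item><item><title>B</title></item></feed>';

$reader = new XMLReader();
$reader->XML($xml);

$titles = [];
while ($reader->read()) {
    // Only react to the opening tag of each <item> element.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        // Hand just this one <item> to SimpleXML; the rest of the
        // document is never held as a tree.
        $item = simplexml_load_string($reader->readOuterXml());
        $titles[] = (string) $item->title;
    }
}
$reader->close();
```

Only one item is turned into a tree at a time, which is where the memory saving comes from.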
TL;DR
- SimpleXML is super easy to use but only good for 90% of use cases.
- DOM is more complex, but can do everything.
- XMLReader is super complicated, but uses very little memory. Very situational.
Answered by IMSoP
As others have pointed out, the DOM and SimpleXML extensions are not strictly "XML parsers", rather they are different interfaces to the structure generated by the underlying libxml2 parser.
The SimpleXML interface treats XML as a serialized data structure, in the same way you would treat a decoded JSON string. So it provides quick access to the contents of a document, with emphasis on accessing elements by name, and reading their attributes and text content (including automatically folding in entities and CDATA sections). It supports documents containing multiple namespaces (primarily using the children() and attributes() methods), and can search a document using an XPath expression. It also includes support for basic manipulation of the content, e.g. adding or overwriting elements or attributes with a new string.
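A short sketch of the features just listed, assuming a made-up feed document with one namespaced child element:

```php
<?php
// Hypothetical document; the dc prefix maps to the Dublin Core namespace.
$xml = <<<XML
<feed xmlns:dc="http://purl.org/dc/elements/1.1/">
  <entry id="42">
    <title><![CDATA[Tom & Jerry]]></title>
    <dc:creator>alice</dc:creator>
  </entry>
</feed>
XML;

$feed  = simplexml_load_string($xml);
$entry = $feed->entry;

// Elements by name, attributes by array access, CDATA folded into text.
$id    = (string) $entry['id'];
$title = (string) $entry->title;

// Namespaced children are selected through children().
$creator = (string) $entry->children('http://purl.org/dc/elements/1.1/')->creator;

// XPath search; the result is an array of matching elements.
$matchCount = count($feed->xpath('//entry[@id="42"]/title'));

// Basic manipulation: overwrite text, add an element, change an attribute.
$entry->title = 'New title';
$entry->addChild('summary', 'short text');
$entry['id'] = '43';
```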
The DOM interface, on the other hand, treats XML as a structured document, where the representation used is as important as the data represented. It therefore provides much more granular and explicit access to different types of "node", such as entities and CDATA sections, as well as some which are ignored by SimpleXML, such as comments and processing instructions. It also provides a much richer set of manipulation functions, allowing you to rearrange nodes and choose how to represent text content, for instance. The tradeoff is a fairly complex API, with a large number of classes and methods; since it implements a standard API (originally developed for manipulating HTML in JavaScript), there may be less of a "natural PHP" feel, but some programmers may be familiar with it from other contexts.
Both interfaces require the full document to be parsed into memory, and effectively wrap up pointers into that parsed representation; you can even switch between the two wrappers with simplexml_import_dom() and dom_import_simplexml(), for instance to add a "missing" feature to SimpleXML using a function from the DOM API. For larger documents, the "pull-based" XMLReader or the "event-based" XML Parser may be more appropriate.
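Switching wrappers might look like the sketch below, which adds a CDATA section (something SimpleXML has no method for) to a document that is otherwise handled through SimpleXML. The notes/note document is an invented example.

```php
<?php
$sx = simplexml_load_string('<notes><note/></notes>');

// Get a DOMElement for the same underlying node; nothing is copied.
$domNote = dom_import_simplexml($sx->note);
$domNote->appendChild(
    $domNote->ownerDocument->createCDATASection('a < b && c')
);

// The change is visible through the SimpleXML wrapper.
$asXml = $sx->asXML();
$text  = (string) $sx->note;

// The other direction: start in DOM, continue in SimpleXML.
$doc = new DOMDocument();
$doc->loadXML('<root><child>hi</child></root>');
$sxRoot    = simplexml_import_dom($doc);
$childText = (string) $sxRoot->child;
```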
Answered by hakre
Which DOMNodes can be represented by SimpleXMLElement?
The biggest difference between the two libraries is that SimpleXML is mainly a single class: SimpleXMLElement. In contrast, the DOM extension has many classes, most of them a subtype of DOMNode.
So one core question when comparing those two libraries is: which of the many classes DOM offers can, in the end, be represented by a SimpleXMLElement?
The following is a comparison table containing those DOMNode types that are actually useful as far as dealing with XML is concerned (useful node types). Your mileage may vary, e.g. when you need to deal with DTDs:
+-------------------------+----+--------------------------+-----------+
| LIBXML Constant         | #  | DOMNode Classname        | SimpleXML |
+-------------------------+----+--------------------------+-----------+
| XML_ELEMENT_NODE        | 1  | DOMElement               | yes       |
| XML_ATTRIBUTE_NODE      | 2  | DOMAttr                  | yes       |
| XML_TEXT_NODE           | 3  | DOMText                  | no [1]    |
| XML_CDATA_SECTION_NODE  | 4  | DOMCdataSection          | no [2]    |
| XML_PI_NODE             | 7  | DOMProcessingInstruction | no        |
| XML_COMMENT_NODE        | 8  | DOMComment               | no        |
| XML_DOCUMENT_NODE       | 9  | DOMDocument              | no        |
| XML_DOCUMENT_FRAG_NODE  | 11 | DOMDocumentFragment      | no        |
+-------------------------+----+--------------------------+-----------+
[1]: SimpleXML abstracts text nodes as the string value of an element (compare __toString). This only works well when an element contains text only; otherwise text information can get lost.
[2]: Every XML parser can expand CDATA nodes when loading the document. SimpleXML expands these when the LIBXML_NOCDATA option is used with the simplexml_load_* functions or the constructor. (The option works with DOMDocument::loadXML() as well.)
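Both footnotes can be illustrated with a small sketch. The mixed-content paragraph and the one-element CDATA document are invented samples; the node types are inspected through DOM because SimpleXML itself does not expose them.

```php
<?php
// [1] Mixed content: the string value of a SimpleXMLElement covers only
// the element's own text nodes, so the text inside <b> is not included.
$xml = '<p>Hello <b>World</b>!</p>';

$sx = simplexml_load_string($xml);
$simpleText = (string) $sx;

$doc = new DOMDocument();
$doc->loadXML($xml);
$domText = $doc->documentElement->textContent; // text of all descendants

// DOM exposes every child, text nodes included, as its own object.
$kinds = [];
foreach ($doc->documentElement->childNodes as $child) {
    $kinds[] = get_class($child);
}

// [2] LIBXML_NOCDATA turns CDATA sections into plain text nodes on load.
$cdataXml = '<a><![CDATA[x]]></a>';
$kept     = simplexml_load_string($cdataXml);
$expanded = simplexml_load_string($cdataXml, 'SimpleXMLElement', LIBXML_NOCDATA);
$keptType     = dom_import_simplexml($kept)->firstChild->nodeType;
$expandedType = dom_import_simplexml($expanded)->firstChild->nodeType;
```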
As this table shows, SimpleXML has really limited interfaces compared to DOM. Next to the ones in the table, SimpleXMLElement also abstracts access to children and attribute lists, and provides traversal via element names (property access) and attributes (array access). It is also a Traversable that iterates its "own" children (elements or attributes), and it offers namespaced access via the children() and attributes() methods.
As long as this magic interface does what you need, that's fine. However, it cannot be changed by extending SimpleXMLElement, so as magic as it is, it is just as limited.
To find out which nodetype a SimpleXMLElement object represents, please see:
DOM here follows the DOMDocument Core Level 1 specs. You can do nearly every imaginable XML handling with that interface. However, it's only Level 1, so compared with modern DOMDocument levels like 3, it's somewhat limited for some cooler stuff. SimpleXML certainly loses here as well.
SimpleXMLElement allows casting to subtypes. This is very special in PHP. DOM allows this as well, although it's a little more work and a more specific node type needs to be chosen.
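One way to read "casting to subtypes" is the class-name argument of the SimpleXML loaders and DOMDocument::registerNodeClass(); that reading is my assumption, and the Book and LabelledElement classes below are hypothetical.

```php
<?php
// Hypothetical subclass that adds a convenience method.
class Book extends SimpleXMLElement
{
    public function label(): string
    {
        return sprintf('%s (%s)', $this->title, $this['year']);
    }
}

$xml = '<library><book year="1999"><title>XML Basics</title></book></library>';

// The second argument selects the class used for every element wrapper.
$library = simplexml_load_string($xml, Book::class);
$label   = $library->book->label();

// DOM: a replacement class is registered for one specific node type.
class LabelledElement extends DOMElement
{
    public function label(): string
    {
        return $this->nodeName . ': ' . $this->textContent;
    }
}

$doc = new DOMDocument();
$doc->registerNodeClass(DOMElement::class, LabelledElement::class);
$doc->loadXML($xml);
$domLabel = $doc->getElementsByTagName('title')->item(0)->label();
```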
XPath 1.0 is supported by both; the result in SimpleXML is an array of SimpleXMLElements, in DOM a DOMNodeList.
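The difference in result types, sketched with an invented two-item document:

```php
<?php
$xml = '<root><item>a</item><item>b</item></root>';

// SimpleXML: xpath() returns a plain PHP array of SimpleXMLElement objects.
$sx       = simplexml_load_string($xml);
$sxResult = $sx->xpath('//item');
$sxValues = array_map('strval', $sxResult);

// DOM: queries go through a separate DOMXPath object and return a DOMNodeList.
$doc = new DOMDocument();
$doc->loadXML($xml);
$domResult = (new DOMXPath($doc))->query('//item');

$domValues = [];
foreach ($domResult as $node) {
    $domValues[] = $node->textContent;
}
```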
SimpleXMLElement supports casting to string and array (json); the DOMNode classes in DOM do not. They offer casting to array, but only like any other object does (public properties as keys/values).
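A minimal sketch of those casts, assuming a made-up user document. The JSON round trip is a common idiom rather than something the answer spells out, and the DOM side needs an explicit property read instead of a cast.

```php
<?php
$sx = simplexml_load_string(
    '<user id="7"><name>Ann</name><tag>a</tag><tag>b</tag></user>'
);

// Cast to string: the text content of the element.
$name = (string) $sx->name;

// Via JSON: attributes end up under "@attributes", repeated elements
// become a list.
$data = json_decode(json_encode($sx), true);

// DOM has no string cast; the value is read from a property instead.
$doc = new DOMDocument();
$doc->loadXML('<name>Ann</name>');
$domName = $doc->documentElement->textContent;
```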
Common usage patterns of those two extensions in PHP are:
- You normally start out using SimpleXMLElement. Your level of knowledge about XML and XPath is on an equally low level.
- After fighting with the magic of its interfaces, a certain level of frustration is reached sooner or later.
- You discover that you can import SimpleXMLElements into DOM and vice versa. You learn more about DOM and how to use the extension to do stuff you were not able (or not able to find out how) to do with SimpleXMLElement.
- You notice that you can load HTML documents with the DOM extension. And invalid XML. And do output formatting. Things SimpleXMLElement just can't do. Not even with the dirty tricks.
- You probably even switch to the DOM extension fully because at least you know that the interface is more differentiated and allows you to do stuff. You also see a benefit in learning DOM Level 1 because you can use it in JavaScript and other languages as well (a huge benefit of the DOM extension for many).
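The import step and the output formatting mentioned in the list can be combined in one sketch: a document is built with SimpleXML and then pretty-printed through DOM. The list/item names are invented.

```php
<?php
$sx = new SimpleXMLElement('<list/>');
$sx->addChild('item', 'one');
$sx->addChild('item', 'two');

// SimpleXML serializes the root element without any indentation.
$compact = $sx->asXML();

// Import into DOM (same underlying document) and switch on formatting.
$doc = dom_import_simplexml($sx)->ownerDocument;
$doc->formatOutput = true;
$pretty = $doc->saveXML();
```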
You can have fun with both extensions and I think you should know both. The more the better. All the libxml-based extensions in PHP are very good and powerful extensions. And on Stack Overflow, under the php tag, there is a good tradition of covering these libraries well and with detailed information.
Answered by usoban
SimpleXML is, as the name states, a simple parser for XML content, and nothing else. You cannot parse, say, standard HTML content with it. It's easy and quick, and therefore a great tool for creating simple applications.
The DOM extension, on the other hand, is much more powerful. It enables you to parse almost any DOM document, including HTML, XHTML and XML. It enables you to open, write and even correct output code, and it supports XPath and more manipulation overall. Its usage is therefore much more complicated, because the library is quite complex, but that makes it a perfect tool for bigger projects where heavy data manipulation is needed.
Hope that answers your question :)