How to get the first level of DOM elements with PHP DOMDocument?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors on Stack Overflow.
Original URL: http://stackoverflow.com/questions/5882433/
Asked by Yosef
How do I get the first level of DOM elements with PHP's DOMDocument?
Example code that does not work, taken from this Q&A: http://stackoverflow.com/questions/1540302/how-to-get-nodes-in-first-level-using-php-domdocument
<?php
$str=<<< EOD
<div id="header">
</div>
<div id="content">
<div id="sidebar">
</div>
<div id="info">
</div>
</div>
<div id="footer">
</div>
EOD;
$doc = new DOMDocument();
$doc->loadHTML($str);
$xpath = new DOMXpath($doc);
$entries = $xpath->query("/");
foreach ($entries as $entry) {
var_dump($entry->firstChild->nodeValue);
}
?>
Thanks, Yosef
Answered by Gordon
The first level of elements below the root node can be accessed with
$dom->documentElement->childNodes
The childNodes property contains a DOMNodeList, which you can iterate with foreach.
See DOMDocument::documentElement:
This is a convenience attribute that allows direct access to the child node that is the document element of the document.
and DOMNode::childNodes:
A DOMNodeList that contains all children of this node. If there are no children, this is an empty DOMNodeList.
Since childNodes is a property of DOMNode, any class extending DOMNode (which is most of the classes in DOM) has this property. So to get the first level of elements below a DOMElement, access that DOMElement's childNodes property.
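As a minimal sketch of the point above, the following reads childNodes from both the root element and a nested element. The XML string and its element names are made up for illustration, and loadXML() is used here only so that the tree contains exactly what is written, with no whitespace text nodes.

```php
<?php
// Hypothetical sample document (not from the question), written without
// whitespace between tags so that childNodes contains element nodes only.
$xml = '<root><header/><content><sidebar/><info/></content><footer/></root>';

$dom = new DOMDocument;
$dom->loadXML($xml);

// First level below the root element.
$firstLevel = [];
foreach ($dom->documentElement->childNodes as $node) {
    $firstLevel[] = $node->nodeName;
}

// First level below an arbitrary DOMElement, here the <content> element.
$content = $dom->getElementsByTagName('content')->item(0);
$belowContent = [];
foreach ($content->childNodes as $node) {
    $belowContent[] = $node->nodeName;
}

echo implode(', ', $firstLevel), PHP_EOL;   // header, content, footer
echo implode(', ', $belowContent), PHP_EOL; // sidebar, info
```

The same loop works on any node that extends DOMNode, because childNodes is inherited rather than specific to the document element.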
Note that if you use DOMDocument::loadHTML() on invalid HTML or partial documents, the HTML parser module will add an HTML skeleton with html and body tags, so in the DOM tree, the HTML in your example will be
<!DOCTYPE html … ">
<html><body><div id="header">
</div>
<div id="content">
<div id="sidebar">
</div>
<div id="info">
</div>
</div>
<div id="footer">
</div></body></html>
which you have to take into account when traversing or using XPath. Consequently, using
$dom = new DOMDocument;
$dom->loadHTML($str);
foreach ($dom->documentElement->childNodes as $node) {
echo $node->nodeName; // body
}
will only iterate the <body> DOMElement node. Knowing that libxml will add the skeleton, you will have to iterate over the childNodes of the <body> element to get the div elements from your example code, e.g.
$dom->getElementsByTagName('body')->item(0)->childNodes
However, doing so will also take into account any whitespace nodes, so you either have to make sure to set preserveWhiteSpace to false or check the nodeType if you only want to get DOMElement nodes, e.g.
foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $node) {
if ($node->nodeType === XML_ELEMENT_NODE) {
echo $node->nodeName;
}
}
or use XPath
$dom = new DOMDocument;
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('/html/body/*') as $node) {
echo $node->nodeName;
}
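To tie the pieces together, here is a self-contained sketch that runs the XPath variant against the markup from the question. Collecting the id attributes into an array named $ids is a choice made for this illustration, not part of the original answer.

```php
<?php
// The markup from the question: a partial document without <html> or <body>.
$str = <<<EOD
<div id="header">
</div>
<div id="content">
<div id="sidebar">
</div>
<div id="info">
</div>
</div>
<div id="footer">
</div>
EOD;

$dom = new DOMDocument;
$dom->loadHTML($str);

// The parser wrapped the fragment in <html><body>, so the root element is <html>.
echo $dom->documentElement->nodeName, PHP_EOL; // html

// "*" matches element nodes only, so whitespace text nodes are skipped.
$xpath = new DOMXPath($dom);
$ids = [];
foreach ($xpath->query('/html/body/*') as $node) {
    $ids[] = $node->getAttribute('id');
}

echo implode(', ', $ids), PHP_EOL; // header, content, footer
```

The nested sidebar and info divs do not appear in the result because the path stops one level below body.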
Additional information: