How do I get the first level of DOM elements with PHP DOMDocument?

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/5882433/

Time: 2020-08-25 22:46:53  Source: igfitidea

Tags: php, xpath, domdocument

Asked by Yosef

How do I get the first level of DOM elements with PHP's DOMDocument?

Example code that does not work, taken from this Q&A: http://stackoverflow.com/questions/1540302/how-to-get-nodes-in-first-level-using-php-domdocument

<?php
$str=<<< EOD
<div id="header">
</div>
<div id="content">
    <div id="sidebar">
    </div>
    <div id="info">
    </div>
</div>
<div id="footer">
</div>
EOD;

$doc = new DOMDocument();
$doc->loadHTML($str);
$xpath = new DOMXpath($doc);
$entries = $xpath->query("/");
foreach ($entries as $entry) {
    var_dump($entry->firstChild->nodeValue);
}
?>

Thanks, Yosef

Answered by Gordon

The first level of elements below the root node can be accessed with

$dom->documentElement->childNodes

The childNodes property contains a DOMNodeList, which you can iterate with foreach.
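
As a minimal sketch of that iteration, the snippet below loads a small, made-up XML string (the element names are placeholders, not part of the question) and collects the names of the nodes directly below the root element:

```php
<?php
// Hypothetical sample document; the element names are placeholders.
$dom = new DOMDocument();
$dom->loadXML('<root><header/><content><sidebar/></content><footer/></root>');

// childNodes is a DOMNodeList, so foreach works on it directly.
$names = [];
foreach ($dom->documentElement->childNodes as $node) {
    $names[] = $node->nodeName;
}

echo implode(', ', $names), PHP_EOL; // header, content, footer
```

The nested sidebar element is not listed, because childNodes only covers direct children.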

See DOMDocument::documentElement

This is a convenience attribute that allows direct access to the child node that is the document element of the document.

and DOMNode::childNodes

A DOMNodeList that contains all children of this node. If there are no children, this is an empty DOMNodeList.

Since childNodes is a property of DOMNode, any class extending DOMNode (which is most of the classes in DOM) has this property. So to get the first level of elements below a DOMElement, access that DOMElement's childNodes property.
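
To illustrate, here is a sketch of a small helper that returns the element children of any DOMNode. The function name childElements and the sample XML are assumptions made for this example; they are not part of the DOM extension.

```php
<?php
// Hypothetical helper: returns the DOMElement children of any DOMNode.
function childElements(DOMNode $parent): array
{
    $elements = [];
    foreach ($parent->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $elements[] = $child;
        }
    }
    return $elements;
}

$dom = new DOMDocument();
$dom->loadXML('<root><a><a1/><a2/></a><b/></root>');

// Start from an arbitrary element instead of the document root.
$a = $dom->documentElement->firstChild; // the <a> element

$names = [];
foreach (childElements($a) as $element) {
    $names[] = $element->nodeName;
}

echo implode(', ', $names), PHP_EOL; // a1, a2
```

Because the helper only relies on childNodes, it can be called on the document element, on a nested element, or on any other node that has children.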



Note that if you use DOMDocument::loadHTML() on invalid HTML or partial documents, the HTML parser module will add an HTML skeleton with html and body tags, so in the DOM tree, the HTML in your example will be

<!DOCTYPE html … ">
<html><body><div id="header">
</div>
<div id="content">
    <div id="sidebar">
    </div>
    <div id="info">
    </div>
</div>
<div id="footer">
</div></body></html>

which you have to take into account when traversing or using XPath. Consequently, using

$dom = new DOMDocument;
$dom->loadHTML($str);
foreach ($dom->documentElement->childNodes as $node) {
    echo $node->nodeName; // body
}

will only iterate the <body> DOMElement node. Knowing that libxml will add the skeleton, you will have to iterate over the childNodes of the <body> element to get the div elements from your example code, e.g.

$dom->getElementsByTagName('body')->item(0)->childNodes

However, doing so will also take into account any whitespace nodes, so you either have to make sure to set preserveWhiteSpace to false or check the nodeType if you only want to get DOMElement nodes, e.g.

foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $node) {
    if ($node->nodeType === XML_ELEMENT_NODE) {
        echo $node->nodeName;
    }
}
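
The effect of the whitespace nodes can be seen in the following sketch. It assumes XML input loaded with loadXML(), where preserveWhiteSpace has to be set before loading; the sample markup and the helper name childNames are made up for illustration, and the HTML parser may treat whitespace differently.

```php
<?php
// Hypothetical helper: names of the direct children of the root element.
function childNames(string $xml, bool $preserveWhiteSpace): array
{
    $dom = new DOMDocument();
    // Set before loading, so the parser sees it.
    $dom->preserveWhiteSpace = $preserveWhiteSpace;
    $dom->loadXML($xml);

    $names = [];
    foreach ($dom->documentElement->childNodes as $node) {
        $names[] = $node->nodeName;
    }
    return $names;
}

$xml = "<root>\n  <a/>\n  <b/>\n</root>";

$withBlanks = childNames($xml, true);     // whitespace kept as #text nodes
$withoutBlanks = childNames($xml, false); // whitespace-only nodes dropped

echo implode(', ', $withBlanks), PHP_EOL;
echo implode(', ', $withoutBlanks), PHP_EOL;
```

Checking nodeType, as in the loop above, does not depend on how the document was parsed, which makes it the more robust of the two options.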

Alternatively, use XPath to select only the element children of <body>:

$dom = new DOMDocument;
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('/html/body/*') as $node) {
    echo $node->nodeName;
}
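
The same XPath approach can be applied one level further down by passing a context node as the second argument of DOMXPath::query(). The following is a sketch using markup like the question's example; the variable names are illustrative.

```php
<?php
$html = '<div id="header"></div>'
      . '<div id="content"><div id="sidebar"></div><div id="info"></div></div>'
      . '<div id="footer"></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// First level below <body>, which the parser adds around the fragment.
$topLevel = [];
foreach ($xpath->query('/html/body/*') as $node) {
    $topLevel[] = $node->getAttribute('id');
}

// First level below one specific element: "*" is evaluated relative to it.
$content = $xpath->query('//div[@id="content"]')->item(0);
$nested = [];
foreach ($xpath->query('*', $content) as $node) {
    $nested[] = $node->getAttribute('id');
}

echo implode(', ', $topLevel), PHP_EOL; // header, content, footer
echo implode(', ', $nested), PHP_EOL;   // sidebar, info
```

Since "*" matches element nodes only, no whitespace filtering is needed in either loop.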

Additional information:
