Loop over DOMDocument (PHP)
Source: http://stackoverflow.com/questions/2909849/
This content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and link to the source.
Asked by Zoredache
I am following the suggestion from the question "Robust, Mature HTML Parser for PHP" about using DOMDocument to parse HTML that may be malformed.
Is there an easy way to loop over the parsed document? I would like to loop over HTML like this:
$html='<ul>
<li>value1</li>
<li>value1</li>
<li>value3
<p>subvalue</p>
</li>
</ul>
<p>hello world</p>';
$doc = new DOMDocument();
$doc->loadHTML($html);
???
foreach (??? as $node)
{
    print $node->nodeName.':'.$node->nodeValue;
}
And get results somewhat like this.
ul:
li:value1
li:value1
li:value3
p:subvalue
p:hello world
Using $doc->childNodes by itself doesn't really do what I want, since it doesn't seem to descend into the lower branches of the tree. I used the code suggested by halfdan and got results like this:
html:
html:value1
value1
value3
subvalue
hello world
Answered by halfdan
Try this:
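$doc = new DOMDocument();
$doc->loadHTML($html);
showDOMNode($doc);

// Recursively print every node below $domNode, depth first.
function showDOMNode(DOMNode $domNode) {
    foreach ($domNode->childNodes as $node)
    {
        // nodeValue of an element includes the text of all its descendants
        print $node->nodeName.':'.$node->nodeValue;
        if($node->hasChildNodes()) {
            showDOMNode($node);
        }
    }
}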
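The recursive walk above visits text nodes as well as elements, and nodeValue on an element returns the text of all its descendants, which may explain the output reported in the question. A minimal sketch of a variant that keeps only element nodes and prints just their direct text is shown here; the function name collectElements is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: collect "name:own text" for every element below $node,
// depth first. Only direct text children are used, not descendant text.
function collectElements(DOMNode $node, array &$lines): void
{
    foreach ($node->childNodes as $child) {
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue; // skip text, comment, and doctype nodes
        }
        $text = '';
        foreach ($child->childNodes as $grandChild) {
            if ($grandChild->nodeType === XML_TEXT_NODE) {
                $text .= $grandChild->nodeValue;
            }
        }
        $lines[] = $child->nodeName . ':' . trim($text);
        collectElements($child, $lines);
    }
}

$html = '<ul>
<li>value1</li>
<li>value1</li>
<li>value3
<p>subvalue</p>
</li>
</ul>
<p>hello world</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$lines = [];
collectElements($doc, $lines);
print implode("\n", $lines) . "\n";
```

Because loadHTML() typically wraps a fragment in html and body elements, those wrapper elements show up at the start of the list; the remaining lines follow the ul, li, and p structure of the input.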
Answered by JustAC0der
You need to use PHP Simple HTML DOM Parser and the following code:
<?php
require_once 'simplehtmldom/simple_html_dom.php';

function iterateHtmlElements($html)
{
    $dom = str_get_html($html);
    $dom->set_callback('handleElement');
    $dom->__toString();
    echo "\n";
}

function handleElement(simple_html_dom_node $elem)
{
    if($elem->tag == 'text') {
        echo $elem->innertext();
    }
    else {
        echo "\n" . $elem->tag . ": ";
    }
}

$html='<ul>
<li>value1</li>
<li>value1</li>
<li>value3
<p>subvalue</p>
</li>
</ul>
<p>hello world</p>';
iterateHtmlElements($html);
It works exactly as expected. I checked it with the input you provided and got the following results:
> php test2.php
ul:
li: value1
li: value1
li: value3
p: subvalue
p: hello world
Answered by Alexis Wilke
One way is to walk the tree as follows:
function next_node($node)
{
    // Descend first: go to the first child if there is one.
    if($node->firstChild != null)
    {
        return $node->firstChild;
    }
    // Otherwise move on to the next sibling.
    if($node->nextSibling != null)
    {
        return $node->nextSibling;
    }
    // No child and no sibling: climb until an ancestor has a next sibling.
    for($node = $node->parentNode; $node != null; $node = $node->parentNode)
    {
        if($node->nextSibling != null)
        {
            return $node->nextSibling;
        }
    }
    return null;
}

for($node = $doc; $node != null; $node = next_node($node))
{
    // handle node (read-only mode; if you need read-write,
    // you have to save all the nodes in an array and then
    // use that array)
    ...
}
This works for most documents; however, it looks like the parentNode is at times somehow not set correctly, and the next_node() function ends up returning the wrong information.
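As a runnable illustration of this non-recursive walk, here is a minimal sketch of a variant that takes an explicit root node and stops climbing once it reaches that root. The name nextNodeWithin and the $root parameter are hypothetical additions, and it is not established that this avoids the parentNode issue described above.

```php
<?php
// Hypothetical helper: return the node after $node in document order,
// or null once the subtree under $root has been fully visited.
function nextNodeWithin(DOMNode $node, DOMNode $root): ?DOMNode
{
    if ($node->firstChild !== null) {
        return $node->firstChild;
    }
    // Climb towards $root, taking the first next sibling found on the way.
    while ($node !== null && !$node->isSameNode($root)) {
        if ($node->nextSibling !== null) {
            return $node->nextSibling;
        }
        $node = $node->parentNode;
    }
    return null;
}

$doc = new DOMDocument();
$doc->loadHTML('<ul><li>a</li><li>b<p>c</p></li></ul><p>d</p>');

// Read-only pass: record the name of every element in document order.
$names = [];
for ($node = $doc; $node !== null; $node = nextNodeWithin($node, $doc)) {
    if ($node->nodeType === XML_ELEMENT_NODE) {
        $names[] = $node->nodeName;
    }
}
print implode(' ', $names) . "\n";
```

Passing a subtree node such as the ul element as both the starting node and $root limits the walk to that subtree.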
Answered by Drunken Peacock
I was having issues with elements that contained character data: even elements that didn't have child elements were reporting that they had children.
I am not sure why.
The workaround I found was to change
if($node->hasChildNodes()) {
    showDOMNode($node);
}
to
if($node->childNodes->length != 1) {
    showDOMNode($node);
}
And the code now works perfectly.
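One likely explanation, assuming the "data" in question is text or a CDATA section, is that the DOM counts text and CDATA nodes as children, so hasChildNodes() returns true for an element that contains only character data. The sketch below illustrates this and checks for element children explicitly instead of relying on the child count; the helper name hasElementChildren is hypothetical.

```php
<?php
// Hypothetical helper: true only if $node has at least one element child.
function hasElementChildren(DOMNode $node): bool
{
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            return true;
        }
    }
    return false;
}

$doc = new DOMDocument();
$doc->loadXML('<root><a><![CDATA[some data]]></a><b><c/></b></root>');

$a = $doc->getElementsByTagName('a')->item(0);
$b = $doc->getElementsByTagName('b')->item(0);

var_dump($a->hasChildNodes());     // true: the CDATA section is a child node
var_dump($a->childNodes->length);  // 1
var_dump(hasElementChildren($a));  // false: no element children
var_dump(hasElementChildren($b));  // true: <c/> is an element child
```

Unlike the length != 1 check, this test does not skip an element whose only child is another element.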

