PHP DOM get nodeValue HTML? (without stripping tags)
Original question: http://stackoverflow.com/questions/6286362/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Marty
I am trying to get the innerHTML of div tags in a file using nodeValue. However, this code outputs only plain text and seems to strip all HTML tags from inside the div. How can I change this code to output the div's HTML content rather than plain text, and also output the main div wrapping its child elements?
Example:
contents of file.txt:
<div class="1"><span class="test">text text text</span></div>
<div class="2"><span class="test">text text text</span></div>
<div class="3"><span class="test">text text text</span></div>
script.php:
$file= file_get_contents('file.txt');
$doc = new DOMDocument();
@$doc->loadHTML('<?xml encoding="UTF-8">'.$file);
$entries = $doc->getElementsByTagName('div');
for ($i = 0; $i < $entries->length; $i++) {
    $entry = $entries->item($i);
    echo $entry->nodeValue;
}
outputs: text text texttext text texttext text text
What I need it to output:
<div class="1"><span class="test">text text text</span></div>
<div class="2"><span class="test">text text text</span></div>
<div class="3"><span class="test">text text text</span></div>
Note that the parent divs (..etc) need to be output as well, wrapping the span tags.
HELP!
Answered by regex
I have never done what you're attempting to do, but as a stab in the dark, using the API docs, does echo $entry->textContent; work?
Adding an update. This is from the comments located on the docs page for DOMNode:
Hi!
Combining all the comments, the easiest way to get the inner HTML of the node is to use this function:
<?php
function get_inner_html( $node ) {
    $innerHTML = '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveXML( $child );
    }
    return $innerHTML;
}
?>
Or, maybe a simpler method is just to do:
echo $domDocument->saveXML($entry);
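The two approaches above give different results: saveXML($entry) serializes the node together with its own tag, while the helper concatenates only the children. A minimal sketch follows, assuming the sample markup from the question is inlined as a string rather than read from file.txt; the $outer and $inner arrays are names introduced here purely for illustration.

```php
<?php
// Sample markup from the question, inlined here instead of reading file.txt.
$html = '<div class="1"><span class="test">text text text</span></div>'
      . '<div class="2"><span class="test">text text text</span></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Inner HTML: serialize each child node of $node and concatenate the results.
function get_inner_html(DOMNode $node): string
{
    $innerHTML = '';
    foreach ($node->childNodes as $child) {
        $innerHTML .= $child->ownerDocument->saveXML($child);
    }
    return $innerHTML;
}

$outer = []; // illustrative: markup including the <div> wrapper
$inner = []; // illustrative: markup inside the <div> only
foreach ($doc->getElementsByTagName('div') as $entry) {
    $outer[] = $doc->saveXML($entry);
    $inner[] = get_inner_html($entry);
}

echo implode("\n", $outer), "\n";
echo implode("\n", $inner), "\n";
```

Since the question asks for the wrapping div as well, the $outer form (saveXML on the div itself) is the one that matches the desired output.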
Answered by Kai Noack
Instead of:
echo $entry->nodeValue;
You have to use:
echo $doc->saveXML($entry);
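One caveat worth knowing: saveXML() produces XML serialization, so void elements such as br come out self-closed. DOMDocument::saveHTML() also accepts a node argument (PHP 5.3.6 and later) and may be a better fit when the output is meant to be HTML. The sketch below uses a hypothetical snippet, not the question's file, just to make the difference visible.

```php
<?php
// Hypothetical snippet containing a void element (<br>).
$doc = new DOMDocument();
$doc->loadHTML('<div id="demo">line one<br>line two</div>');

$entry = $doc->getElementsByTagName('div')->item(0);

$asXml  = $doc->saveXML($entry);   // XML serialization: <br> is written self-closed
$asHtml = $doc->saveHTML($entry);  // HTML serialization of the same node

echo $asXml, "\n";
echo $asHtml, "\n";
```

Both calls include the wrapping div; they differ only in how the markup is serialized.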
Here is a more complete example that might help others too; $doccontent is the HTML block as a string:
$doccontent = '<html> …'; // your html string
$dom = new DOMDocument;
$internalErrors = libxml_use_internal_errors(true); // collect parse errors internally instead of emitting warnings
// Encode non-ASCII characters as entities so UTF-8 text is parsed correctly.
// Note: the 'HTML-ENTITIES' target is deprecated as of PHP 8.2.
$content_utf = mb_convert_encoding($doccontent, 'HTML-ENTITIES', 'UTF-8');
$dom->loadHTML($content_utf);
libxml_use_internal_errors($internalErrors); // restore the previous error-handling setting
$specialdiv = $dom->getElementById('xdiv');
if ($specialdiv !== null) {
    echo $dom->saveXML($specialdiv);
}