PHP DOM get nodeValue HTML? (without stripping tags)

Question

Asked by Marty

I am trying to get the innerHTML of div tags in a file using nodeValue, but this code outputs only plain text and seems to strip all HTML tags from inside the div. How can I change this code to output the div's HTML content rather than plain text, and also output the main div wrapping its child elements?

Example:

contents of file.txt:

<div class="1"><span class="test">text text text</span></div>
<div class="2"><span class="test">text text text</span></div>
<div class="3"><span class="test">text text text</span></div>

script.php:

$file = file_get_contents('file.txt');

$doc = new DOMDocument();

@$doc->loadHTML('<?xml encoding="UTF-8">' . $file);

$entries = $doc->getElementsByTagName('div');

for ($i = 0; $i < $entries->length; $i++) {
    $entry = $entries->item($i);
    echo $entry->nodeValue;
}

outputs: text text texttext text texttext text text

what I need it to output:

<div class="1"><span class="test">text text text</span></div>
<div class="2"><span class="test">text text text</span></div>
<div class="3"><span class="test">text text text</span></div>

Notice that the parent divs (etc.) also need to be output, wrapping the span tags...

HELP!

Answer 1

Answered by regex

I have never done what you're attempting to do, but as a stab in the dark, using the API docs, does echo $entry->textContent; work?

Adding an update. This is from the comments located on the docs page for DOMNode:

Hi!

Combining all the comments, the easiest way to get the inner HTML of the node is to use this function:

<?php
function get_inner_html( $node ) {
    $innerHTML = '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveXML( $child );
    }

    return $innerHTML;
}
?>
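
As a rough illustration of the inner-HTML loop, here is a minimal sketch run against one div from the question. The helper name `inner_html` is hypothetical, and the sketch swaps in `saveHTML($child)` for `saveXML($child)`, which assumes PHP 5.3.6 or later; the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: concatenates the serialized markup of each child node.
// Assumes $node belongs to a document (ownerDocument is not null).
function inner_html(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML($child) serializes a single node, tags included.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
$doc->loadHTML('<div class="1"><span class="test">text text text</span></div>');

$div = $doc->getElementsByTagName('div')->item(0);
echo inner_html($div), "\n";
```

This yields the span without the wrapping div, so it covers only the "inner HTML" half of the question. Serializing the div node itself is what includes the wrapper.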

Or, maybe a simpler method is just to do:

echo $domDocument->saveXML($entry);
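
To make the difference from nodeValue concrete, here is a small sketch that serializes each div from the question together with its wrapper. It uses `saveHTML($entry)` as an assumed alternative to `saveXML($entry)` (available since PHP 5.3.6), and the inline `$file` string stands in for the contents of file.txt.

```php
<?php
// Stand-in for file_get_contents('file.txt'); shortened to two divs.
$file = '<div class="1"><span class="test">text text text</span></div>'
      . '<div class="2"><span class="test">text text text</span></div>';

$doc = new DOMDocument();
$doc->loadHTML($file);

$outer = [];
foreach ($doc->getElementsByTagName('div') as $entry) {
    // $entry->nodeValue would return only "text text text".
    // Serializing the node keeps the div and its children;
    // trim() drops any newline the serializer may append.
    $outer[] = trim($doc->saveHTML($entry));
}

echo implode("\n", $outer), "\n";
```

One reason to prefer `saveHTML` for HTML input is that `saveXML` may write empty elements in self-closing XML form, which is not always what an HTML consumer expects.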

Answer 2

Answered by Kai Noack

Instead of:

echo $entry->nodeValue;

You have to use:

echo $doc->saveXML($entry);

Here is a more complete example that might help others too; $doccontent is the HTML block as a string:

$doccontent = '<html> …'; // your html string
$dom = new DOMDocument;
$internalErrors = libxml_use_internal_errors(true); // suppress parser warnings, remember the previous setting
// Convert UTF-8 characters to entities so they parse correctly
// (the 'HTML-ENTITIES' encoding is deprecated as of PHP 8.2)
$content_utf = mb_convert_encoding($doccontent, 'HTML-ENTITIES', 'UTF-8');
$dom->loadHTML($content_utf);
libxml_use_internal_errors($internalErrors); // restore the previous error-handling setting
$specialdiv = $dom->getElementById('xdiv');
if ($specialdiv !== null) // getElementById() returns null when no element matches
{
    echo $dom->saveXML($specialdiv);
}
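
Because passing 'HTML-ENTITIES' to mb_convert_encoding is deprecated as of PHP 8.2, the sketch below shows one possible substitute for that step using `mb_encode_numericentity`. This is an assumption about how the example could be adapted, not part of the original answer; the sample markup is made up for illustration, and the mbstring and dom extensions are assumed to be installed.

```php
<?php
// Made-up sample input; "\u{00E9}" is the UTF-8 character e-acute.
$doccontent = "<div id=\"xdiv\"><p>caf\u{00E9}</p></div>";

// Turn every non-ASCII code point into a decimal numeric entity.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$encoded = mb_encode_numericentity($doccontent, $convmap, 'UTF-8');

$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($encoded);
libxml_clear_errors();                        // discard the collected warnings
libxml_use_internal_errors($previous);        // restore the earlier setting

$specialdiv = $dom->getElementById('xdiv');
if ($specialdiv !== null) {
    // Serialize the element together with its wrapping tag.
    echo $dom->saveHTML($specialdiv), "\n";
}
```

The entity step only protects the parser from misreading the bytes; once loaded, the DOM holds the decoded characters again, so `textContent` returns the original text.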
