PHP DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity

Question

Asked by David

I'm trying to get the "link" elements from certain webpages, but I can't figure out what I'm doing wrong. I'm getting the following error:

Severity: Warning
Message: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity, line: 536
Filename: controllers/test.php
Line Number: 34

Line 34 of the code is:

      $dom->loadHTML($html);

My code:

    $url = "http://www.amazon.com/";

    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    if($html = curl_exec($ch)){

        // parse the html into a DOMDocument
        $dom = new DOMDocument();

        $dom->recover = true;
        $dom->strictErrorChecking = false;

        $dom->loadHTML($html);

        $hrefs = $dom->getElementsByTagName('a');

        echo "<pre>";
        print_r($hrefs);
        echo "</pre>";

        curl_close($ch);


    }else{
        echo "The website could not be reached.";
    }

Answer 1

Answered by Kris

It means some of the HTML is invalid. This is just a warning, not an error, so your script will still process the document. To suppress the warnings, call the following before loadHTML():

    libxml_use_internal_errors(true);

Or you could suppress the warning entirely by prefixing the call with the @ operator:

    @$dom->loadHTML($html);
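
As a rough illustration of the first option, the sketch below collects the parser warnings instead of printing them and then reads the href of each anchor. The function name extractLinks and the sample HTML are made up for this example; libxml_use_internal_errors(), libxml_get_errors() and libxml_clear_errors() are the standard PHP libxml functions. Which warnings are reported, if any, depends on the libxml2 version PHP was built against.

```php
<?php
// Hypothetical helper (not from the original post): parse possibly invalid
// HTML quietly and return the anchor hrefs plus any libxml warnings.
function extractLinks(string $html): array
{
    // Buffer libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Keep the buffered warnings so they can be logged or inspected.
    $warnings = [];
    foreach (libxml_get_errors() as $error) {
        $warnings[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the earlier setting

    // getElementsByTagName() returns a DOMNodeList; iterate it to read attributes.
    $hrefs = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')) {
            $hrefs[] = $anchor->getAttribute('href');
        }
    }

    return ['hrefs' => $hrefs, 'warnings' => $warnings];
}

// Sample input with a bare "&", the kind of markup that triggers the warning.
$result = extractLinks('<p>Tom & <a href="/a">A</a> <a href="/b">B</a></p>');
print_r($result['hrefs']);
```

Compared with the @ operator, this keeps the warnings available for logging, which can help when a page parses into something unexpected.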

Answer 2

Answered by Ujjwal Singh

This may be caused by a rogue & symbol that is immediately followed by a proper tag. Otherwise you would receive a "missing ;" error instead. See: Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity.

The solution is to replace the & symbol with &amp;
or, if you must keep that & as it is, maybe you could enclose it in: <![CDATA[- ]]>
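
When the HTML comes from a page you do not control, one way to apply the first suggestion is to escape stray ampersands before parsing. The sketch below is only an approximation: the function name escapeBareAmpersands is invented here, and the regular expression simply treats any & that does not start a named, decimal or hexadecimal character reference as a bare one.

```php
<?php
// Hypothetical helper (not from the original answer): turn every "&" that
// does not begin a character reference such as &amp;, &#38; or &#x26;
// into "&amp;".
function escapeBareAmpersands(string $html): string
{
    return preg_replace(
        '/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/',
        '&amp;',
        $html
    );
}

$fixed = escapeBareAmpersands('<p>Tom & Jerry &amp; &#38; &copy; R&D</p>');
echo $fixed, PHP_EOL;
// prints: <p>Tom &amp; Jerry &amp; &#38; &copy; R&amp;D</p>
```

Note that this also rewrites ampersands inside inline scripts, so it is best treated as a quick pre-processing step rather than a general HTML fixer.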

Answer 3

Answered by DeltaLee

The HTML is poorly formed. If it is malformed badly enough, loading it into the DOMDocument might fail altogether, and if loadHTML() is not working, suppressing the errors is pointless. If you are unable to load the HTML into the DOM, I suggest using a tool like HTML Tidy to "clean up" the poorly formed HTML first.

HTML Tidy can be found here: http://www.htacg.org/tidy-html5/
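
If PHP's optional tidy extension is installed, the clean-up step can be done from PHP itself. The following is a minimal sketch under that assumption: the helper name tidyOrOriginal, the sample markup and the chosen Tidy options are illustrative only, and the helper falls back to the untouched input when the extension is missing.

```php
<?php
// Hypothetical helper (not from the original answer): repair HTML with the
// tidy extension when it is available, otherwise return the input unchanged.
function tidyOrOriginal(string $html): string
{
    if (!function_exists('tidy_repair_string')) {
        return $html; // tidy extension not loaded
    }

    // Assumed options: emit HTML output and disable line wrapping.
    $config = ['output-html' => true, 'wrap' => 0];
    $repaired = tidy_repair_string($html, $config, 'utf8');

    return $repaired === false ? $html : $repaired;
}

$clean = tidyOrOriginal('<p>Tom & Jerry<p>unclosed paragraph');

// Load the cleaned markup; warnings are still buffered in case tidy was unavailable.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($clean);
libxml_clear_errors();

echo $dom->getElementsByTagName('p')->length, PHP_EOL;
```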
