Original question: http://stackoverflow.com/questions/14648442/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
DOMDocument::loadHTML(): warning - htmlParseEntityRef: no name in Entity
Question by David Gard
I have found several similar questions, but so far, none have been able to help me.
I am trying to output the 'src' of all images in a block of HTML, so I'm using DOMDocument(). This approach actually works, but I'm getting a warning on some pages and I can't figure out why. Some posts suggested suppressing the warning, but I'd much rather find out why it is being generated.
Warning: DOMDocument::loadHTML(): htmlParseEntityRef: no name in Entity, line: 10
One example of post->post_content that is generating the error is:
On Wednesday 21st November specialist rights of way solicitor Jonathan Cheal of Dyne Drewett will be speaking at the Annual Briefing for Rural Practice Surveyors and Agricultural Valuers in Petersfield.
<br>
Jonathan is one of many speakers during the day and he is specifically addressing issues of public rights of way and village greens.
<br>
Other speakers include:-
<br>
<ul>
<li>James Atrrill, Chairman of the Agricultural Valuers Associates of Hants, Wilts and Dorset;</li>
<li>Martin Lowry, Chairman of the RICS Countryside Policies Panel;</li>
<li>Angus Burnett, Director at Martin & Company;</li>
<li>Esther Smith, Partner at Thomas Eggar;</li>
<li>Jeremy Barrell, Barrell Tree Consultancy;</li>
<li>Robin Satow, Chairman of the RICS Surrey Local Association;</li>
<li>James Cooper, Stnsted Oark Foundation;</li>
<li>Fenella Collins, Head of Planning at the CLA; and</li>
<li>Tom Bodley, Partner at Batcheller Monkhouse</li>
</ul>
I can post more examples of what post->post_content contains if that would be helpful.
I have temporarily allowed access to a development site so you can see some examples [Note: the links are no longer accessible, as the question has been answered]:
- Error - http://test.dynedrewett.com/specialist-solicitor-speaks-at-petersfield-update/
- No error - http://test.dynedrewett.com/restrictive-covenants-in-employment-contracts/
 
Any tips on how to resolve this? Thanks.
$images = array(); // initialise the result array before appending to it
$dom = new DOMDocument();
$dom->loadHTML(apply_filters('the_content', $post->post_content)); // Have tried stripping all tags but <img>, still generates warning
$nodes = $dom->getElementsByTagName('img');
foreach($nodes as $img) :
    $images[] = $img->getAttribute('src');
endforeach;
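Since the goal is to find out *why* the warning appears, it can help to capture the libxml diagnostics and print the source line each one points at. The line number in the warning ("line: 10") refers to the string actually passed to loadHTML(), i.e. after apply_filters() has run, so that is the string to inspect. A minimal sketch; findParseProblems() is a hypothetical helper name, and the sample input is a cut-down version of the post above:

```php
<?php
// Hypothetical helper: parse $html and map each libxml diagnostic back to
// the source line it refers to, so the offending markup can be inspected.
function findParseProblems(string $html): array
{
    $previous = libxml_use_internal_errors(true); // buffer errors instead of warnings
    libxml_clear_errors();

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $lines = explode("\n", $html);
    $problems = [];
    foreach (libxml_get_errors() as $error) {
        $problems[] = [
            'line'    => $error->line,
            'message' => trim($error->message),
            // LibXMLError::$line is 1-based; fall back to '' if out of range.
            'source'  => $lines[$error->line - 1] ?? '',
        ];
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the previous setting
    return $problems;
}

$html = "<p>Other speakers include:</p>\n<ul>\n<li>Angus Burnett, Director at Martin & Company;</li>\n</ul>";
foreach (findParseProblems($html) as $p) {
    printf("line %d: %s\n  %s\n", $p['line'], $p['message'], $p['source']);
}
```

Which diagnostics are reported, if any, depends on the libxml2 version PHP is linked against.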
Answer by David Gard
This correct answer comes from a comment from @lonesomeday.
My best guess, then, is that there is an unescaped ampersand (&) somewhere in the HTML. This makes the parser think we're in an entity reference (e.g. &copy;). When it gets to ;, it thinks the entity is over. It then realises that what it has doesn't form a valid entity, so it sends out a warning and returns the content as plain text.
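To see the difference an escaped ampersand makes, the sketch below parses one list item from the question's post twice: once with the bare "&" from "Martin & Company", once with "&amp;". parseFragment() is a hypothetical helper; it returns the number of libxml diagnostics and the decoded text of the first <li>. The escaped version should produce no diagnostics, while the bare one produces one or more on libxml2 builds that report this case:

```php
<?php
// Hypothetical helper: parse $html, returning [number of libxml errors,
// text of the first <li> with entity references decoded].
function parseFragment(string $html): array
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $errors = count(libxml_get_errors());
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $li = $dom->getElementsByTagName('li')->item(0);
    return [$errors, $li ? $li->textContent : ''];
}

$bare    = '<ul><li>Angus Burnett, Director at Martin & Company;</li></ul>';
$escaped = '<ul><li>Angus Burnett, Director at Martin &amp; Company;</li></ul>';

[$bareErrors] = parseFragment($bare);
[$escapedErrors, $text] = parseFragment($escaped);
echo $bareErrors, PHP_EOL;    // depends on the libxml2 version
echo $escapedErrors, PHP_EOL; // 0
echo $text, PHP_EOL;          // Angus Burnett, Director at Martin & Company;
```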
Answer by Ka.
As mentioned in the related question
Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity,
you can use:
libxml_use_internal_errors(true);
See http://php.net/manual/en/function.libxml-use-internal-errors.php
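If suppressing the messages is acceptable, a common pattern is to switch internal errors on only around the loadHTML() call and restore the previous mode afterwards, so other libxml code in the application is not affected. A minimal sketch; loadHtmlQuietly() is a hypothetical name:

```php
<?php
// Hypothetical helper: parse $html with libxml errors buffered instead of
// raised as PHP warnings, then restore whatever mode was active before.
function loadHtmlQuietly(string $html): DOMDocument
{
    $previous = libxml_use_internal_errors(true); // returns the old setting
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();                 // drop the buffered errors
    libxml_use_internal_errors($previous); // restore the old setting
    return $dom;
}

$dom = loadHtmlQuietly('<p>Martin & Company</p><img src="a.jpg">');
foreach ($dom->getElementsByTagName('img') as $img) {
    echo $img->getAttribute('src'), PHP_EOL; // a.jpg
}
```

Note that this only hides the diagnostics; the stray "&" is still in the source.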
Answer by Dhana
Check for any "&" characters anywhere in your HTML code. I had this issue for exactly that reason.
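Once a stray "&" is found, one way to fix it without touching valid entities is to escape only ampersands that do not start a character reference. The sketch below is a heuristic, not a full HTML tokenizer: escapeBareAmpersands() is a hypothetical name, and anything shaped like "&name;" is assumed to be a real entity and left alone:

```php
<?php
// Hypothetical helper: escape "&" characters that do not begin a named
// (&copy;), decimal (&#169;) or hexadecimal (&#xA9;) character reference.
function escapeBareAmpersands(string $html): string
{
    $pattern = '/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/';
    // preg_replace() returns null on failure; fall back to the input then.
    return preg_replace($pattern, '&amp;', $html) ?? $html;
}

echo escapeBareAmpersands('Martin & Company &amp; &copy; &#169; &#xA9;'), PHP_EOL;
// Martin &amp; Company &amp; &copy; &#169; &#xA9;
```

The same rule also escapes "&" in attribute values such as "?a=1&b=2", where "&amp;" is the correct form.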
Answer by Good Idea
I don't have the reputation required to leave a comment above, but using htmlspecialchars solved this problem in my case:
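$images = array(); // initialise the result array before appending to it
// Note: htmlspecialchars() escapes every <, > and &, so the tags in post_content
// (including any <img>) become plain text rather than elements.
$inputHTML = htmlspecialchars($post->post_content);
$dom = new DOMDocument();
$dom->loadHTML(apply_filters('the_content', $inputHTML)); // Have tried stripping all tags but <img>, still generates warning
$nodes = $dom->getElementsByTagName('img');
foreach($nodes as $img) :
    $images[] = $img->getAttribute('src');
endforeach;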
For my purposes, I'm also using strip_tags($inputHTML, "<strong><em><br>"), so all image tags are stripped out as well - I'm not sure if this would be a problem otherwise.
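The caveat above matters if you still need the images: htmlspecialchars() turns the markup itself into text, so DOMDocument no longer sees any <img> elements. A small sketch showing this with a made-up input string:

```php
<?php
$html = '<p>Martin & Company</p><img src="a.jpg">';

$dom = new DOMDocument();
// After htmlspecialchars() the markup is plain text: "<" became "&lt;", etc.
$dom->loadHTML(htmlspecialchars($html));

echo $dom->getElementsByTagName('img')->length, PHP_EOL; // 0: no <img> element survives
echo $dom->documentElement->textContent, PHP_EOL;        // the original markup, as text
```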
Answer by yoorock.fr
I eventually solved this problem properly, using Tidy:
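// Tidy configuration
$config = array(
    'indent'         => true,
    'output-xhtml'   => true,
    'wrap'           => 200);
// Run the markup through Tidy first so loadHTML() receives repaired markup
// (Tidy escapes stray "&" characters, among other fixes)
$tidy = new tidy;
$tidy->parseString($bill->bill_text, $config, 'utf8');
$tidy->cleanRepair();
$domDocument = new DOMDocument();
// tidy_get_output() returns the repaired markup as a string; non-ASCII characters
// are converted to entities so loadHTML() does not misread UTF-8 as ISO-8859-1
// (the 'HTML-ENTITIES' mbstring encoding is deprecated as of PHP 8.2)
$domDocument->loadHTML(mb_convert_encoding(tidy_get_output($tidy), 'HTML-ENTITIES', 'UTF-8'));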
Answer by Mike
Just replace "&" with "and" in your string, and do the same for any other problem symbols.
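One thing to watch with a blanket replacement: it also rewrites the "&" at the start of entities that are already correctly escaped. A small sketch with a made-up string, comparing a blanket str_replace() with one limited to a free-standing " & ":

```php
<?php
$html = '<li>Martin & Company</li><li>Tom &amp; Jerry</li>';

// Replacing every "&" also rewrites the "&" that starts existing entities.
echo str_replace('&', 'and', $html), PHP_EOL;
// <li>Martin and Company</li><li>Tom andamp; Jerry</li>

// Replacing only a free-standing " & " leaves "&amp;" intact.
echo str_replace(' & ', ' and ', $html), PHP_EOL;
// <li>Martin and Company</li><li>Tom &amp; Jerry</li>
```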

