PHP Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/1685277/
Asked by gweg
$html = file_get_contents("http://www.somesite.com/");
$dom = new DOMDocument();
$dom->loadHTML($html);
echo $dom;
throws
Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity,
Catchable fatal error: Object of class DOMDocument could not be converted to string in test.php on line 10
Answered by Dewsworld
To suppress the warning, you can use libxml_use_internal_errors(true):
// create new DOMDocument
$document = new \DOMDocument('1.0', 'UTF-8');
// set error level
$internalErrors = libxml_use_internal_errors(true);
// load HTML
$document->loadHTML($html);
// Restore error level
libxml_use_internal_errors($internalErrors);
Answered by mattalxndr
I would bet that if you looked at the source of http://www.somesite.com/ you would find special characters that haven't been converted to HTML entities. Maybe something like this:
<a href="/script.php?foo=bar&hello=world">link</a>
Should be
<a href="/script.php?foo=bar&amp;hello=world">link</a>
Answered by Maanas Royy
$dom->@loadHTML($html);
This is incorrect, use this instead:
@$dom->loadHTML($html);
Answered by user279583
There are two errors. The second occurs because $dom is not a string but an object, and thus cannot be "echoed". The first is a warning from loadHTML, caused by invalid syntax in the HTML document being loaded (probably an & (ampersand) used as a parameter separator and not escaped as the entity &amp;).
You can ignore and suppress this error message (not the error, just the message!) by calling the function with the error control operator "@" (http://www.php.net/manual/en/language.operators.errorcontrol.php):
@$dom->loadHTML($html);
Answered by Mike B
The reason for your fatal error is that DOMDocument does not have a __toString() method and thus cannot be echoed.
You're probably looking for
echo $dom->saveHTML();
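As a minimal sketch of the point above, the snippet below serializes a whole document and then a single node with saveHTML(). The inline markup is a hypothetical stand-in for the downloaded page, and passing a node to saveHTML() assumes PHP 5.3.6 or later.

```php
<?php
// Hypothetical inline markup standing in for the downloaded page.
$html = '<html><body><p>Hello</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// DOMDocument has no __toString(), so serialize it explicitly.
$output = $dom->saveHTML();
echo $output;

// A single node can be serialized the same way.
$paragraph = $dom->getElementsByTagName('p')->item(0);
$fragment = $dom->saveHTML($paragraph);
echo $fragment, "\n";
```

The first echo prints the full document as the parser rebuilt it; the second prints only the paragraph element.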
Answered by Lorenz Lo Sauer
Regardless of the echo (which would need to be replaced with print_r or var_dump), if an exception is thrown the object should stay empty:
DOMNodeList Object
(
)
Solution
Set recover to true, and strictErrorChecking to false:
$content = file_get_contents($url);
$doc = new DOMDocument();
$doc->recover = true;
$doc->strictErrorChecking = false;
$doc->loadHTML($content);
Use PHP's entity encoding on the markup's contents, which is a very common error source.
Answered by David Chan
Replace the simple
$dom->loadHTML($html);
with the more robust ...
libxml_use_internal_errors(true); // collect parser errors instead of emitting warnings
if (!$dom->loadHTML($html))
{
    $errors = "";
    foreach (libxml_get_errors() as $error) {
        $errors .= $error->message . "<br/>";
    }
    libxml_clear_errors(); // empty the libxml error buffer
    print "libxml errors:<br>$errors";
    return;
}
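The error collection above can be combined with the state restoration shown in Dewsworld's answer. The following is only a sketch: loadHtmlCollectingErrors() is a hypothetical helper name, the sample markup is made up, and the array destructuring assumes PHP 7.1 or later.

```php
<?php
/**
 * Hypothetical helper: parse HTML and return the document together with
 * any libxml error messages, leaving the global libxml setting as it was.
 */
function loadHtmlCollectingErrors(string $html): array
{
    // Remember the previous setting so it can be restored afterwards.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $ok = $dom->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$ok ? $dom : null, $messages];
}

// Made-up markup containing a bare ampersand.
[$dom, $messages] = loadHtmlCollectingErrors('<p>Tom & Jerry</p>');
foreach ($messages as $message) {
    echo $message, "\n";
}
```

Which messages are reported, if any, depends on the installed libxml2 version, so the caller should treat the list as diagnostic output rather than a fixed contract.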
Answered by nmwi22
$html = file_get_contents("http://www.somesite.com/");
$dom = new DOMDocument();
$dom->loadHTML(htmlspecialchars($html));
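echo $dom->saveHTML();
Try this. Note that htmlspecialchars() escapes every tag as well, so the page is loaded as plain text rather than as markup.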
Answered by Nicolas Bouvrette
I know this is an old question, but if you ever want to fix the malformed '&' signs in your HTML, you can use code similar to this:
$page = file_get_contents('http://www.example.com');
$page = preg_replace('/\s+/', ' ', trim($page));
fixAmps($page, 0);
$dom = new DOMDocument();
$dom->loadHTML($page);
function fixAmps(&$html, $offset) {
    $positionAmp = strpos($html, '&', $offset);
    $positionSemiColumn = strpos($html, ';', $positionAmp+1);
    $string = substr($html, $positionAmp, $positionSemiColumn-$positionAmp+1);
    if ($positionAmp !== false) { // If an '&' can be found.
        if ($positionSemiColumn === false) { // If no ';' can be found.
            $html = substr_replace($html, '&amp;', $positionAmp, 1); // Replace straight away.
        } else if (preg_match('/&(#[0-9]+|[A-Z|a-z|0-9]+);/', $string) === 0) { // If a standard escape cannot be found.
            $html = substr_replace($html, '&amp;', $positionAmp, 1); // This means we need to escape the '&' sign.
            fixAmps($html, $positionAmp+5); // Recursive call from the new position (past the 5 characters of '&amp;').
        } else {
            fixAmps($html, $positionAmp+1); // Recursive call from the new position.
        }
    }
}
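A shorter alternative to the recursive scan is a single regular expression with a negative lookahead. This is a different technique from the answer's code, offered only as a sketch: escapeBareAmpersands() is a hypothetical name, and the pattern checks only that something shaped like an entity reference follows the '&', not that the entity name actually exists.

```php
<?php
// Hypothetical helper: escape every '&' that does not start an entity reference.
function escapeBareAmpersands(string $html): string
{
    // The lookahead skips any '&' followed by a named (&amp;), decimal (&#38;)
    // or hexadecimal (&#x26;) reference that ends in ';'.
    return preg_replace(
        '/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/',
        '&amp;',
        $html
    );
}

$fixed = escapeBareAmpersands(
    '<a href="/script.php?foo=bar&hello=world">Tom &amp; Jerry &#38; co</a>'
);
echo $fixed, "\n";
```

Only the ampersand in the query string is rewritten; the existing &amp; and &#38; references are left alone. Like the recursive version, this also touches ampersands inside script and style blocks, which may not be what you want.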
Answered by lastYorsh
Another possible solution is:
$sContent = htmlspecialchars($sHTML);
$oDom = new DOMDocument();
$oDom->loadHTML($sContent);
echo html_entity_decode($oDom->saveHTML());
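One consequence of this approach is worth checking before relying on it: htmlspecialchars() escapes the angle brackets too, so the parser receives text rather than elements. The sketch below uses made-up markup to show that no link element ends up in the tree.

```php
<?php
// Made-up markup with a bare ampersand in the query string.
$sHTML = '<p><a href="/script.php?foo=bar&hello=world">link</a></p>';

$oDom = new DOMDocument();
$oDom->loadHTML(htmlspecialchars($sHTML));

// Every '<' became '&lt;', so the parser saw text, not tags.
$linkCount = $oDom->getElementsByTagName('a')->length;
echo "Links found: $linkCount\n";
```

The round trip through html_entity_decode() gives back markup as a string, but the DOM itself cannot be queried for the original elements.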

