Source: http://stackoverflow.com/questions/11819603/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Dom loadHTML doesn't work properly on a server
Asked by LuZ
I first ran the code on MAMP and it worked very well. But when I ran the same code on another server, I got a lot of warnings like:
Warning: DOMDocument::loadHTML(): Unexpected end tag : head in Entity, line: 3349 in /cgihome/zhang1/html/cgi-bin/getPrice.php on line 17
Warning: DOMDocument::loadHTML(): htmlParseStartTag: misplaced tag in Entity, line: 3350 in /cgihome/zhang1/html/cgi-bin/getPrice.php on line 17
Warning: DOMDocument::loadHTML(): Tag header invalid in Entity, line: 3517 in /cgihome/zhang1/html/cgi-bin/getPrice.php on line 17
The code is as follows:
<?php
$amazon = file_get_contents('http://www.amazon.com/blablabla');
$doc = new DOMdocument();
$doc->loadHTML($amazon);
$doc->saveHTML();
$price = $doc -> getElementById('actualPriceValue')->textContent;
$ASIN = $doc -> getElementById('ASIN')->getAttribute('value');
?>
Does anyone know what's going on? Thanks!
Answered by hakre
To disable the warnings, you can use:
libxml_use_internal_errors(true);
This works for me. See the PHP manual entry for libxml_use_internal_errors.
Background: You are loading invalid HTML. Invalid HTML is quite common. DOMDocument::loadHTML corrects most of the problems, but emits warnings by default.
With libxml_use_internal_errors you can control that behavior. Call it before loading the document:
libxml_use_internal_errors(true);
$doc->loadHTML($amazon);
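As a rough sketch of the approach above, the following uses a short hard-coded string of invalid markup in place of the Amazon page. The `$html` value and the `price` id are made up for illustration and do not come from the original question. It also calls `libxml_get_errors()` and `libxml_clear_errors()`, so the parser's complaints can be inspected rather than thrown away, and it checks the result of `getElementById()` before dereferencing it.

```php
<?php
// Hypothetical invalid markup standing in for the downloaded page:
// a stray </head> end tag inside <body>, plus an HTML5 <header> element.
$html = '<html><body></head><header><span id="price">$9.99</span></header></body></html>';

// Collect parser errors internally instead of emitting PHP warnings.
// The call returns the previous setting so it can be restored later.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Each entry is a LibXMLError object (message, line, level, ...).
$errors = libxml_get_errors();
libxml_clear_errors();

// Restore the previous error-handling mode.
libxml_use_internal_errors($previous);

// getElementById() returns null when nothing matches, so check first.
$node  = $doc->getElementById('price');
$price = $node !== null ? $node->textContent : null;

// Report what the parser complained about, if anything.
foreach ($errors as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}
```

Which errors get reported depends on the libxml2 version installed on the server, which may be why the same script behaves differently on two machines.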
Answered by Pascal
This problem is related to markup that is not valid XHTML.
DOMDocument complains about anything that is not clean XHTML, so you need to clean up the markup first.
PHP has an extension that does the job pretty well, called Tidy: php.net/book.tidy
It might be tricky, as you may need to enable it in your php.ini.
Then:
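// $html holds the raw markup, e.g. the string returned by file_get_contents()
$tidy_config = array(
    'clean' => true,
    'output-xhtml' => true,
    'show-body-only' => true,
    'wrap' => 0,
);
// Parse the markup with Tidy, then repair it
$tidy = tidy_parse_string($html, $tidy_config, 'UTF8');
$tidy->cleanRepair();
// Load the repaired markup into the DOM
$doc = new DOMDocument();
$doc->loadHTML((string) $tidy);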
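Since Tidy may not be enabled on every server, one option is to check for the extension at run time. The sketch below assumes a hypothetical helper named `repairMarkup()`, which is not part of Tidy or DOM. It repairs the markup when Tidy is available and otherwise returns the input unchanged. The sample paragraph and its `a` id are made up for illustration.

```php
<?php
// Hypothetical helper: repair markup with Tidy when the extension is loaded,
// otherwise hand the input back untouched.
function repairMarkup(string $html): string
{
    if (!extension_loaded('tidy')) {
        return $html; // Tidy is not enabled on this server.
    }

    $config = ['output-xhtml' => true, 'wrap' => 0];
    $tidy = tidy_parse_string($html, $config, 'utf8');
    if ($tidy === false) {
        return $html; // Parsing failed; fall back to the raw markup.
    }

    $tidy->cleanRepair();
    return tidy_get_output($tidy);
}

// Keep any remaining parser complaints out of the output.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML(repairMarkup('<p id="a">unclosed paragraph'));
libxml_clear_errors();
```

With this fallback the script still runs on a server without Tidy, although the markup is then loaded unrepaired.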
Answered by Aminah Nuraini
You can suppress the warning like this:
@$doc->loadHTML($amazon);

