Weird error using PHP Simple HTML DOM parser
Original question: http://stackoverflow.com/questions/6832197/
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute the content to the original authors on Stack Overflow (not to this page).
Asked by Tsundoku
I am using this library (PHP Simple HTML DOM parser) to parse a link, here's the code:
function getSemanticRelevantKeywords($keyword){
    $results = array();
    $html = file_get_html("http://www.semager.de/api/keyword.php?q=". urlencode($keyword) ."&lang=de&out=html&count=2&threshold=");
    foreach($html->find('span') as $e){
        $results[] = $e->plaintext;
    }
    return $results;
}
but I am getting this error when I output the results:
Fatal error: Call to a member function find() on a non-object in /var/www/vhosts/efamous.de/subdomains/sandbox/httpdocs/getNewTrusts.php on line 25
(Line 25 is the foreach loop.) The odd thing is that it outputs everything (at least seemingly) correctly, but I still get that error and can't figure out why.
Accepted answer by Jim
This error usually means that $html isn't an object.
It's odd that you say this seems to work. What happens if you output $html? I'd imagine that the URL isn't available and that $html is null.
Edit: Looks like this may be an error in the parser. Someone has submitted a bug and added a check in his code as a workaround.
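The kind of check mentioned in the edit can be sketched as follows. This is only an illustration, not the code from the linked bug report: the $loader parameter is a hypothetical addition (it lets the function be tried without the library or a network connection), and returning an empty array on failure is an assumed policy.

```php
<?php
// Sketch only. $loader is a hypothetical parameter standing in for
// file_get_html; pass 'file_get_html' once simple_html_dom.php is included.
function getSemanticRelevantKeywords(string $keyword, callable $loader): array
{
    $url = "http://www.semager.de/api/keyword.php?q=" . urlencode($keyword)
         . "&lang=de&out=html&count=2&threshold=";

    $html = $loader($url);

    // The load may yield false or null instead of a DOM object; calling
    // find() on such a value is what triggers the fatal error.
    if (!is_object($html) || !method_exists($html, 'find')) {
        return array(); // assumed policy: no document, no keywords
    }

    $results = array();
    foreach ($html->find('span') as $e) {
        $results[] = $e->plaintext;
    }
    return $results;
}
```

With the library loaded, the call would look like getSemanticRelevantKeywords('php', 'file_get_html'). The is_object() test comes first because method_exists() rejects a boolean argument on PHP 8.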
Answer by Sagar Shetty
The reason for this error is that Simple HTML DOM does not return an object if the size of the response from the URL is greater than 600000.
You can avoid it by changing the simple_html_dom.php file: remove strlen($contents) > MAX_FILE_SIZE from the if condition of the file_get_html function.
This will solve your issue.
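Before editing the library, it may help to confirm that the size limit is really what you are hitting. The sketch below reproduces the condition described above outside the library. It assumes the limit is 600000 bytes, as stated in this answer, and that an empty response is rejected as well (an assumption about the library, so check your copy). explainLoadFailure is a hypothetical helper, not part of Simple HTML DOM.

```php
<?php
// Hypothetical helper mirroring the condition described in the answer.
// $contents is whatever file_get_contents($url) returned.
function explainLoadFailure($contents, int $maxFileSize = 600000): ?string
{
    if ($contents === false || $contents === '') {
        return 'empty or failed response';
    }
    if (strlen($contents) > $maxFileSize) {
        return 'response is ' . strlen($contents)
             . ' bytes, over the limit of ' . $maxFileSize;
    }
    return null; // nothing here explains a failed load
}
```

A typical call would be $reason = explainLoadFailure(@file_get_contents($url)); a null result means the response is within the limit and the cause lies elsewhere.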
Answer by LAMPHONGPAUL
You just need to increase the MAX_FILE_SIZE constant in the file simple_html_dom.php.
For example:
define('MAX_FILE_SIZE', 999999999999999);
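If you would rather not edit the library file, one possibility is to define the constant before the library is loaded. This is a sketch under a loud assumption: it only has the intended effect if your copy of simple_html_dom.php guards its own define() with a defined() check; otherwise, change the value in the file as described above. The 10 MB value and the include path are placeholders.

```php
<?php
// Assumption: the library only defines MAX_FILE_SIZE when it is not
// already defined. Verify this in your copy before relying on it.
if (!defined('MAX_FILE_SIZE')) {
    define('MAX_FILE_SIZE', 10 * 1000 * 1000); // placeholder: roughly 10 MB
}

// Placeholder path; uncomment and adjust to wherever the library lives.
// require_once 'simple_html_dom.php';
```

A moderate value like this also stays an integer on 32-bit PHP builds, where a very large literal would be converted to a float.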
Answer by trante
Before calling the file_get_html/load_file method, you should first check whether the URL exists.
If the URL exists, you have passed the first step.
(Some servers serve their 404 page as a valid HTML page with a proper HTML structure such as head and body, but containing only text like "This page couldn't be found. 404 error...")
If the URL returns 200 OK, you should then check whether the fetched result is an object and whether its nodes are set.
That's the code I used in my pages.
function url_exists($url){
    // Prepend a scheme when the URL does not start with one.
    if (strpos($url, "http") !== 0) $url = "http://" . $url;
    $headers = @get_headers($url);
    // print_r($headers);
    if (is_array($headers)){
        // $headers[0] is the status line, e.g. "HTTP/1.1 404 Not Found"
        if (strpos($headers[0], '404 Not Found') !== false)
            return false;
        else
            return true;
    }
    else
        return false; // the request failed entirely
}

$pageAddress = 'http://www.google.com';
if ( url_exists($pageAddress) ) {
    // Assumes simple_html_dom.php has already been included.
    $htmlPage = new simple_html_dom();
    $htmlPage->load_file( $pageAddress );
} else {
    echo 'URL does not exist, stopping';
    return;
}

if ( $htmlPage && is_object($htmlPage) && isset($htmlPage->nodes) )
{
    // do your work here...
} else {
    echo 'Fetched page is not OK, stopping';
    return;
}
Answer by futtta
For those arriving here via a search engine (as I did): after reading the info (and the linked bug report) above, I did some code-prodding and ended up fixing my problems with two extra checks after loading the DOM:
$html = file_get_html('<your url here>');
// first check that $html is an object and that $html->find exists
// (method_exists() rejects a boolean argument on PHP 8)
if (is_object($html) && method_exists($html, "find")) {
    // then check if the html element exists to avoid trying to parse non-html
    if ($html->find('html')) {
        // and only then start searching (and manipulating) the dom
    }
}
Answer by Eric Strom
I'm having the same error come up in my logs and apart from the solutions mentioned above, it could also be that there is no 'span' in the document. I get the same error when searching for divs with a particular class that doesn't exist on the page, but when searching for something that I know exists on the page, the error doesn't pop up.
Answer by Tudor
Your script is OK. I receive this error when it does not find the element that I'm looking for on that page.
In your case, please check whether the page you are accessing has a 'SPAN' element.
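One situation where a missing element produces this kind of error is chaining onto an indexed lookup: find() called with an index returns null when nothing matches, so using the result as an object fails. Below is a sketch of a guard for that case. firstPlaintext is a hypothetical helper, and the $default fallback is an assumption about what you want when nothing is found.

```php
<?php
// Hypothetical helper: text of the first element matching $selector,
// or $default when the document or the element is missing.
function firstPlaintext($dom, string $selector, string $default = ''): string
{
    if (!is_object($dom) || !method_exists($dom, 'find')) {
        return $default; // the page itself did not load
    }

    $node = $dom->find($selector, 0); // null when there is no match
    if (!is_object($node)) {
        return $default;
    }
    return (string) $node->plaintext;
}
```

For example, firstPlaintext($html, 'div.some-class', 'n/a') returns 'n/a' instead of failing when the class is absent from the page.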
Answer by Cesar Bielich
Simplest solution to this problem:
if ($html = file_get_html("http://www.semager.de/api/keyword.php?q=". urlencode($keyword) ."&lang=de&out=html&count=2&threshold=")) {
    // $html loaded: search it here
} else {
    // do something else because couldn't find html
}
Answer by toopay
The error means that the find() function is either not defined yet or not available. Make sure you have loaded or included the related function.