PHP: file_get_contents is not working for some URLs
Source: http://stackoverflow.com/questions/17363545/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by Parixit
I use file_get_contents in PHP. In the code below, the first URL works fine but the second one doesn't.
$URL = "http://test6473.blogspot.com";
$domain = file_get_contents($URL);
print_r($domain);
$add_url= "http://adfoc.us/1575051";
$add_domain = file_get_contents($add_url);
echo $add_domain;
Any suggestions on why the second one doesn't work?
Answered by Parixit
The second URL is not retrieved by file_get_contents because its server checks whether the request comes from a browser or from a script. If it detects a request from a script, it simply withholds the page contents.
So I had to make a request that resembles a browser request. I used the following code to get the contents of the second URL. It may need to differ for other web servers, because they may perform different checks.
Even so, why not try the following code? If you are lucky, it might work for you!
function getUrlContent($url) {
    // Start each call with an empty cookie file, and close the handle right away.
    fclose(fopen("cookies.txt", "w"));
    $parts = parse_url($url);
    $host = $parts['host'];
    $ch = curl_init();
    // Browser-like request headers. cURL builds the request line itself,
    // and the Host header is derived from the URL rather than hardcoded.
    $header = array(
        "Host: {$host}",
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.8',
        'Cache-Control: max-age=0',
        'Connection: keep-alive',
        'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36',
    );
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
    curl_setopt($ch, CURLOPT_COOKIESESSION, true);
    curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt'); // read cookies from this file
    curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');  // write cookies back on close
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    $result = curl_exec($ch); // false on failure
    curl_close($ch);
    return $result;
}
$url = "http://adfoc.us/1575051";
$html = getUrlContent($url);
Thanks everyone for the guidance.
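The same idea of sending browser-like headers can also be sketched without cURL, by passing a stream context to file_get_contents. This is only a minimal sketch: the helper names browserLikeContextOptions and fetchLikeBrowser are hypothetical, the header values are illustrative, and whether a given server accepts the request depends on the checks it performs.

```php
<?php
// Hypothetical helper: builds HTTP stream context options that resemble a browser request.
function browserLikeContextOptions(string $userAgent, int $timeoutSeconds = 10): array
{
    return [
        'http' => [
            'method'          => 'GET',
            'header'          => "Accept: text/html,application/xhtml+xml\r\n" .
                                 "Accept-Language: en-US,en;q=0.8\r\n",
            'user_agent'      => $userAgent,      // sent as the User-Agent header
            'timeout'         => $timeoutSeconds, // read timeout in seconds
            'follow_location' => 1,               // follow Location redirects
        ],
    ];
}

// Hypothetical wrapper: returns the body as a string, or false on failure.
function fetchLikeBrowser(string $url, string $userAgent)
{
    $context = stream_context_create(browserLikeContextOptions($userAgent));
    return file_get_contents($url, false, $context);
}
```

A call such as fetchLikeBrowser($add_url, 'Mozilla/5.0 ...') would then stand in for the plain file_get_contents($add_url) from the question.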
Answered by fquinner
Unfortunately it looks like the second site blocks access from unrecognized browsers. Even using curl from the command line doesn't work:
curl -I http://adfoc.us/1575051
gives:
HTTP/1.1 200 OK
Server: cloudflare-nginx
Date: Fri, 28 Jun 2013 12:15:40 GMT
Content-Type: text/html
Connection: keep-alive
X-Powered-By: PHP/5.5.0
Set-Cookie: __cfduid=d7cd1bf18c136a288cc2b36065a3b31f01372421740; expires=Mon, 23-Dec-2019 23:50:00 GMT; path=/; domain=.adfoc.us
CF-RAY: 85a4dc6829e06d0
but no content. Note that it returns status 200, so if you compare the returned string against boolean false (=== false) to see whether the request failed, it will look as though it worked.
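Because a 200 response with an empty body passes a plain === false check, a stricter check can look at both the status code and the body. The following is a minimal sketch under that assumption; isUsableResponse and fetchWithStatus are hypothetical names, and treating only 2xx responses with a non-blank body as usable is an illustrative choice.

```php
<?php
// Hypothetical helper: decides whether a fetch result is actually usable.
function isUsableResponse($body, int $statusCode): bool
{
    if ($body === false) {
        return false; // transport-level failure
    }
    if ($statusCode < 200 || $statusCode >= 300) {
        return false; // not a success status
    }
    return trim($body) !== ''; // a 200 with an empty body is still a failure here
}

// Hypothetical wrapper: fetches a URL with cURL and reports body, status and usability.
function fetchWithStatus(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return [
        'body'   => $body,
        'status' => $status,
        'usable' => isUsableResponse($body, $status),
    ];
}
```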
If you need to spoof the user agent (and possibly other things) to get the server to accept your request, you'll need to take the plunge with the curl libraries and try different combinations until it works. Experimenting with the curl command line first to see what works would also be a good way to reduce development time while investigating this.
Here's someone who has been through this before:
php curl: how can i emulate a get request exactly like a web browser?
Answered by 13DaGGeR
It looks like the second URL sometimes responds too slowly, and it may redirect. Try using curl with a larger timeout. Also, turn error reporting on:
error_reporting(-1);
ini_set('display_errors','On');
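The suggestion to use curl with a larger timeout and to allow for redirects could look roughly like this. It is a sketch, not a tested fix for this particular site: fetchWithTimeout is a hypothetical name, and the timeout and redirect limits are arbitrary example values.

```php
<?php
// Hypothetical helper: fetches a URL with explicit timeouts and redirect handling.
function fetchWithTimeout(string $url, int $timeoutSeconds = 30): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,               // example limit on redirect hops
        CURLOPT_CONNECTTIMEOUT => 10,              // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeoutSeconds, // seconds allowed for the whole transfer
    ]);
    $body = curl_exec($ch);
    // curl_error() describes the most recent failure on this handle.
    $error = ($body === false) ? curl_error($ch) : null;
    return ['body' => $body, 'error' => $error];
}
```

Checking the 'error' entry alongside the PHP error settings above should make it clearer whether a failure is a timeout, a redirect loop, or something else.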
Answered by Deepak dev
You can also try this code:
<?php
function getUrlContent($url) {
    $parts = parse_url($url);
    $host = $parts['host'];
    $ch = curl_init();
    // Browser-like request headers. The Host header is derived from the URL;
    // a hardcoded host or request line would be wrong for any other URL.
    $header = array(
        "Host: {$host}",
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.8',
        'Cache-Control: max-age=0',
        'Connection: keep-alive',
        'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36',
    );
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    $result = curl_exec($ch); // false on failure
    curl_close($ch);
    return $result;
}
$url = "https://news.google.com/rss/search?q=apple&hl=en-IN&gl=IN&ceid=IN:en";
$html = getUrlContent($url);
// Parse the RSS feed and convert it to a plain PHP array via JSON.
$xml = simplexml_load_string($html);
$json = json_encode($xml);
$array = json_decode($json, TRUE);
print_r($array);
?>
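Since the fetch can return false or an empty body, the XML-to-array step may be worth guarding. Below is a minimal sketch of that conversion with basic failure handling; xmlToArray is a hypothetical helper name and the sample feed is made up for illustration.

```php
<?php
// Hypothetical helper: converts an XML string to a nested array, or null on failure.
function xmlToArray($xmlString): ?array
{
    if (!is_string($xmlString) || trim($xmlString) === '') {
        return null; // nothing to parse (for example, a failed fetch)
    }
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xmlString);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($xml === false) {
        return null; // malformed XML
    }
    // Same JSON round trip as in the answer above.
    return json_decode(json_encode($xml), true);
}

// Made-up sample feed, used here instead of a live request.
$sample = '<rss><channel><title>Example feed</title><item><title>First</title></item></channel></rss>';
$feed = xmlToArray($sample);
```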