Get content from a URL using PHP
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/2176180/
Asked by Pankaj Khurana
I want to get the dynamic contents from a particular URL:
I have used this code:
echo $content=file_get_contents('http://www.punoftheday.com/cgi-bin/arandompun.pl');
I am getting the following results:
document.write('"Bakers have a great knead to make bread."')
document.write('© 1996-2007 Pun of the Day.com')
How can I get the string "Bakers have a great knead to make bread."? Only the string inside the first document.write will change; the rest of the code will remain constant.
Regards,
Pankaj
Answered by Pekka
You are fetching a JavaScript snippet that is meant to be embedded directly into a document, not queried by a script. The code inside is JavaScript.
You could pull out the code using a regular expression, but I would advise against it. First, it's probably not legal to do. Second, the format of the data they serve can change any time, breaking your script.
I think you should take a look at their RSS feed. You can parse that programmatically far more easily than the JavaScript.
Check out this question on how to do that: Best way to parse RSS/Atom feeds with PHP
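As a rough illustration of the RSS route, here is a minimal sketch using PHP's SimpleXML extension. The feed content is a made-up inline sample so the code runs without network access, and `rss_item_descriptions` is a hypothetical helper name; the real feed's URL and exact structure are not given in this thread and would need to be checked.

```php
<?php
// Inline sample feed (made-up content) so the sketch runs without network access.
$rss = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item><title>First pun</title><description>Text of the first pun.</description></item>
    <item><title>Second pun</title><description>Text of the second pun.</description></item>
  </channel>
</rss>
XML;

// Hypothetical helper: returns the <description> of every <item> in an RSS 2.0 document.
function rss_item_descriptions(string $xml): array
{
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        return []; // the input was not well-formed XML
    }
    $descriptions = [];
    foreach ($feed->channel->item as $item) {
        $descriptions[] = trim((string) $item->description);
    }
    return $descriptions;
}

print_r(rss_item_descriptions($rss));
```

With a live feed, the same helper could presumably be fed the result of `file_get_contents()` on the feed's address instead of the inline sample.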
Answered by T.Todua
1) Several built-in methods
<?php
readfile("http://example.com/"); // prints the response itself and returns the byte count; needs "allow_url_fopen" enabled
include("http://example.com/"); // needs "allow_url_include" enabled; runs any PHP code in the response, so avoid it for untrusted URLs
echo file_get_contents("http://example.com/"); // needs "allow_url_fopen" enabled
echo stream_get_contents(fopen('http://example.com/', "rb")); // you may use "r" instead of "rb"; needs "allow_url_fopen" enabled
?>
2) A better way is cURL:
echo get_remote_data('http://example.com'); // GET request
echo get_remote_data('http://example.com', "var2=something&var3=blabla"); // POST request
//============= https://github.com/tazotodua/useful-php-scripts/ ===========
function get_remote_data($url, $post_paramtrs = false)
{
    $c = curl_init();
    curl_setopt($c, CURLOPT_URL, $url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
    if ($post_paramtrs) {
        curl_setopt($c, CURLOPT_POST, true);
        curl_setopt($c, CURLOPT_POSTFIELDS, "var1=bla&" . $post_paramtrs);
    }
    // The next two options turn off TLS certificate checks; drop them when the remote certificate is valid
    curl_setopt($c, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($c, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($c, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; rv:33.0) Gecko/20100101 Firefox/33.0");
    curl_setopt($c, CURLOPT_COOKIE, 'CookieName1=Value;');
    curl_setopt($c, CURLOPT_MAXREDIRS, 10);
    // FOLLOWLOCATION is only enabled when neither open_basedir nor the old safe_mode is set
    $follow_allowed = (ini_get('open_basedir') || ini_get('safe_mode')) ? false : true;
    if ($follow_allowed) {
        curl_setopt($c, CURLOPT_FOLLOWLOCATION, 1);
    }
    curl_setopt($c, CURLOPT_CONNECTTIMEOUT, 9);
    curl_setopt($c, CURLOPT_REFERER, $url);
    curl_setopt($c, CURLOPT_TIMEOUT, 60);
    curl_setopt($c, CURLOPT_AUTOREFERER, true);
    curl_setopt($c, CURLOPT_ENCODING, 'gzip,deflate');
    $data = curl_exec($c);
    $status = curl_getinfo($c);
    curl_close($c);
    // Prefix relative src/href/action values with the remote base URL, keeping the attribute name and quotes
    preg_match('/(http(|s)):\/\/(.*?)\/(.*\/|)/si', $status['url'], $link);
    $data = preg_replace('/(src|href|action)=(\'|\")((?!(http|https|javascript:|\/\/|\/)).*?)(\'|\")/si', '$1=$2' . $link[0] . '$3$4$5', $data);
    $data = preg_replace('/(src|href|action)=(\'|\")((?!(http|https|javascript:|\/\/)).*?)(\'|\")/si', '$1=$2' . $link[1] . '://' . $link[3] . '$3$4$5', $data);
    if ($status['http_code'] == 200) {
        return $data;
    } elseif ($status['http_code'] == 301 || $status['http_code'] == 302) {
        // Follow the redirect by hand when FOLLOWLOCATION could not be enabled
        if (!$follow_allowed) {
            if (empty($redirURL)) {
                if (!empty($status['redirect_url'])) {
                    $redirURL = $status['redirect_url'];
                }
            }
            if (empty($redirURL)) {
                preg_match('/(Location:|URI:)(.*?)(\r|\n)/si', $data, $m);
                if (!empty($m[2])) {
                    $redirURL = $m[2];
                }
            }
            if (empty($redirURL)) {
                preg_match('/href\=\"(.*?)\"(.*?)here\<\/a\>/si', $data, $m);
                if (!empty($m[1])) {
                    $redirURL = $m[1];
                }
            }
            if (!empty($redirURL)) {
                $t = debug_backtrace();
                return call_user_func($t[0]["function"], trim($redirURL), $post_paramtrs);
            }
        }
    }
    return "ERRORCODE22 with $url!!<br/>Last status codes<b/>:" . json_encode($status) . "<br/><br/>Last data got<br/>:$data";
}
NOTICE: It automatically handles the FOLLOWLOCATION problem, and relative URLs in the fetched page are automatically rewritten as absolute ones ( src="./imageblabla.png" --------> src="http://example.com/path/imageblabla.png" ).
P.S. On GNU/Linux servers, you might need to install the php5-curl package (named php-curl on newer distributions) to use it.
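For the built-in route in 1), a failed request is easy to miss because `file_get_contents()` returns `false` rather than throwing. The following is a minimal sketch of a wrapper with a timeout and a failure check; `fetch_contents`, its default timeout, and the User-Agent string are assumptions made for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper: fetches $url and returns the body, or null on failure.
function fetch_contents(string $url, int $timeoutSeconds = 10): ?string
{
    // The 'http' options are used for http:// and https:// URLs; other wrappers ignore them.
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'timeout' => $timeoutSeconds,
            'header'  => "User-Agent: example-fetcher/1.0\r\n",
        ],
    ]);
    // file_get_contents() returns false on failure and raises a warning; @ silences the warning.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}
```

A call such as `fetch_contents('http://example.com/')` still needs "allow_url_fopen" enabled for remote URLs, and the caller should check for `null` before using the result.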
Answered by Luca Matteis
Pekka's answer is probably the best way of doing this. But anyway here's the regex you might want to use in case you find yourself doing something like this, and can't rely on RSS feeds etc.
document\.write\(' // start tag
([^)]*) // the data to match
'\) // end tag
EDIT: for example:
<?php
$subject = "document.write('\"Paying for college is often a matter of in-tuition.\"<br />')\ndocument.write('<i>© 1996-2007 <a target=\"_blank\" href=\"http://www.punoftheday.com\">Pun of the Day.com</a></i><br />')";
$pattern = "/document\.write\('([^)]*)'\)/";
preg_match($pattern, $subject, $matches);
print_r($matches);
?>
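To go from the raw match to the plain sentence the question asks for, the captured group still needs its markup and surrounding quotes removed. Here is a minimal sketch built on the same pattern; `extract_first_document_write` is a hypothetical helper name, and the sample input assumes the remote script emits `&quot;` entities and `<br />` tags, which may not match the live response exactly.

```php
<?php
// Hypothetical helper: returns the text of the first document.write('...') call, or null if none is found.
function extract_first_document_write(string $js): ?string
{
    // Same pattern as in the answer above.
    if (!preg_match("/document\.write\('([^)]*)'\)/", $js, $matches)) {
        return null;
    }
    // Drop markup such as <br />, decode entities such as &quot;, then trim quotes and whitespace.
    $text = html_entity_decode(strip_tags($matches[1]), ENT_QUOTES, 'UTF-8');
    return trim($text, "\" \t\r\n");
}

$sample = "document.write('&quot;Bakers have a great knead to make bread.&quot;<br />')\n"
        . "document.write('<i>&copy; 1996-2007 Pun of the Day.com</i><br />')";
echo extract_first_document_write($sample), "\n";
```

Because the pattern stops at the first `)`, a pun that itself contains a closing parenthesis would not match and the helper would return `null`; as noted in the first answer, any change in the remote format will break this approach.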

