php - Get content from a URL using PHP

Question

Asked by Pankaj Khurana

I want to get the dynamic content from a particular URL:

I have used this code:

echo $content=file_get_contents('http://www.punoftheday.com/cgi-bin/arandompun.pl');

I am getting the following result:

document.write('"Bakers have a great knead to make bread."')
document.write('© 1996-2007 Pun of the Day.com')

How can I get just the string "Bakers have a great knead to make bread."? Only the string inside the first document.write changes; the rest of the code stays the same.

Regards,

Pankaj

Answer 1

Answered by Pekka

You are fetching a JavaScript snippet that is meant to be embedded directly into a web page, not fetched by a server-side script. The code inside is JavaScript.

You could pull out the string using a regular expression, but I would advise against it. First, it's probably not legal to do. Second, the format of the data they serve can change at any time, breaking your script.

I think you should take a look at their RSS feed instead. You can parse that programmatically far more easily than the JavaScript.

Check out this question on how to do that: Best way to parse RSS/Atom feeds with PHP
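
A minimal sketch of that approach with PHP's built-in SimpleXML extension. The feed URL and layout of punoftheday.com aren't given here, so this assumes a standard RSS 2.0 document where each pun is an <item> with a <title>; rss_item_titles is a hypothetical helper name.

```php
<?php
// Hypothetical helper: return the <title> of every <item> in an RSS 2.0 document.
// Assumes the standard <rss><channel><item><title> layout.
function rss_item_titles(string $xml): array
{
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        return [];                       // not well-formed XML
    }
    $titles = [];
    foreach ($feed->channel->item as $item) {
        $titles[] = trim((string) $item->title);
    }
    return $titles;
}

// In real use, $xml would be downloaded from the feed URL first.
$xml = <<<XML
<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example feed</title>
  <item><title>Bakers have a great knead to make bread.</title></item>
</channel></rss>
XML;
print_r(rss_item_titles($xml));
```

Working on the parsed XML tree means a change in the surrounding markup does not break the extraction, which is the advantage over a regex on the JavaScript.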

Answer 2

Answered by T.Todua

1) Several built-in methods:

<?php
// All of these need allow_url_fopen enabled in php.ini.
readfile("http://example.com/");                 // prints the body itself (it returns the byte count, so don't echo it)
include("http://example.com/");                  // also needs allow_url_include; runs the response as PHP code, so avoid it for untrusted sites
echo file_get_contents("http://example.com/");
echo stream_get_contents(fopen("http://example.com/", "rb")); // you may use "r" instead of "rb"
?>
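
If you stay with file_get_contents(), a stream context lets you set a timeout and a User-Agent, and checking for false avoids echoing a failed request. A sketch, assuming allow_url_fopen is on; fetch_url and the User-Agent string are hypothetical.

```php
<?php
// Hypothetical helper: fetch $url with file_get_contents(), returning null on failure.
// http:// and https:// URLs need allow_url_fopen=On; local paths work regardless.
function fetch_url(string $url, float $timeout = 10.0): ?string
{
    $context = stream_context_create([
        'http' => [                               // also used for https:// URLs
            'timeout'    => $timeout,             // read timeout in seconds
            'user_agent' => 'ExampleFetcher/1.0', // placeholder User-Agent
        ],
    ]);
    // @ suppresses the warning PHP emits on failure; the false return value is checked instead.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// Usage (needs network access):
// echo fetch_url('http://example.com/') ?? 'request failed';
```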

2) A better way is cURL:

echo get_remote_data('http://example.com');                                // GET request 
echo get_remote_data('http://example.com', "var2=something&var3=blabla" ); // POST request


//============= https://github.com/tazotodua/useful-php-scripts/ ===========
function get_remote_data($url, $post_paramtrs=false)    {   $c = curl_init();curl_setopt($c, CURLOPT_URL, $url);curl_setopt($c, CURLOPT_RETURNTRANSFER, 1); if($post_paramtrs){curl_setopt($c, CURLOPT_POST,TRUE);  curl_setopt($c, CURLOPT_POSTFIELDS, "var1=bla&".$post_paramtrs );}  curl_setopt($c, CURLOPT_SSL_VERIFYHOST,false);curl_setopt($c, CURLOPT_SSL_VERIFYPEER,false);curl_setopt($c, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; rv:33.0) Gecko/20100101 Firefox/33.0"); curl_setopt($c, CURLOPT_COOKIE, 'CookieName1=Value;'); curl_setopt($c, CURLOPT_MAXREDIRS, 10);  $follow_allowed= ( ini_get('open_basedir') || ini_get('safe_mode')) ? false:true;  if ($follow_allowed){curl_setopt($c, CURLOPT_FOLLOWLOCATION, 1);}curl_setopt($c, CURLOPT_CONNECTTIMEOUT, 9);curl_setopt($c, CURLOPT_REFERER, $url);curl_setopt($c, CURLOPT_TIMEOUT, 60);curl_setopt($c, CURLOPT_AUTOREFERER, true);         curl_setopt($c, CURLOPT_ENCODING, 'gzip,deflate');$data=curl_exec($c);$status=curl_getinfo($c);curl_close($c);preg_match('/(http(|s)):\/\/(.*?)\/(.*\/|)/si',  $status['url'],$link);$data=preg_replace('/(src|href|action)=(\'|\")((?!(http|https|javascript:|\/\/|\/)).*?)(\'|\")/si','$1=$2'.$link[0].'$3$4$5', $data);$data=preg_replace('/(src|href|action)=(\'|\")((?!(http|https|javascript:|\/\/)).*?)(\'|\")/si','$1=$2'.$link[1].'://'.$link[3].'$3$4$5', $data);if($status['http_code']==200) {return $data;} elseif($status['http_code']==301 || $status['http_code']==302) { if (!$follow_allowed){if(empty($redirURL)){if(!empty($status['redirect_url'])){$redirURL=$status['redirect_url'];}}   if(empty($redirURL)){preg_match('/(Location:|URI:)(.*?)(\r|\n)/si', $data, $m);if (!empty($m[2])){ $redirURL=$m[2]; } } if(empty($redirURL)){preg_match('/href\=\"(.*?)\"(.*?)here\<\/a\>/si',$data,$m); if (!empty($m[1])){ $redirURL=$m[1]; } }   if(!empty($redirURL)){$t=debug_backtrace(); return call_user_func( $t[0]["function"], trim($redirURL), $post_paramtrs);}}} return "ERRORCODE22 with $url!!<br/>Last status codes<b/>:".json_encode($status)."<br/><br/>Last data got<br/>:$data";}

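Note: it automatically works around the FOLLOWLOCATION problem (when open_basedir or safe_mode is set, it follows 301/302 redirects itself), and relative URLs in the fetched page are rewritten to absolute ones (src="./imageblabla.png" → src="http://example.com/path/imageblabla.png").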
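
The relative-to-absolute rewriting described in the note can also be written more readably with parse_url() and preg_replace_callback(). This is a simplified sketch, not the author's code: absolutize_links is a hypothetical name, and it does not resolve ../ segments or honor a <base> tag.

```php
<?php
// Hypothetical helper: rewrite relative src/href/action attributes as absolute URLs.
// Simplified: handles "/x", "./x" and "x"; leaves "scheme:...", "//host..." and "#..." alone.
function absolutize_links(string $html, string $pageUrl): string
{
    $p = parse_url($pageUrl);
    $origin = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    $path = $p['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);   // directory part, e.g. "/path/"

    return preg_replace_callback(
        '/\b(src|href|action)=(["\'])(.*?)\2/i',         // attribute, opening quote, value, same quote
        function (array $m) use ($origin, $dir): string {
            $url = $m[3];
            if ($url === '' || preg_match('~^([a-z][a-z0-9+.-]*:|//|#)~i', $url)) {
                return $m[0];                            // already absolute, or not a path
            }
            $abs = ($url[0] === '/')
                ? $origin . $url                         // root-relative
                : $origin . $dir . preg_replace('~^\./~', '', $url); // document-relative
            return $m[1] . '=' . $m[2] . $abs . $m[2];
        },
        $html
    );
}

echo absolutize_links('<img src="./imageblabla.png">', 'http://example.com/path/page.html'), "\n";
// <img src="http://example.com/path/imageblabla.png">
```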

P.S. On GNU/Linux servers, you might need to install the php5-curl package (php-curl on newer distributions) to use it.
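
For comparison, a much shorter cURL sketch. Unlike get_remote_data(), it leaves SSL certificate verification at its secure defaults and lets CURLOPT_FOLLOWLOCATION handle redirects, which assumes a current PHP version where open_basedir does not block that option. fetch_with_curl and the User-Agent string are hypothetical.

```php
<?php
// Hypothetical helper: GET $url with cURL and return the body, or null on failure.
function fetch_with_curl(string $url, int $timeout = 30): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,     // follow 301/302 redirects...
        CURLOPT_MAXREDIRS      => 10,       // ...but at most 10 of them
        CURLOPT_CONNECTTIMEOUT => 10,       // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeout, // seconds allowed for the whole transfer
        CURLOPT_ENCODING       => '',       // '' = accept every encoding cURL supports (gzip, deflate, ...)
        CURLOPT_USERAGENT      => 'ExampleFetcher/1.0', // placeholder User-Agent
        // SSL verification options are left at their secure defaults.
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    // curl_exec() returns false on transport errors; non-2xx HTTP statuses count as failures too.
    if ($body === false || $status < 200 || $status >= 300) {
        return null;
    }
    return $body;
}

// Usage (needs network access):
// echo fetch_with_curl('http://example.com/') ?? 'request failed';
```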

Answer 3

Answered by Luca Matteis

Pekka's answer is probably the best way of doing this. But in case you find yourself doing something like this and can't rely on RSS feeds etc., here's the regex you might want to use:

document\.write\('      // start tag
([^)]*)                 // the data to match
'\)                     // end tag

EDIT: for example:

<?php
$subject = "document.write('&quot;Paying for college is often a matter of in-tuition.&quot;<br />')\ndocument.write('<i>&copy; 1996-2007 <a target=\"_blank\" href=\"http://www.punoftheday.com\">Pun of the Day.com</a></i><br />')";
$pattern = "/document\.write\('([^)]*)'\)/";
preg_match($pattern, $subject, $matches);
print_r($matches);
?>
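
Building on that regex, a sketch of the last step the question asks for: removing the HTML tags and entities so only the bare sentence remains. extract_pun is a hypothetical name, and like the pattern above it assumes the pun contains no ) character.

```php
<?php
// Hypothetical helper: return the plain text of the first document.write('...') call, or null.
function extract_pun(string $js): ?string
{
    // Same pattern as above: capture the argument of the first document.write('...').
    if (!preg_match("/document\.write\('([^)]*)'\)/", $js, $m)) {
        return null;
    }
    $text = strip_tags($m[1]);                               // drop <br />, <i>, <a>, ...
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');  // &quot; becomes "
    return trim($text, " \t\n\r\0\x0B\"");                   // strip whitespace and the wrapping quotes
}

$subject = "document.write('&quot;Paying for college is often a matter of in-tuition.&quot;<br />')\n"
         . "document.write('<i>&copy; 1996-2007 <a target=\"_blank\" href=\"http://www.punoftheday.com\">Pun of the Day.com</a></i><br />')";
echo extract_pun($subject), "\n"; // Paying for college is often a matter of in-tuition.
```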
