Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/819182/

Date: 2020-08-24 23:57:38  Source: igfitidea

How do I get the HTML code of a web page in PHP?

php html

Asked by Prashant

I want to retrieve the HTML code of a link (web page) in PHP. For example, if the link is

https://stackoverflow.com/questions/ask

then I want the HTML code of the page which is served. I want to retrieve this HTML code and store it in a PHP variable.

How can I do this?

Answered by Greg

If your PHP server allows URL fopen wrappers (the allow_url_fopen setting), then the simplest way is:

$html = file_get_contents('https://stackoverflow.com/questions/ask');
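The one-liner above gives no signal when the request fails. A minimal sketch of a safer variant follows; the helper name fetch_html and the 10-second timeout are assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): read a URL or a
// local file into a string, or return null on failure.
function fetch_html(string $url, int $timeoutSeconds = 10): ?string
{
    // The 'http' context options apply to http:// and https:// URLs only.
    $context = stream_context_create([
        'http' => ['timeout' => $timeoutSeconds],
    ]);

    // file_get_contents() returns false (and emits a warning) on failure;
    // @ silences the warning so the caller can handle null instead.
    $html = @file_get_contents($url, false, $context);

    return $html === false ? null : $html;
}

$html = fetch_html('https://stackoverflow.com/questions/ask');
if ($html === null) {
    echo "Could not fetch the page\n";
} else {
    echo strlen($html), " bytes fetched\n";
}
```

Returning null instead of false keeps the check explicit: an empty page ('') is still a successful fetch, so compare with === null rather than testing truthiness.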

If you need more control, then you should look at the cURL functions:

$c = curl_init('https://stackoverflow.com/questions/ask');
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
//curl_setopt(... other options you want...)

$html = curl_exec($c);

if (curl_error($c))
    die(curl_error($c));

// Get the status code
$status = curl_getinfo($c, CURLINFO_HTTP_CODE);

curl_close($c);
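The snippet above reads the status code but leaves it unused. As a rough sketch of one way to act on it, the wrapper below treats both transport errors and HTTP statuses of 400 or higher as failures. The function name curl_fetch, the returned array shape, and the timeout value are all assumptions made for illustration.

```php
<?php
// Hypothetical wrapper around the cURL calls above; the name and the
// returned array shape are illustrative, not part of the original answer.
function curl_fetch(string $url): array
{
    $c = curl_init($url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($c, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($c, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds

    $body   = curl_exec($c);
    $error  = curl_error($c);
    $status = curl_getinfo($c, CURLINFO_HTTP_CODE);
    curl_close($c);

    // Transport failure (DNS, refused connection, timeout, ...).
    if ($body === false) {
        return ['html' => null, 'status' => 0, 'error' => $error];
    }
    // The request completed, but the server reported an error.
    if ($status >= 400) {
        return ['html' => null, 'status' => $status, 'error' => "HTTP $status"];
    }
    return ['html' => $body, 'status' => $status, 'error' => null];
}

$result = curl_fetch('https://stackoverflow.com/questions/ask');
echo $result['html'] === null
    ? "Failed: {$result['error']}\n"
    : "Status {$result['status']}\n";
```

Unlike die(), returning the error lets the calling code decide whether to retry, log, or show a message.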

Answered by Dmitri Pisarev

Also, if you want to manipulate the retrieved page somehow, you might want to try a PHP DOM parser. I find PHP Simple HTML DOM Parser very easy to use.

Answered by Ickmund

You may want to check out the YQL libraries from Yahoo: http://developer.yahoo.com/yql

The task at hand is as simple as:

select * from html where url = 'http://stackoverflow.com/questions/ask'

You can try this out in the console at: http://developer.yahoo.com/yql/console (requires login)

Also see Chris Heilmann's screencast for some nice ideas about what else you can do: http://developer.yahoo.net/blogs/theater/archives/2009/04/screencast_collating_distributed_information.html

Answered by Stefan Gehrig

Simple way: Use file_get_contents():

$page = file_get_contents('http://stackoverflow.com/questions/ask');

Please note that allow_url_fopen must be true in your php.ini to be able to use URL-aware fopen wrappers.

More advanced way: If allow_url_fopen is disabled and you cannot change your PHP configuration (PHP ships with it enabled, but many hosts turn it off), and ext/curl is installed, use the cURL library to connect to the desired page.
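The choice between the two approaches can also be made at runtime. The sketch below is one possible way to do that, assuming a hypothetical helper named fetch_page; it checks allow_url_fopen with ini_get() and falls back to cURL when the extension is loaded.

```php
<?php
// Hypothetical helper: use whichever transport this PHP install offers.
function fetch_page(string $url): ?string
{
    // ini_get() returns the setting as a string such as "1", "On" or "".
    $fopenAllowed = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);

    if ($fopenAllowed) {
        $html = @file_get_contents($url);
        return $html === false ? null : $html;
    }

    // ext/curl is optional, so confirm it is loaded before calling it.
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        if ($ch === false) {
            return null;
        }
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        $html = curl_exec($ch);
        curl_close($ch);
        return $html === false ? null : $html;
    }

    // Neither transport is available.
    return null;
}

$page = fetch_page('http://stackoverflow.com/questions/ask');
echo $page === null ? "Fetch failed\n" : "Fetched " . strlen($page) . " bytes\n";
```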

Answered by piglot

You could use file_get_contents if you want to store the source in a variable; however, cURL is better practice.

// The variable holds the page's HTML, not its URL
$html = file_get_contents('http://example.com');
echo $html;

This solution will display the webpage on your site. However, cURL is a better option.

Answered by sarath

include_once('simple_html_dom.php');
$url="http://stackoverflow.com/questions/ask";
$html = file_get_html($url);

You can get the whole HTML code in parsed form using this code (file_get_html() returns a DOM object rather than a plain string). Download the 'simple_html_dom.php' file here: http://sourceforge.net/projects/simplehtmldom/files/simple_html_dom.php/download

Answered by T.Todua

Here are two different, simple ways to get content from a URL:

1) The first method

Enable allow_url_fopen on your hosting (in php.ini or wherever your host exposes it). Note that allow_url_include only affects include/require and is not needed to read a remote page.

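<?php
// readfile() prints the page directly and returns only the byte count,
// so use file_get_contents() to capture the HTML in a variable.
$variableee = file_get_contents("http://example.com/");
echo $variableee;
?>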

or

2) The second method

Enable php_curl and php_openssl (php_imap, which the original answer also lists, is not needed for cURL requests)

<?php
// You can add other cURL options too;
// see http://php.net/manual/en/function.curl-setopt.php
function get_dataa($url) {
  $ch = curl_init();
  $timeout = 5; // seconds to wait while connecting
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)");
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the page instead of printing it
  // Warning: the next two lines turn off TLS certificate checks, which
  // exposes HTTPS requests to tampering; remove them outside of testing.
  curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
  curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects, up to MAXREDIRS
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
  $data = curl_exec($ch); // false on failure
  curl_close($ch);
  return $data;
}

$variableee = get_dataa('http://example.com');
echo $variableee;
?>

Answered by Krishnamoorthy Acharya

You can also use DOMDocument to get the content of an individual HTML tag:

$homepage = file_get_contents('https://www.example.com/');
$doc = new DOMDocument;
$doc->loadHTML($homepage);
$titles = $doc->getElementsByTagName('h3');
echo $titles->item(0)->nodeValue;
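Real pages are often not valid HTML, so loadHTML() tends to emit warnings, and item(0) is null when the tag is absent. The following is a minimal sketch of a more defensive version; the helper name extract_tag_text and the inline sample markup are assumptions for illustration.

```php
<?php
// Hypothetical helper: return the trimmed text of every element
// with the given tag name.
function extract_tag_text(string $html, string $tagName): array
{
    // loadHTML() rejects an empty string, so handle that case up front.
    if ($html === '') {
        return [];
    }

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName($tagName) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Inline sample markup so the sketch runs without network access.
$sample = '<html><body><h3>First</h3><p>text</p><h3> Second </h3></body></html>';
print_r(extract_tag_text($sample, 'h3'));
```

Iterating over the node list avoids the null dereference: a page without the tag simply yields an empty array.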

Answered by Ken

$output = file("http://www.example.com"); didn't work until I enabled allow_url_fopen, allow_url_include, and file_uploads in php.ini for PHP 7.