JavaScript: getting HTML content from another site

Question

Asked by Souza

I would like to dynamically retrieve the HTML contents of another website. I have the company's permission.

Please don't point me to JSONP, because I can't edit Site A, only Site B.

Answer 1

Answered by Chris Baker

Because of cross-domain security issues, you won't be able to do this client-side, unless you're content with an iframe.
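
If an iframe is good enough, a minimal sketch could look like the one below. The helper name createRemoteFrame, the container id and the page URL are hypothetical, and the document object is passed in as a parameter only to keep the function easy to test. Keep in mind that the parent page can display the frame but cannot read its contents from script, and that the remote site can refuse to be framed (for example with an X-Frame-Options header).

```javascript
// Hypothetical helper: builds an <iframe> that displays a page from another site.
// `doc` is the page's `document`; `url` is the remote page to show.
function createRemoteFrame(doc, url) {
  const frame = doc.createElement("iframe");
  frame.src = url;
  // A title helps assistive technology describe the frame
  frame.title = "Content from " + new URL(url).hostname;
  frame.width = "100%";
  frame.height = "400";
  return frame;
}

// Browser usage (assumes a container element with id "remote-content"):
// document.getElementById("remote-content")
//   .appendChild(createRemoteFrame(document, "http://thirdpartydomain.internet/page.php"));
```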

With PHP, you can use several methods of "scraping" the content. The approach you use depends on whether you need to use cookies in your requests (i.e. the data is behind a login).

Either way, to start things off on the client side you'll issue a standard AJAX request to your own server:

$.ajax({
  type: "POST",
  url: "localProxy.php",
  data: {url: "maybe_send_your_url_here.php?product_id=1"}
}).done(function( html ) {
   // do something with your HTML!
});
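
If jQuery isn't available, the same request can be sketched with the browser's built-in fetch. This is only an illustration of the jQuery call above: the helper names buildProxyRequest and fetchRemoteHtml are hypothetical, and it assumes localProxy.php reads the target from a form-encoded POST field named url.

```javascript
// Hypothetical helper: describes the POST request to the same-origin proxy script.
function buildProxyRequest(targetUrl, proxyUrl = "localProxy.php") {
  return {
    url: proxyUrl,
    options: {
      method: "POST",
      // URLSearchParams form-encodes the value, so "?" and "=" inside targetUrl
      // arrive at the proxy as part of the single "url" field
      body: new URLSearchParams({ url: targetUrl }),
    },
  };
}

// Sends the request and resolves with the HTML string returned by the proxy.
async function fetchRemoteHtml(targetUrl) {
  const { url, options } = buildProxyRequest(targetUrl);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error("Proxy request failed with status " + response.status);
  }
  return response.text();
}

// Browser usage:
// fetchRemoteHtml("maybe_send_your_url_here.php?product_id=1")
//   .then((html) => { /* do something with your HTML! */ });
```

Splitting the request description from the network call keeps the encoding logic easy to check without a running server.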

If you need cookies set (if the remote site requires login, you need 'em), you're going to use cURL. The full mechanics of logging in with POST data and accepting cookies are a little beyond the scope of this answer, but your requests would look something like this:

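// $username and $password are assumed to hold the credentials for the remote site
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://thirdpartydomain.internet/login_url.php');
// disables certificate checks; only acceptable while testing
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
// return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// file in which the cookies received from the remote site are saved
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.jar');
curl_setopt($ch, CURLOPT_POST, 1);
// http_build_query() URL-encodes the credentials
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array('email' => $username, 'password' => $password)));
$result = curl_exec($ch);
curl_close($ch);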

At that point, you can check the $result variable and make sure the login worked. If so, you'd then use cURL to issue another request to grab the page content. The second request won't have any of the POST data; you'd use the URL that you're trying to fetch and point CURLOPT_COOKIEFILE at the same cookie.jar so the session cookies are sent along. You'd end up with a large string full of HTML.

If you only need a portion of that page's content, you can load the string into a DomDocument with the method shown below; just use the loadHTML method instead of loadHTMLFile.

Speaking of DomDocument, if you don't need cookies, then you can use DomDocument directly to fetch the page, skipping cURL:

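$doc = new DOMDocument('1.0', 'UTF-8');
// real-world HTML is rarely valid; collect parser warnings instead of printing them
libxml_use_internal_errors(true);
// fetch the remote page and parse it into the DOM (use loadHTML() if you already have the HTML as a string)
$doc->loadHTMLFile('http://third_party_url_here.php?query=string');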

// since we are working with HTML fragments here, remove <!DOCTYPE 
$doc->removeChild($doc->firstChild);            

// remove <html></html> and any junk
$body = $doc->getElementsByTagName('body'); 
$doc->replaceChild($body->item(0), $doc->firstChild);

// now, you can get any portion of the html (target a div, for example) using familiar DOM methods

// echo the HTML (or desired portion thereof)
die($doc->saveHTML());

Documentation

HTML iframe on MDN - https://developer.mozilla.org/en/HTML/Element/iframe
jQuery.ajax() - http://api.jquery.com/jQuery.ajax/
PHP's cURL - http://php.net/manual/en/book.curl.php
curl_setopt (information about using cookies) - http://www.php.net/manual/en/function.curl-setopt.php
PHP's DomDocument - http://php.net/manual/en/class.domdocument.php
DomDocument::loadHTMLFile - http://www.php.net/manual/en/domdocument.loadhtmlfile.php
DomDocument::loadHTML - http://www.php.net/manual/en/domdocument.loadhtml.php
