JavaScript: website scraping using jQuery and Ajax
Note: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/1936495/
Website scraping using jquery and ajax
Asked by Joe
I want to be able to manipulate the HTML of a given URL, something like HTML scraping. I know this can be done using curl or some scraping library. But I would like to know whether it is possible to use jQuery to make a GET request to the URL using Ajax, retrieve the HTML of that URL, and run jQuery code on the HTML returned?
Thank you
Answered by Alex
I would like to point out that there are situations where it is perfectly acceptable to use jQuery to scrape screens across domains. Windows Sidebar gadgets run in a 'Local Machine Zone' that allows cross-domain scripting.
And jQuery does have the ability to apply selectors to retrieved HTML content. You just need to add the selector to the load() method's url parameter, after a space.
The example gadget code below checks this page every hour and reports the total number of page views.
<html>
<head>
  <script type="text/javascript" src="jquery.min.js"></script>
  <style>
    body {
      height: 120px;
      width: 130px;
      background-color: white;
    }
  </style>
</head>
<body>
  Question Viewed:
  <div id="data"></div>
  <script type="text/javascript">
    var url = "http://stackoverflow.com/questions/1936495/website-scraping-using-jquery-and-ajax";

    // Fetch the page and insert only the element matched by the selector after the space.
    function updateGadget() {
      $("#data").load(url + " .label-value:contains('times')");
    }

    // Run once when the DOM is ready, then refresh every hour (60 minutes * 60 seconds * 1000 ms).
    $(document).ready(updateGadget);
    var intervalID = setInterval(updateGadget, 60 * 60 * 1000);
  </script>
</body>
</html>
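To picture how the single string passed to load() carries both a URL and a selector, here is a minimal sketch in plain JavaScript. It only illustrates the "URL, space, selector" convention described above; splitLoadArgument is a hypothetical helper name and not part of jQuery, and the sketch makes no claim about how jQuery implements this internally.

```javascript
// Illustrative only: split a load()-style argument into URL and selector at the first space.
// splitLoadArgument is a hypothetical helper name, not a jQuery function.
function splitLoadArgument(urlAndSelector) {
  const spaceIndex = urlAndSelector.indexOf(" ");
  if (spaceIndex === -1) {
    // No space means no selector was given; the full response would be used.
    return { url: urlAndSelector, selector: null };
  }
  return {
    url: urlAndSelector.slice(0, spaceIndex),
    selector: urlAndSelector.slice(spaceIndex + 1).trim(),
  };
}

const parts = splitLoadArgument(
  "http://stackoverflow.com/questions/1936495/website-scraping-using-jquery-and-ajax .label-value:contains('times')"
);
console.log(parts.url);      // the part before the first space
console.log(parts.selector); // the part after the first space
```

Because the split happens at a space, the URL itself must not contain an unencoded space, while the selector part may contain further spaces.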
Answered by Pascal MARTIN
You cannot make an Ajax request to a domain name other than the one your website is on, because of the Same Origin Policy, which means you will not be able to do what you want, at least not directly.
A solution would be to:
- have some kind of "proxy" on your own server,
- send your Ajax request to that proxy,
- which, in turn, will fetch the page on the other domain name and return it to your JS code as the response to the Ajax request.
This can be done in a couple of lines in almost any language (PHP using curl, for instance). Or you might be able to use some functionality of your web server (see mod_proxy and mod_proxy_http, for instance, for Apache).
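The proxy steps above can be sketched as follows. The answer mentions PHP with curl or Apache's mod_proxy; this sketch swaps in Node.js (assumed to be version 18 or later, for the built-in global fetch) only to stay in JavaScript. The /proxy path, the url query parameter, the ALLOWED_HOSTS list, and the function names are all hypothetical choices for illustration, not something the original answer specifies.

```javascript
// Minimal same-origin proxy sketch for Node.js 18+ (built-in http module and global fetch).
const http = require("http");

// Hypothetical allow-list: only these hosts may be fetched through the proxy.
const ALLOWED_HOSTS = ["stackoverflow.com"];

// Return the parsed target URL if it is http(s) and on the allow-list, otherwise null.
function parseTarget(rawUrl, allowedHosts) {
  let target;
  try {
    target = new URL(rawUrl);
  } catch (err) {
    return null; // not a valid absolute URL
  }
  const okProtocol = target.protocol === "http:" || target.protocol === "https:";
  return okProtocol && allowedHosts.includes(target.hostname) ? target : null;
}

// Handle GET /proxy?url=<encoded target>: fetch the remote page and relay its HTML.
async function handleProxyRequest(req, res) {
  const requestUrl = new URL(req.url, "http://localhost");
  const target = parseTarget(requestUrl.searchParams.get("url") || "", ALLOWED_HOSTS);
  if (requestUrl.pathname !== "/proxy" || target === null) {
    res.writeHead(400, { "Content-Type": "text/plain" });
    res.end("Bad or disallowed target URL");
    return;
  }
  try {
    const upstream = await fetch(target);
    const html = await upstream.text();
    res.writeHead(upstream.status, { "Content-Type": "text/html; charset=utf-8" });
    res.end(html);
  } catch (err) {
    res.writeHead(502, { "Content-Type": "text/plain" });
    res.end("Upstream request failed");
  }
}

// The server is created but not started here; call proxyServer.listen(<port>) to run it.
const proxyServer = http.createServer(handleProxyRequest);
```

With such a proxy served from the same origin as the page, the browser side could request something like $.get("/proxy", { url: targetUrl }, callback), so the Ajax call never leaves the page's own domain. The allow-list is a design choice in this sketch: without it, the proxy would fetch any URL a visitor supplies.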
Answered by foxybagga
It's not that difficult.
$(document).ready(function() {
  var baseUrl = "http://www.somedomain.com/";
  $.ajax({
    url: baseUrl,
    type: "get",
    dataType: "html",
    success: function(data) {
      // do something with data (the returned HTML as a string)
    }
  });
});
I think this can give you a good clue: http://jsfiddle.net/skelly/m4QCt/
Answered by knoopx
http://www.nathanm.com/ajax-bypassing-xmlhttprequest-cross-domain-restriction/
The only problem is that, due to security restrictions in both Internet Explorer and Firefox, the XMLHttpRequest object is not allowed to make cross-domain, cross-protocol, or cross-port requests.
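The three restrictions named above (domain, protocol, port) can be illustrated with a small check. This is a simplified sketch of the comparison, not the browsers' actual implementation; isSameOrigin and the example.com URLs are hypothetical names chosen for illustration.

```javascript
// Sketch of the same-origin comparison: protocol, host, and port must all match.
// Explicit default ports (80 for http, 443 for https) are normalized by the URL parser.
function isSameOrigin(pageUrl, requestUrl) {
  const page = new URL(pageUrl);
  const request = new URL(requestUrl);
  return (
    page.protocol === request.protocol &&
    page.hostname === request.hostname &&
    page.port === request.port
  );
}

const pageUrl = "http://www.example.com/index.html"; // hypothetical page URL
console.log(isSameOrigin(pageUrl, "http://www.example.com/data.html"));      // same origin
console.log(isSameOrigin(pageUrl, "http://api.example.com/data.html"));      // different domain
console.log(isSameOrigin(pageUrl, "https://www.example.com/data.html"));     // different protocol
console.log(isSameOrigin(pageUrl, "http://www.example.com:8080/data.html")); // different port
```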
Answered by Annie
Instead of curl, you could use a tool like Selenium, which automates loading the page in a browser. You can run JavaScript with it.

