Extract all hyperlinks (from an external website) using node.js and request
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/15343292/
Asked by Michael Moeller
Right now our app writes the source code of nodejs.org to the console.
We'd like it to write all hyperlinks of nodejs.org instead.
Maybe we need just one line of code to get the links from body.
app.js:
var http = require('http');
var request = require('request');

// Minimal HTTP server that answers every request with plain text
http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
}).listen(1337, '127.0.0.1');
console.log('Server running at http://127.0.0.1:1337/');

// Fetch nodejs.org and print the raw HTML (or the error) to the console
request('http://nodejs.org/', function (error, response, body) {
    if (!error) {
        console.log(body);
    } else {
        console.log(error);
    }
});
Answered by user568109
You are probably looking for either jsdom, jquery, or cheerio. What you are doing is called screen scraping: extracting data from a site. jsdom/jquery offer a complete set of tools, but cheerio is much faster.
Here is a cheerio example:
var request = require('request');
var cheerio = require('cheerio');

var searchTerm = 'screen+scraping';
var url = 'http://www.bing.com/search?q=' + searchTerm;

request(url, function (err, resp, body) {
    if (err) {
        console.log(err);
        return;
    }
    var $ = cheerio.load(body);   // parse the HTML into a jQuery-like object
    var links = $('a');           // select all hyperlinks
    links.each(function (i, link) {
        console.log($(link).text() + ':\n ' + $(link).attr('href'));
    });
});
You choose whatever is best for you.
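One practical detail when listing a site's hyperlinks: href values are often relative ("/about/"), fragment-only ("#top"), or not web links at all ("mailto:..."). Below is a minimal sketch of how the collected values might be cleaned up, assuming a Node.js version where the WHATWG URL class is available as a global. The helper name resolveLinks and the sample array are hypothetical and not part of the original answer.

```javascript
// Hypothetical helper (not from the original answer): turn raw href values
// into absolute URLs, keeping only unique http(s) links.
function resolveLinks(hrefs, baseUrl) {
    var seen = new Set();
    var result = [];
    hrefs.forEach(function (href) {
        if (!href) return;                      // <a> tag without an href
        var absolute;
        try {
            absolute = new URL(href, baseUrl);  // resolves relative paths against baseUrl
        } catch (e) {
            return;                             // value could not be parsed as a URL
        }
        if (absolute.protocol !== 'http:' && absolute.protocol !== 'https:') return;
        if (!seen.has(absolute.href)) {
            seen.add(absolute.href);
            result.push(absolute.href);
        }
    });
    return result;
}

// Sample input, standing in for values collected with $(link).attr('href')
var sample = ['/about/', 'https://nodejs.org/en/download', '#top',
              'mailto:a@example.com', undefined, '/about/'];
console.log(resolveLinks(sample, 'http://nodejs.org/'));
```

In the cheerio callback, the href values could be pushed into an array inside each() and passed to this helper together with the URL that was requested.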

