Extract all hyperlinks (from an external website) using node.js and request

Question

Asked by Michael Moeller

Right now our app writes the source code of nodejs.org to the console. We'd like it to print all the hyperlinks on nodejs.org instead. Maybe we need just one line of code to get the links from `body`.

app.js:


var http = require('http');
var request = require('request');

// A minimal HTTP server that answers every request with "Hello World".
http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
}).listen(1337, '127.0.0.1');
console.log('Server running at http://127.0.0.1:1337/');

// Fetch nodejs.org and print its raw HTML source to the console.
request("http://nodejs.org/", function (error, response, body) {
    if (error) {
        console.log(error);
        return;
    }
    console.log(body);
});

Answer 1

Answered by user568109

You are probably looking for jsdom, jQuery, or cheerio. What you are doing is called screen scraping: extracting data from a website. jsdom and jQuery offer a complete set of tools, but cheerio is much faster.

Here is a cheerio example:

var request = require('request');
var cheerio = require('cheerio');

var searchTerm = 'screen+scraping';
var url = 'http://www.bing.com/search?q=' + searchTerm;

request(url, function (err, resp, body) {
  if (err) {
    return console.log(err);
  }
  var $ = cheerio.load(body); // parse the HTML into a jQuery-like object
  var links = $('a');         // jQuery-style selector: all hyperlinks
  links.each(function (i, link) {
    // print the link text followed by its href attribute
    console.log($(link).text() + ':\n  ' + $(link).attr('href'));
  });
});
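The href values printed by the cheerio example are often relative (for example `/en/`) or use non-web schemes such as `mailto:`. A minimal sketch that turns them into absolute, de-duplicated http(s) links with Node's built-in WHATWG `URL` class; the function name `toAbsoluteLinks` and the filtering rules are assumptions for illustration:

```javascript
// Hypothetical helper: resolve raw href values against the page they came from,
// keep only http(s) links, and drop duplicates while preserving first-seen order.
function toAbsoluteLinks(hrefs, pageUrl) {
  const seen = new Set();
  for (const href of hrefs) {
    let resolved;
    try {
      resolved = new URL(href, pageUrl); // relative hrefs resolve against pageUrl
    } catch (e) {
      continue; // skip values that cannot be parsed as a URL
    }
    if (resolved.protocol === 'http:' || resolved.protocol === 'https:') {
      seen.add(resolved.href);
    }
  }
  return Array.from(seen);
}
```

In the callback above, you could collect each `$(link).attr('href')` into an array and pass it, together with `url`, to `toAbsoluteLinks`.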

Choose whichever is best for you.

