Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same CC BY-SA terms, link to the original, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/18899828/
Best practices for using SQLite3 + Node.js
Asked by Chris Wilson
I've got a modest Node.js script that pulls down data from Wikipedia via the API and stores it in a SQLite database. I'm using the node-sqlite3 module.
In some cases, I'm pulling down data on upwards of 600,000 articles and storing some metadata about each one as a row in the db. The articles are retrieved from the API in groups of 500.
The request that retrieves the JSON object with the data on the 500 articles passes the object to this callback:
// (db already instantiated as 'new sqlite.Database("wikipedia.sqlite");')
function callback(articles) {
    articles.forEach(function(article) {
        db.run("INSERT OR IGNORE INTO articles (name, id, created) VALUES (?, ?, ?)",
               [article["title"], article["pageid"], article["timestamp"]]);
    });
}
The module operates in parallel by default, but the documentation for node-sqlite3 includes one example of serial operation, like so:
db.serialize(function() {
    db.run("CREATE TABLE lorem (info TEXT)");
    var stmt = db.prepare("INSERT INTO lorem VALUES (?)");
    for (var i = 0; i < 10; i++) {
        stmt.run("Ipsum " + i);
    }
    stmt.finalize();
});
I tried to imitate this and saw almost no performance difference. Am I doing it wrong? Right now, the data is retrieved from the API much faster than it is written to the DB, though it's not intolerably slow. But pummeling the DB with 600K individual INSERT commands feels clumsy.
UPDATE: Per accepted answer, this appears to work for node-sqlite3, in lieu of a native solution. (See this Issue).
db.run("BEGIN TRANSACTION");

function callback(articles) {
    articles.forEach(function(article) {
        db.run("INSERT OR IGNORE INTO articles (name, id, created) VALUES (?, ?, ?)",
               [article["title"], article["pageid"], article["timestamp"]]);
    });
}

// Once every batch has been inserted:
db.run("END");
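The principle behind this fix can be sketched with Python's standard-library sqlite3 module (used here only because it needs no extra install; the node-sqlite3 calls are analogous). The sample rows and timestamps are made up for illustration.

```python
import sqlite3

# Hypothetical sample of the per-article metadata (title, pageid, timestamp).
articles = [
    ("Lorem", 1, "2013-09-01T00:00:00Z"),
    ("Ipsum", 2, "2013-09-02T00:00:00Z"),
    ("Lorem", 1, "2013-09-01T00:00:00Z"),  # duplicate id: INSERT OR IGNORE skips it
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (name TEXT, id INTEGER PRIMARY KEY, created TEXT)")

# One explicit transaction around the whole batch: a single commit (one
# disk sync) instead of one per row.
with conn:  # BEGIN ... COMMIT, or ROLLBACK if the block raises
    conn.executemany(
        "INSERT OR IGNORE INTO articles (name, id, created) VALUES (?, ?, ?)",
        articles,
    )

print(conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0])  # 2
```

Note that the whole batch shares one commit; the duplicate row is silently dropped by INSERT OR IGNORE rather than aborting the transaction.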
Answered by Robert Harvey
When you are doing several insertions into a SQLite database, you need to wrap the collection of insertions into a transaction. Otherwise, SQLite will wait for the disk platters to spin completely around for each insert, while it does a read-after-write verify for each record that you insert.
At 7200 RPM, it takes about 1/120th of a second (roughly 8 ms) for the disk platter to come around again, which is an eternity in computer time.
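The cost of committing once per row can be seen in a small benchmark sketch, again in Python's standard-library sqlite3 (the table name, row count, and file location are made up). In autocommit mode every INSERT is its own transaction and its own sync to disk; wrapped in BEGIN/COMMIT, the whole batch pays for one sync.

```python
import os
import sqlite3
import tempfile
import time

rows = [("Ipsum " + str(i),) for i in range(500)]

def timed_insert(use_transaction):
    # A file-backed database, so commits actually reach stable storage.
    path = os.path.join(tempfile.mkdtemp(), "bench.sqlite")
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode
    conn.execute("CREATE TABLE lorem (info TEXT)")
    start = time.perf_counter()
    if use_transaction:
        conn.execute("BEGIN")
    for row in rows:
        conn.execute("INSERT INTO lorem VALUES (?)", row)
    if use_transaction:
        conn.execute("COMMIT")
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM lorem").fetchone()[0]
    conn.close()
    return count, elapsed

count_auto, t_auto = timed_insert(False)  # one commit per row
count_tx, t_tx = timed_insert(True)       # one commit for the whole batch
# Both runs insert the same 500 rows; the transactional run is usually far faster.
print(count_auto, count_tx)
```

On a spinning disk the gap is dramatic; on an SSD or tmpfs it shrinks, but the per-row commit overhead is still there.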

