node-postgres with massive amount of queries

Note: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/29100807/

Asked by Daniel Sutantyo
I just started playing around with node.js and postgres, using node-postgres. One of the things I tried to do is to write a short js script to populate my database, using a file with about 200,000 entries.
I noticed that after some time (less than 10 seconds), I start to get "Error: Connection terminated". I am not sure whether this is a problem with how I use node-postgres, or if it's because I was spamming postgres.
Anyway, here is a simple piece of code that shows this behaviour:
var pg = require('pg');

var connectionString = "postgres://xxxx:xxxx@localhost/xxxx";

pg.connect(connectionString, function(err, client, done) {
    if (err) {
        return console.error('could not connect to postgres', err);
    }
    client.query("DROP TABLE IF EXISTS testDB");
    client.query("CREATE TABLE IF NOT EXISTS testDB (id int, first int, second int)");
    done();
    for (i = 0; i < 1000000; i++) {
        client.query("INSERT INTO testDB VALUES (" + i.toString() + "," + (1000000 - i).toString() + "," + (-i).toString() + ")", function(err, result) {
            if (err) {
                return console.error('Error inserting query', err);
            }
            done();
        });
    }
});
It fails after about 18,000-20,000 queries. Is this the wrong way to use client.query? I tried changing the default client number, but it didn't seem to help.
client.connect() doesn't seem to help either, but that was because I had too many clients, so I definitely think client pooling is the way to go.
Thanks for any help!
Answered by vitaly-t
UPDATE
This answer has since been superseded by the article Data Imports, which represents the most up-to-date approach.
In order to replicate your scenario I used the pg-promise library, and I can confirm that trying it head-on will never work. No matter which library you use, it is the approach that matters.
Below is a modified approach where we partition the inserts into chunks and then execute each chunk within a transaction, which amounts to load balancing (a.k.a. throttling):
// Setup (assumed; the answer's snippet starts from insertRecords):
var promise = require('bluebird'); // any Promises/A+ library
var pgp = require('pg-promise')({promiseLib: promise});
var db = pgp('postgres://xxxx:xxxx@localhost/xxxx');

function insertRecords(N) {
    return db.tx(function (ctx) {
        var queries = [];
        for (var i = 1; i <= N; i++) {
            queries.push(ctx.none('insert into test(name) values($1)', 'name-' + i));
        }
        return promise.all(queries);
    });
}

function insertAll(idx) {
    if (!idx) {
        idx = 0;
    }
    return insertRecords(100000)
        .then(function () {
            if (idx >= 9) {
                return promise.resolve('SUCCESS');
            } else {
                return insertAll(++idx);
            }
        }, function (reason) {
            return promise.reject(reason);
        });
}

insertAll()
    .then(function (data) {
        console.log(data);
    }, function (reason) {
        console.log(reason);
    })
    .done(function () {
        pgp.end();
    });
This produced 1,000,000 records in about 4 minutes, dramatically slowing down after the first 3 transactions. I was using Node JS 0.10.38 (64-bit), which consumed about 340MB of memory. This way we inserted 100,000 records, 10 times in a row.
If we do the same, only this time inserting 10,000 records within 100 transactions, the same 1,000,000 records are added in just 1m25s, with no slowing down, and with Node JS consuming around 100MB of memory. This tells us that partitioning data like this is a very good idea.
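In terms of the code above, that second run corresponds to shrinking the chunk and lengthening the chain. The exact variant isn't shown in the original answer, but a sketch of the modified insertAll might look like this:

function insertAll(idx) {
    if (!idx) {
        idx = 0;
    }
    // 100 chained transactions of 10,000 inserts each = 1,000,000 rows
    return insertRecords(10000)
        .then(function () {
            if (idx >= 99) {
                return promise.resolve('SUCCESS');
            }
            return insertAll(++idx);
        });
}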
It doesn't matter which library you use, the approach should be the same (a node-postgres sketch follows the list below):
- Partition/throttle your inserts into multiple transactions;
- Keep the list of inserts in a single transaction at around 10,000 records;
- Execute all your transactions in a synchronous chain;
- Release the connection back to the pool after each transaction's COMMIT.
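To make those rules concrete with node-postgres itself (rather than pg-promise), here is a minimal sketch, assuming a reasonably current pg version with Pool and async/await support; it is an illustration of the pattern, not code from the original answers:

const { Pool } = require('pg');

const pool = new Pool({ connectionString: 'postgres://xxxx:xxxx@localhost/xxxx' });

async function insertChunk(start, size) {
    const client = await pool.connect();              // one client per transaction
    try {
        await client.query('BEGIN');
        for (let i = start; i < start + size; i++) {  // rule 2: ~10,000 inserts per transaction
            await client.query('INSERT INTO testdb VALUES ($1, $2, $3)', [i, 1000000 - i, -i]);
        }
        await client.query('COMMIT');
    } catch (err) {
        await client.query('ROLLBACK');
        throw err;
    } finally {
        client.release();                             // rule 4: release after COMMIT/ROLLBACK
    }
}

async function insertAllChunks(total, chunkSize) {
    for (let start = 0; start < total; start += chunkSize) {
        await insertChunk(start, chunkSize);          // rules 1 & 3: transactions run as a sequential chain
    }
    await pool.end();
}

insertAllChunks(1000000, 10000).catch(console.error);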
If you break any of those rules, you're guaranteed trouble. For example, if you break rule 3, your Node JS process is likely to run out of memory very quickly and throw an error. Rule 4 in my example was provided by the library.
And if you follow this pattern, you don't need to trouble yourself with the connection pool settings.
UPDATE 1
Later versions of pg-promise support such scenarios perfectly, as shown below:
function factory(index) {
    if (index < 1000000) {
        return this.query('insert into test(name) values($1)', 'name-' + index);
    }
}

db.tx(function () {
    return this.batch([
        this.none('drop table if exists test'),
        this.none('create table test(id serial, name text)'),
        this.sequence(factory), // key method
        this.one('select count(*) from test')
    ]);
})
    .then(function (data) {
        console.log("COUNT:", data[3].count);
    })
    .catch(function (error) {
        console.log("ERROR:", error);
    });
and if you do not want to include anything extra, like table creation, then it looks even simpler:
function factory(index) {
    if (index < 1000000) {
        return this.query('insert into test(name) values($1)', 'name-' + index);
    }
}

db.tx(function () {
    return this.sequence(factory);
})
    .then(function (data) {
        // success;
    })
    .catch(function (error) {
        // error;
    });
See Synchronous Transactions for details.

Using Bluebird as the promise library, for example, it takes 1m43s on my production machine to insert 1,000,000 records (without long stack traces enabled).

You would just have your factory method return requests according to the index, till you have none left, simple as that.
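For instance, if the rows come from pre-loaded data rather than being generated, the factory could simply walk an array. In the sketch below, entries is a hypothetical array of [id, first, second] tuples parsed from the question's 200,000-line input file (not something from the original answer), used with the same db.tx + this.sequence pattern shown above:

// entries: hypothetical array of [id, first, second] tuples parsed from the input file
function factory(index) {
    if (index < entries.length) {
        return this.none('insert into testdb values($1, $2, $3)', entries[index]);
    }
    // returning nothing (undefined) ends the sequence
}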
And the best part: this isn't just fast, it also creates little load on your NodeJS process. The test process stays under 60MB of memory during the entire test, consuming only 7-8% of the CPU time.
UPDATE 2
Starting with version 1.7.2, pg-promise supports super-massive transactions with ease. See the chapter Synchronous Transactions.
For example, I could insert 10,000,000 records in a single transaction in just 15 minutes on my home PC, with Windows 8.1 64-bit.
For the test I set my PC to production mode, and used Bluebird as the promise library. During the test, memory consumption didn't go over 75MB for the entire NodeJS 0.12.5 process (64-bit), while my i7-4770 CPU showed a consistent 15% load.
Inserting 100m records the same way would require just more patience, but not more computer resources.
In the meantime, the previous test for 1m inserts dropped from 1m43s to 1m31s.
UPDATE 3
The following considerations can make a huge difference: Performance Boost.
UPDATE 4
Related question, with a better implementation example: Massive inserts with pg-promise.
UPDATE 5
A better and newer example can be found here: nodeJS inserting Data into PostgreSQL error
Answered by Anurag Peshne
I'm guessing that you are reaching the max pool size. Since client.query is asynchronous, probably all the available connections are used before they are returned.
The default pool size is 10. Check here: https://github.com/brianc/node-postgres/blob/master/lib/defaults.js#L27
You can increase the default pool size by setting pg.defaults.poolSize:
pg.defaults.poolSize = 20;
Update: Execute another query after freeing a connection.
var pg = require('pg');

var connectionString = "postgres://xxxx:xxxx@localhost/xxxx";

var MAX_POOL_SIZE = 25;

pg.defaults.poolSize = MAX_POOL_SIZE;
pg.connect(connectionString, function(err, client, done) {
    if (err) {
        return console.error('could not connect to postgres', err);
    }

    var release = function() {
        done();
        i++;
        if (i < 1000000)
            insertQ();
    };

    var insertQ = function() {
        client.query("INSERT INTO testDB VALUES (" + i.toString() + "," + (1000000 - i).toString() + "," + (-i).toString() + ")", function(err, result) {
            if (err) {
                return console.error('Error inserting query', err);
            }
            release();
        });
    };

    client.query("DROP TABLE IF EXISTS testDB");
    client.query("CREATE TABLE IF NOT EXISTS testDB (id int, first int, second int)");
    done();

    for (i = 0; i < MAX_POOL_SIZE; i++) {
        insertQ();
    }
});
The basic idea is that since you are enqueuing a large number of queries with a relatively small connection pool size, you are reaching the max pool size. Here we make a new query only after an existing connection is freed.
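The same "start a new query only when a connection frees up" idea can also be expressed with a current node-postgres Pool and async/await. The sketch below is illustrative only; names like worker and nextIndex are made up for the example and are not part of the pg API:

const { Pool } = require('pg');

const pool = new Pool({ connectionString: 'postgres://xxxx:xxxx@localhost/xxxx', max: 25 });

async function worker(nextIndex) {
    // Each worker runs one INSERT at a time, so at most `max` queries
    // (the pool size) are ever in flight.
    for (let i = nextIndex(); i !== null; i = nextIndex()) {
        await pool.query('INSERT INTO testdb VALUES ($1, $2, $3)', [i, 1000000 - i, -i]);
    }
}

async function run(total, concurrency) {
    let current = 0;
    const nextIndex = () => (current < total ? current++ : null);
    await Promise.all(Array.from({ length: concurrency }, () => worker(nextIndex)));
    await pool.end();
}

run(1000000, 25).catch(console.error);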