MySQL JDBC批量插入性能
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/2993251/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
JDBC batch insert performance
提问 by Bertil Chapuis
I need to insert a couple hundred million records into the mysql db. I'm batch inserting them 1 million at a time. Please see my code below. It seems to be slow. Is there any way to optimize it?
我需要在 mysql 数据库中插入几亿条记录。我一次批量插入 100 万个。请在下面查看我的代码。它似乎很慢。有什么办法可以优化吗?
try {
    // Disable auto-commit
    connection.setAutoCommit(false);
    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);
    Object[] vals = set.toArray();
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
    }
    // Execute the batch and commit
    int[] updateCounts = pstmt.executeBatch();
    connection.commit();
    System.out.println("inserted " + updateCounts.length);
} catch (SQLException e) {
    e.printStackTrace();
}
回答 by Bertil Chapuis
I had a similar performance issue with mysql and solved it by setting the useServerPrepStmts and rewriteBatchedStatements properties in the connection url.
我有一个与 mysql 类似的性能问题,并通过在连接 url 中设置useServerPrepStmts和rewriteBatchedStatements属性来解决它。
Connection c = DriverManager.getConnection("jdbc:mysql://host:3306/db?useServerPrepStmts=false&rewriteBatchedStatements=true", "username", "password");
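The same two flags can also be passed through a `Properties` object instead of the URL query string. A minimal sketch (the class name `ConnProps` and the host/db/credentials are illustrative placeholders, not part of the original answer):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class ConnProps {
    // Build the connection properties with the two flags from the answer above.
    public static Properties batchProps(String user, String password) {
        Properties p = new Properties();
        p.setProperty("user", user);
        p.setProperty("password", password);
        p.setProperty("useServerPrepStmts", "false");
        p.setProperty("rewriteBatchedStatements", "true");
        return p;
    }

    // Open the connection using the properties overload of getConnection.
    public static Connection open(String url, String user, String password) throws Exception {
        return DriverManager.getConnection(url, batchProps(user, password));
    }
}
```

This keeps credentials and driver flags out of the URL, which is easier to manage when the flag set grows.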
回答 by Eran
I'd like to expand on Bertil's answer, as I've been experimenting with the connection URL parameters.
我想扩展 Bertil 的回答,因为我一直在试验连接 URL 参数。
rewriteBatchedStatements=true is the important parameter. useServerPrepStmts is already false by default, and even changing it to true doesn't make much difference in terms of batch insert performance.
rewriteBatchedStatements=true 是重要参数。useServerPrepStmts 默认情况下已经是 false,即使将其更改为 true,在批量插入性能方面也没有太大区别。
Now I think it is time to write how rewriteBatchedStatements=true improves the performance so dramatically. It does so by rewriting the prepared INSERT statements into multi-value inserts when executeBatch() is called (Source). That means that instead of sending the following n INSERT statements to the mysql server each time executeBatch() is called:
现在我想是时候写下 rewriteBatchedStatements=true 为何能如此显著地提高性能了。它的做法是在调用 executeBatch() 时,将预处理的 INSERT 语句重写为多值插入(来源)。这意味着每次调用 executeBatch() 时,不是将以下 n 条 INSERT 语句发送到 mysql 服务器:
INSERT INTO X VALUES (A1,B1,C1)
INSERT INTO X VALUES (A2,B2,C2)
...
INSERT INTO X VALUES (An,Bn,Cn)
It would send a single INSERT statement :
它会发送一个 INSERT 语句:
INSERT INTO X VALUES (A1,B1,C1),(A2,B2,C2),...,(An,Bn,Cn)
You can observe it by toggling on the mysql logging (by SET global general_log = 1), which would log into a file each statement sent to the mysql server.
您可以通过打开 mysql 日志记录(执行 SET global general_log = 1)来观察这一点,它会将发送到 mysql 服务器的每条语句记录到一个文件中。
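The rewrite described above can be sketched in plain Java: collapse n single-row placeholder groups into one multi-value statement. This only mirrors the Connector/J behavior conceptually; the class and method names here are illustrative, and the real driver operates on its parsed prepared statement, not on strings:

```java
public class BatchRewrite {
    // Build the multi-value INSERT that the driver effectively sends
    // instead of `rows` separate single-row INSERTs.
    public static String toMultiValueInsert(String table, int rows, int cols) {
        StringBuilder sb = new StringBuilder("INSERT INTO ").append(table).append(" VALUES ");
        for (int r = 0; r < rows; r++) {
            if (r > 0) sb.append(',');    // separator between row groups
            sb.append('(');
            for (int c = 0; c < cols; c++) {
                if (c > 0) sb.append(',');
                sb.append('?');           // one placeholder per column
            }
            sb.append(')');
        }
        return sb.toString();
    }
}
```

The payoff is fewer round-trips and less per-statement parsing on the server: one statement with n value groups instead of n statements.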
回答 by nos
You can insert multiple rows with one insert statement; doing a few thousand at a time can greatly speed things up. That is, instead of doing e.g. 3 inserts of the form INSERT INTO tbl_name (a,b,c) VALUES(1,2,3);, you do INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(1,2,3),(1,2,3); (It might be that JDBC .addBatch() does a similar optimization now, though the mysql addBatch used to be entirely un-optimized and just issued individual queries anyhow; I don't know if that's still the case with recent drivers.)
您可以使用一条 insert 语句插入多行,一次执行几千行可以大大加快速度。也就是说,不是执行例如 3 条形如 INSERT INTO tbl_name (a,b,c) VALUES(1,2,3); 的插入,而是执行 INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(1,2,3),(1,2,3);(可能 JDBC 的 .addBatch() 现在做了类似的优化,尽管 mysql 的 addBatch 过去完全未优化,只是逐条发出单独的查询;我不知道最近的驱动程序是否仍然如此)
If you really need speed, load your data from a comma separated file with LOAD DATA INFILE, we get around 7-8 times speedup doing that vs doing tens of millions of inserts.
如果您确实需要速度,请使用LOAD DATA INFILE从逗号分隔的文件中加载数据,与执行数千万次插入相比,这样做的速度提高了大约 7-8 倍。
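A minimal sketch of the LOAD DATA route from JDBC, under stated assumptions: the table name `mytable`, the escaping helper, and the file path are illustrative, and the LOCAL variant requires the connection to allow local infile loading. The CSV-writing part is plain I/O; only the final statement needs a live MySQL connection:

```java
import java.io.FileWriter;
import java.io.IOException;
import java.sql.Connection;
import java.sql.Statement;

public class CsvLoad {
    // Escape one value for a comma-separated LOAD DATA file:
    // backslash is the default escape char, so escape it plus the
    // field and line separators.
    public static String escapeCsvField(String v) {
        return v.replace("\\", "\\\\").replace(",", "\\,").replace("\n", "\\n");
    }

    // Dump the rows to a flat comma-separated file.
    public static void writeCsv(String path, Iterable<String[]> rows) throws IOException {
        try (FileWriter w = new FileWriter(path)) {
            for (String[] row : rows) {
                StringBuilder line = new StringBuilder();
                for (int i = 0; i < row.length; i++) {
                    if (i > 0) line.append(',');
                    line.append(escapeCsvField(row[i]));
                }
                w.write(line.append('\n').toString());
            }
        }
    }

    // Illustrative: bulk-load the file in a single statement.
    public static void load(Connection conn, String path) throws Exception {
        try (Statement st = conn.createStatement()) {
            st.execute("LOAD DATA LOCAL INFILE '" + path
                    + "' INTO TABLE mytable FIELDS TERMINATED BY ','");
        }
    }
}
```

Writing the file first and loading it in one statement is what gives the speedup the answer describes: the server ingests rows in bulk instead of parsing millions of individual inserts.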
回答 by Wrikken
If:
如果:
- It's a new table, or the amount to be inserted is greater than the already inserted data
- There are indexes on the table
- You do not need other access to the table during the insert
- 这是一个新表,或者插入的数量大于已经插入的数据
- 表上有索引
- 在插入期间您不需要对表的其他访问
Then ALTER TABLE tbl_name DISABLE KEYS can greatly improve the speed of your inserts. When you're done, run ALTER TABLE tbl_name ENABLE KEYS to start building the indexes, which can take a while, but not nearly as long as doing it for every insert.
那么 ALTER TABLE tbl_name DISABLE KEYS 可以大大提高插入速度。完成后,运行 ALTER TABLE tbl_name ENABLE KEYS 开始构建索引,这可能需要一段时间,但远不及为每次插入都维护索引所花的时间。
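One way to wire this into JDBC code is a small wrapper that brackets the insert work with the two ALTER statements. This is a sketch under the answer's assumptions; note that DISABLE KEYS only affects non-unique indexes on MyISAM tables, and InnoDB silently ignores it:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class BulkInsertKeys {
    public static String disableKeysSql(String table) {
        return "ALTER TABLE " + table + " DISABLE KEYS";
    }

    public static String enableKeysSql(String table) {
        return "ALTER TABLE " + table + " ENABLE KEYS";
    }

    // Run the insert work between DISABLE KEYS and ENABLE KEYS;
    // the finally block makes sure the indexes get rebuilt even if
    // the insert work throws.
    public static void withKeysDisabled(Connection conn, String table, Runnable work)
            throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(disableKeysSql(table));
        }
        try {
            work.run();
        } finally {
            try (Statement st = conn.createStatement()) {
                st.execute(enableKeysSql(table));
            }
        }
    }
}
```

The finally block matters: if ENABLE KEYS were skipped after a failure, the table would be left without its indexes built.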
回答 by Lalith
You may try using a DDBulkLoad object.
您可以尝试使用 DDBulkLoad 对象。
// Get a DDBulkLoad object
DDBulkLoad bulkLoad = DDBulkLoadFactory.getInstance(connection);
bulkLoad.setTableName("mytable");
bulkLoad.load("data.csv");
回答 by Hieu HoangChi
try {
    // Disable auto-commit
    connection.setAutoCommit(false);
    int maxInsertBatch = 10000;
    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);
    Object[] vals = set.toArray();
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
        // Flush every maxInsertBatch rows to keep memory usage bounded
        if ((i + 1) % maxInsertBatch == 0) {
            pstmt.executeBatch();
        }
    }
    // Execute the remaining batch and commit
    pstmt.executeBatch();
    connection.commit();
    System.out.println("inserted " + vals.length);
} catch (SQLException e) {
    e.printStackTrace();
}