MySQL JDBC batch insert performance

Note: This page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/2993251/

JDBC batch insert performance

mysql, performance, jdbc, batch-file

Asked by Bertil Chapuis

I need to insert a couple hundred million records into a mysql db. I'm batch inserting them 1 million at a time. Please see my code below. It seems to be slow. Is there any way to optimize it?

try {
        // Disable auto-commit
        connection.setAutoCommit(false);

        // Create a prepared statement
        String sql = "INSERT INTO mytable (xxx) VALUES (?)";
        PreparedStatement pstmt = connection.prepareStatement(sql);

        Object[] vals = set.toArray();
        for (int i = 0; i < vals.length; i++) {
            pstmt.setString(1, vals[i].toString());
            pstmt.addBatch();
        }

        // Execute the batch and commit
        int[] updateCounts = pstmt.executeBatch();
        connection.commit();
        pstmt.close();
        System.out.println("inserted " + updateCounts.length);
} catch (SQLException e) {
        e.printStackTrace();
}

Answered by Bertil Chapuis

I had a similar performance issue with mysql and solved it by setting the useServerPrepStmts and rewriteBatchedStatements properties in the connection url.

Connection c = DriverManager.getConnection("jdbc:mysql://host:3306/db?useServerPrepStmts=false&rewriteBatchedStatements=true", "username", "password");
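
If you prefer not to append these flags to the URL, Connector/J also accepts them as driver properties. A minimal sketch, using the same placeholder host, database and credentials as above:

// Sketch: pass the same flags via java.util.Properties instead of the URL query string.
Properties props = new Properties();
props.setProperty("user", "username");
props.setProperty("password", "password");
props.setProperty("useServerPrepStmts", "false");
props.setProperty("rewriteBatchedStatements", "true");
Connection c = DriverManager.getConnection("jdbc:mysql://host:3306/db", props);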

Answered by Eran

I'd like to expand on Bertil's answer, as I've been experimenting with the connection URL parameters.

rewriteBatchedStatements=true is the important parameter. useServerPrepStmts is already false by default, and even changing it to true doesn't make much difference in terms of batch insert performance.

Now I think it is time to explain how rewriteBatchedStatements=true improves performance so dramatically. It does so by rewriting batched INSERT prepared statements into multi-value inserts when executeBatch() is executed (Source). That means that instead of sending the following n INSERT statements to the mysql server each time executeBatch() is called:

INSERT INTO X VALUES (A1,B1,C1)
INSERT INTO X VALUES (A2,B2,C2)
...
INSERT INTO X VALUES (An,Bn,Cn)

It would send a single INSERT statement:

INSERT INTO X VALUES (A1,B1,C1),(A2,B2,C2),...,(An,Bn,Cn)

You can observe this by turning on the mysql general query log (SET global general_log = 1), which logs each statement sent to the mysql server to a file.

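As an illustration (not part of the original answer), the log can also be toggled and inspected from JDBC itself by logging to a table instead of a file. A sketch, assuming the connected user is allowed to set global variables:

// Sketch: route the general log to the mysql.general_log table and read it back.
// Setting global variables requires SUPER or SYSTEM_VARIABLES_ADMIN.
Statement stmt = connection.createStatement();
stmt.execute("SET GLOBAL log_output = 'TABLE'");
stmt.execute("SET GLOBAL general_log = 1");

// ... run the batched inserts here ...

ResultSet rs = stmt.executeQuery(
        "SELECT CONVERT(argument USING utf8mb4) FROM mysql.general_log " +
        "ORDER BY event_time DESC LIMIT 5");
while (rs.next()) {
    System.out.println(rs.getString(1)); // shows the rewritten multi-value INSERT
}
stmt.execute("SET GLOBAL general_log = 0"); // the general log is expensive, turn it back off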

Answered by nos

You can insert multiple rows with one INSERT statement; doing a few thousand at a time can greatly speed things up. That is, instead of doing e.g. three inserts of the form INSERT INTO tbl_name (a,b,c) VALUES(1,2,3);, you do INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(1,2,3),(1,2,3);. (It might be that JDBC's .addBatch() does a similar optimization now - the mysql addBatch used to be entirely un-optimized and just issued individual queries anyhow - I don't know if that's still the case with recent drivers.)

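A hand-rolled version of that idea, building one multi-row prepared statement per chunk, could look roughly like the sketch below (the table, columns and chunk size are made up for the example):

// Sketch: one INSERT with many value groups, executed once per chunk.
int chunkSize = 1000;
StringBuilder sql = new StringBuilder("INSERT INTO tbl_name (a,b,c) VALUES ");
for (int i = 0; i < chunkSize; i++) {
    sql.append(i == 0 ? "(?,?,?)" : ",(?,?,?)");
}
PreparedStatement multi = connection.prepareStatement(sql.toString());
// For each chunk: bind parameters 1 .. 3*chunkSize, then call multi.executeUpdate().
// This is essentially what rewriteBatchedStatements=true does for you automatically.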

If you really need speed, load your data from a comma-separated file with LOAD DATA INFILE; we get around a 7-8x speedup doing that versus doing tens of millions of inserts.

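Run from JDBC, a LOAD DATA import might look like the sketch below; the CSV path is a placeholder, and loading a client-side file needs local_infile enabled on the server plus allowLoadLocalInfile=true on the Connector/J connection:

// Sketch: bulk load a CSV file that was written out beforehand (placeholder path).
// Assumes the connection URL contains allowLoadLocalInfile=true and the server
// permits local_infile.
Statement load = connection.createStatement();
load.execute(
        "LOAD DATA LOCAL INFILE '/tmp/data.csv' " +
        "INTO TABLE mytable " +
        "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' " +
        "(xxx)");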

Answered by Wrikken

If:

  1. It's a new table, or the amount to be inserted is greater than the data already in the table
  2. There are indexes on the table
  3. You do not need other access to the table during the insert

Then ALTER TABLE tbl_name DISABLE KEYS can greatly improve the speed of your inserts. When you're done, run ALTER TABLE tbl_name ENABLE KEYS to start building the indexes, which can take a while, but not nearly as long as doing it for every insert.

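Issued from JDBC, that could look like the sketch below; keep in mind that DISABLE KEYS only affects non-unique indexes, and only on MyISAM tables:

// Sketch: suspend index maintenance during the bulk insert, rebuild afterwards.
// (DISABLE KEYS only affects non-unique indexes, and only on MyISAM tables.)
Statement ddl = connection.createStatement();
ddl.execute("ALTER TABLE tbl_name DISABLE KEYS");

// ... run the batched inserts here ...

ddl.execute("ALTER TABLE tbl_name ENABLE KEYS"); // rebuilds the indexes in one pass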

Answered by Lalith

You may try using a DDBulkLoad object.

// Get a DDBulkLoad object
DDBulkLoad bulkLoad = DDBulkLoadFactory.getInstance(connection);
bulkLoad.setTableName("mytable");
bulkLoad.load("data.csv");
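Note: DDBulkLoad and DDBulkLoadFactory are not part of MySQL Connector/J; they appear to come from the Progress DataDirect JDBC driver extensions, so this option only applies if you are using that driver.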

Answered by Hieu HoangChi

try {
        // Disable auto-commit
        connection.setAutoCommit(false);
        int maxInsertBatch = 10000;
        // Create a prepared statement
        String sql = "INSERT INTO mytable (xxx) VALUES (?)";
        PreparedStatement pstmt = connection.prepareStatement(sql);

        Object[] vals = set.toArray();
        int count = 0;
        for (int i = 0; i < vals.length; i++) {
            pstmt.setString(1, vals[i].toString());
            pstmt.addBatch();
            count++;
            // Flush the batch every maxInsertBatch rows instead of keeping
            // everything client-side until the very end
            if (count % maxInsertBatch == 0) {
                pstmt.executeBatch();
            }
        }

        // Execute whatever is left in the batch, then commit
        pstmt.executeBatch();
        connection.commit();
        pstmt.close();
        System.out.println("inserted " + count);
} catch (SQLException e) {
        e.printStackTrace();
}
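
The difference from the code in the question is that executeBatch() is called every maxInsertBatch rows rather than once at the very end, which keeps the client-side batch (and its memory footprint) bounded; with rewriteBatchedStatements=true each of those flushes still reaches the server as a multi-value INSERT.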