MySQL JDBC批量插入性能
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/2993251/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
JDBC batch insert performance
提问 by Bertil Chapuis
I need to insert a couple hundred million records into the mysql db. I'm batch inserting them 1 million at a time. Please see my code below. It seems to be slow. Is there any way to optimize it?
我需要在 mysql 数据库中插入几亿条记录。我一次批量插入 100 万个。请在下面查看我的代码。它似乎很慢。有什么办法可以优化吗?
try {
    // Disable auto-commit
    connection.setAutoCommit(false);
    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);
    Object[] vals = set.toArray();
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
    }
    // Execute the batch and commit
    int[] updateCounts = pstmt.executeBatch();
    connection.commit();
    System.out.println("inserted " + updateCounts.length);
} catch (SQLException e) {
    e.printStackTrace();
}
回答 by Bertil Chapuis
I had a similar performance issue with mysql and solved it by setting the useServerPrepStmts and rewriteBatchedStatements properties in the connection url.
我有一个与 mysql 类似的性能问题,并通过在连接 url 中设置useServerPrepStmts和rewriteBatchedStatements属性来解决它。
Connection c = DriverManager.getConnection("jdbc:mysql://host:3306/db?useServerPrepStmts=false&rewriteBatchedStatements=true", "username", "password");
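The same two flags can also be passed through a `Properties` object instead of the URL query string. A minimal sketch (the class name `ConnProps` and the host/db/credentials are illustrative placeholders, not part of the original answer):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class ConnProps {
    // Build the connection properties with the two flags from the answer above.
    public static Properties batchProps(String user, String password) {
        Properties p = new Properties();
        p.setProperty("user", user);
        p.setProperty("password", password);
        p.setProperty("useServerPrepStmts", "false");
        p.setProperty("rewriteBatchedStatements", "true");
        return p;
    }

    // Open the connection using the properties overload of getConnection.
    public static Connection open(String url, String user, String password) throws Exception {
        return DriverManager.getConnection(url, batchProps(user, password));
    }
}
```

This keeps credentials and driver flags out of the URL, which is easier to manage when the flag set grows.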
回答 by Eran
I'd like to expand on Bertil's answer, as I've been experimenting with the connection URL parameters.
我想扩展 Bertil 的回答,因为我一直在试验连接 URL 参数。
rewriteBatchedStatements=true is the important parameter. useServerPrepStmts is already false by default, and even changing it to true doesn't make much difference in terms of batch insert performance.
rewriteBatchedStatements=true 是重要参数。useServerPrepStmts 默认情况下已经是 false,即使将其更改为 true,在批量插入性能方面也没有太大区别。
Now I think it is time to write how rewriteBatchedStatements=true improves the performance so dramatically. It does so by rewriting the prepared INSERT statements into multi-value inserts when executeBatch() is called (Source). That means that instead of sending the following n INSERT statements to the mysql server each time executeBatch() is called:
现在我想是时候写下 rewriteBatchedStatements=true 为何能如此显著地提高性能了。它的做法是在调用 executeBatch() 时,将预处理的 INSERT 语句重写为多值插入(来源)。这意味着每次调用 executeBatch() 时,不是将以下 n 条 INSERT 语句发送到 mysql 服务器:
INSERT INTO X VALUES (A1,B1,C1)
INSERT INTO X VALUES (A2,B2,C2)
...
INSERT INTO X VALUES (An,Bn,Cn)
It would send a single INSERT statement :
它会发送一个 INSERT 语句:
INSERT INTO X VALUES (A1,B1,C1),(A2,B2,C2),...,(An,Bn,Cn)
You can observe it by toggling on the mysql logging (by SET global general_log = 1), which would log into a file each statement sent to the mysql server.
您可以通过打开 mysql 日志记录(执行 SET global general_log = 1)来观察这一点,它会将发送到 mysql 服务器的每条语句记录到一个文件中。
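The rewrite described above can be sketched in plain Java: collapse n single-row placeholder groups into one multi-value statement. This only mirrors the Connector/J behavior conceptually; the class and method names here are illustrative, and the real driver operates on its parsed prepared statement, not on strings:

```java
public class BatchRewrite {
    // Build the multi-value INSERT that the driver effectively sends
    // instead of `rows` separate single-row INSERTs.
    public static String toMultiValueInsert(String table, int rows, int cols) {
        StringBuilder sb = new StringBuilder("INSERT INTO ").append(table).append(" VALUES ");
        for (int r = 0; r < rows; r++) {
            if (r > 0) sb.append(',');    // separator between row groups
            sb.append('(');
            for (int c = 0; c < cols; c++) {
                if (c > 0) sb.append(',');
                sb.append('?');           // one placeholder per column
            }
            sb.append(')');
        }
        return sb.toString();
    }
}
```

The payoff is fewer round-trips and less per-statement parsing on the server: one statement with n value groups instead of n statements.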
回答 by nos
You can insert multiple rows with one insert statement; doing a few thousand at a time can greatly speed things up. That is, instead of doing e.g. 3 inserts of the form INSERT INTO tbl_name (a,b,c) VALUES(1,2,3);, you do INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(1,2,3),(1,2,3); (It might be that JDBC .addBatch() does a similar optimization now, though the mysql addBatch used to be entirely un-optimized and just issued individual queries anyhow; I don't know if that's still the case with recent drivers.)
您可以使用一条 insert 语句插入多行,一次执行几千行可以大大加快速度。也就是说,不是执行例如 3 条形如 INSERT INTO tbl_name (a,b,c) VALUES(1,2,3); 的插入,而是执行 INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(1,2,3),(1,2,3);(可能 JDBC 的 .addBatch() 现在做了类似的优化,尽管 mysql 的 addBatch 过去完全未优化,只是逐条发出单独的查询;我不知道最近的驱动程序是否仍然如此)
If you really need speed, load your data from a comma separated file with LOAD DATA INFILE, we get around 7-8 times speedup doing that vs doing tens of millions of inserts.
如果您确实需要速度,请使用LOAD DATA INFILE从逗号分隔的文件中加载数据,与执行数千万次插入相比,这样做的速度提高了大约 7-8 倍。
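A minimal sketch of the LOAD DATA route from JDBC, under stated assumptions: the table name `mytable`, the escaping helper, and the file path are illustrative, and the LOCAL variant requires the connection to allow local infile loading. The CSV-writing part is plain I/O; only the final statement needs a live MySQL connection:

```java
import java.io.FileWriter;
import java.io.IOException;
import java.sql.Connection;
import java.sql.Statement;

public class CsvLoad {
    // Escape one value for a comma-separated LOAD DATA file:
    // backslash is the default escape char, so escape it plus the
    // field and line separators.
    public static String escapeCsvField(String v) {
        return v.replace("\\", "\\\\").replace(",", "\\,").replace("\n", "\\n");
    }

    // Dump the rows to a flat comma-separated file.
    public static void writeCsv(String path, Iterable<String[]> rows) throws IOException {
        try (FileWriter w = new FileWriter(path)) {
            for (String[] row : rows) {
                StringBuilder line = new StringBuilder();
                for (int i = 0; i < row.length; i++) {
                    if (i > 0) line.append(',');
                    line.append(escapeCsvField(row[i]));
                }
                w.write(line.append('\n').toString());
            }
        }
    }

    // Illustrative: bulk-load the file in a single statement.
    public static void load(Connection conn, String path) throws Exception {
        try (Statement st = conn.createStatement()) {
            st.execute("LOAD DATA LOCAL INFILE '" + path
                    + "' INTO TABLE mytable FIELDS TERMINATED BY ','");
        }
    }
}
```

Writing the file first and loading it in one statement is what gives the speedup the answer describes: the server ingests rows in bulk instead of parsing millions of individual inserts.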
回答 by Wrikken
If:
如果:
- It's a new table, or the amount to be inserted is greater than the already inserted data
- There are indexes on the table
- You do not need other access to the table during the insert
- 这是一个新表,或者插入的数量大于已经插入的数据
- 表上有索引
- 在插入期间您不需要对表的其他访问
Then ALTER TABLE tbl_name DISABLE KEYS can greatly improve the speed of your inserts. When you're done, run ALTER TABLE tbl_name ENABLE KEYS to start building the indexes, which can take a while, but not nearly as long as doing it for every insert.
那么 ALTER TABLE tbl_name DISABLE KEYS 可以大大提高插入速度。完成后,运行 ALTER TABLE tbl_name ENABLE KEYS 开始构建索引,这可能需要一段时间,但远不及为每次插入都维护索引所花的时间。
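One way to wire this into JDBC code is a small wrapper that brackets the insert work with the two ALTER statements. This is a sketch under the answer's assumptions; note that DISABLE KEYS only affects non-unique indexes on MyISAM tables, and InnoDB silently ignores it:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class BulkInsertKeys {
    public static String disableKeysSql(String table) {
        return "ALTER TABLE " + table + " DISABLE KEYS";
    }

    public static String enableKeysSql(String table) {
        return "ALTER TABLE " + table + " ENABLE KEYS";
    }

    // Run the insert work between DISABLE KEYS and ENABLE KEYS;
    // the finally block makes sure the indexes get rebuilt even if
    // the insert work throws.
    public static void withKeysDisabled(Connection conn, String table, Runnable work)
            throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(disableKeysSql(table));
        }
        try {
            work.run();
        } finally {
            try (Statement st = conn.createStatement()) {
                st.execute(enableKeysSql(table));
            }
        }
    }
}
```

The finally block matters: if ENABLE KEYS were skipped after a failure, the table would be left without its indexes built.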
回答 by Lalith
You may try using a DDBulkLoad object.
您可以尝试使用 DDBulkLoad 对象。
// Get a DDBulkLoad object
DDBulkLoad bulkLoad = DDBulkLoadFactory.getInstance(connection);
bulkLoad.setTableName("mytable");
bulkLoad.load("data.csv");
回答 by Hieu HoangChi
try {
    // Disable auto-commit
    connection.setAutoCommit(false);
    int maxInsertBatch = 10000;
    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);
    Object[] vals = set.toArray();
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
        // Flush every maxInsertBatch rows to keep memory usage bounded
        if ((i + 1) % maxInsertBatch == 0) {
            pstmt.executeBatch();
        }
    }
    // Execute the remaining batch and commit
    pstmt.executeBatch();
    connection.commit();
    System.out.println("inserted " + vals.length);
} catch (SQLException e) {
    e.printStackTrace();
}