Java: Why is Spring's jdbcTemplate.batchUpdate() so slow?

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/20360574/

Date: 2020-08-13 01:22:26, Source: igfitidea

Why is Spring's jdbcTemplate.batchUpdate() so slow?

Tags: java, mysql, spring, spring-batch, jdbctemplate

Asked by user2602807

I'm trying to find a faster way to do batch inserts.

I tried inserting several batches with jdbcTemplate.update(String sql), where the SQL was built with a StringBuilder and looked like this:

INSERT INTO TABLE(x, y, i) VALUES(1,2,3), (1,2,3), ... , (1,2,3)

The batch size was exactly 1000, and I inserted nearly 100 batches. I measured the time with StopWatch and got these insert times:

min[38ms], avg[50ms], max[190ms] per batch
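
A multi-row statement like the one above could be assembled along these lines. This is only a sketch: the class and method names are hypothetical, and it emits ? placeholders instead of literal values, which is a variation on what the question describes.

```java
import java.util.Collections;
import java.util.List;

final class MultiRowInsert {
    // Builds "INSERT INTO <table> (<cols>) VALUES (?, ...), (?, ...)" with rowCount value groups.
    static String build(String table, List<String> columns, int rowCount) {
        if (columns.isEmpty() || rowCount < 1) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        // One group such as "(?, ?, ?)", with one placeholder per column.
        String group = "(" + String.join(", ", Collections.nCopies(columns.size(), "?")) + ")";
        StringBuilder sql = new StringBuilder("INSERT INTO ")
                .append(table)
                .append(" (").append(String.join(", ", columns)).append(") VALUES ");
        for (int row = 0; row < rowCount; row++) {
            if (row > 0) {
                sql.append(", ");
            }
            sql.append(group);
        }
        return sql.toString();
    }
}
```

With placeholders, the values for all rows would then be bound on a PreparedStatement in row order.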

I was glad, but I wanted to make my code better.

After that, I tried using jdbcTemplate.batchUpdate like this:

    jdbcTemplate.batchUpdate(sql, new BatchPreparedStatementSetter() {
        @Override
        public void setValues(PreparedStatement ps, int i) throws SQLException {
            // ...
        }
        @Override
        public int getBatchSize() {
            return 1000;
        }
    });

where the SQL looked like this:

INSERT INTO TABLE(x, y, i) VALUES(1,2,3);

and I was disappointed! jdbcTemplate executed every single insert of the 1000-row batch separately. I looked at mysql_log and found a thousand inserts there. I measured the time with StopWatch and got these insert times:

min[900ms], avg[1100ms], max[2000ms] per batch

So, can anybody explain to me why jdbcTemplate does separate inserts in this method? Why is the method named batchUpdate? Or am I perhaps using this method the wrong way?

Answered by Evgeni Dimitrov

Change your SQL insert to INSERT INTO TABLE(x, y, i) VALUES(1,2,3). The framework creates a loop for you. For example:

public void insertBatch(final List<Customer> customers){

  String sql = "INSERT INTO CUSTOMER " +
    "(CUST_ID, NAME, AGE) VALUES (?, ?, ?)";

  getJdbcTemplate().batchUpdate(sql, new BatchPreparedStatementSetter() {

    @Override
    public void setValues(PreparedStatement ps, int i) throws SQLException {
        Customer customer = customers.get(i);
        ps.setLong(1, customer.getCustId());
        ps.setString(2, customer.getName());
        ps.setInt(3, customer.getAge() );
    }

    @Override
    public int getBatchSize() {
        return customers.size();
    }
  });
}

If you have something like this, Spring will do something like:

for (int i = 0; i < getBatchSize(); i++) {
    // execute the prepared statement with the parameters for the current iteration
}

The framework first creates a PreparedStatement from the query (the sql variable), then the setValues method is called and the statement is executed. This is repeated as many times as you specify in the getBatchSize() method. So the right way to write the insert statement is with only one VALUES clause. You can take a look at http://docs.spring.io/spring/docs/3.0.x/reference/jdbc.html
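
The loop described above can also be written with JDBC's own batch calls, where each row is queued and the whole batch is flushed at the end. The following is a minimal sketch of that pattern, assuming a driver that supports batch updates; it is not Spring's actual source, and BatchLoopSketch and RowBinder are hypothetical names.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class BatchLoopSketch {
    // Hypothetical stand-in for the setValues callback: binds the parameters of row i.
    interface RowBinder {
        void bind(PreparedStatement ps, int i) throws SQLException;
    }

    // Queues batchSize parameter sets on one statement, then flushes them together.
    static int[] runBatch(PreparedStatement ps, int batchSize, RowBinder binder) {
        try {
            for (int i = 0; i < batchSize; i++) {
                binder.bind(ps, i); // set the parameters for the current iteration
                ps.addBatch();      // queue this parameter set instead of executing it
            }
            return ps.executeBatch(); // one update count per queued parameter set
        } catch (SQLException e) {
            // Wrapped only to keep the sketch easy to call.
            throw new IllegalStateException(e);
        }
    }
}
```

Whether the queued statements reach the server in one round trip is up to the driver, which may be why a server log can still show one insert per row.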

Answered by reblace

I don't know if this will work for you, but here's a Spring-free way that I ended up using. It was significantly faster than the various Spring methods I tried. I even tried using the JDBC template batch update method the other answer describes, but even that was slower than I wanted. I'm not sure what the deal was, and the Internet didn't have many answers either. I suspected it had to do with how commits were being handled.

This approach is just straight JDBC using the java.sql package and PreparedStatement's batch interface. This was the fastest way that I could get 24M records into a MySQL DB.

I more or less just built up collections of "record" objects and then called the code below in a method that batch inserted all the records. The loop that built the collections was responsible for managing the batch size.
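
The answer does not show the loop that manages the batch size. One way it might look is a helper that slices the records into fixed-size collections; this is a sketch with hypothetical names, not the author's code.

```java
import java.util.ArrayList;
import java.util.List;

final class Batches {
    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Each resulting batch would then be handed to the insert method, so the last batch is simply shorter than the others.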

I was trying to insert 24M records into a MySQL DB, and it was going at ~200 records per second using Spring batch. When I switched to this method, it went up to ~2500 records per second, so my 24M-record load went from a theoretical 1.5 days to about 2.5 hours.

First create a connection...

Connection conn = null;
try {
    Class.forName("com.mysql.jdbc.Driver"); // load the MySQL driver class
    conn = DriverManager.getConnection(connectionUrl, username, password);
} catch (SQLException | ClassNotFoundException e) {
    // error handling omitted
}

Then create a prepared statement and load it with batches of values for insert, and then execute as a single batch insert...

PreparedStatement ps = null;
try{
    conn.setAutoCommit(false);
    ps = conn.prepareStatement(sql); // e.g. INSERT INTO TABLE(x, y, i) VALUES(?, ?, ?)
    for(MyRecord record : records){
        try{
            ps.setString(1, record.getX());
            ps.setString(2, record.getY());
            ps.setString(3, record.getI());

            ps.addBatch();
        } catch (Exception e){
            ps.clearParameters();
            logger.warn("Skipping record...", e);
        }
    }

    ps.executeBatch();
    conn.commit();
} catch (SQLException e){
} finally {
    if(null != ps){
        try {ps.close();} catch (SQLException e){}
    }
}

Obviously I've removed the error handling, and the query and the Record object are notional.

Edit: Since your original question was comparing the insert into foobar values (?,?,?), (?,?,?)...(?,?,?) method to Spring batch, here's a more direct response to that:

It looks like your original method is likely the fastest way to do bulk data loads into MySQL without using something like the "LOAD DATA INFILE" approach. A quote from the MySQL docs (http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html):

If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.

You could modify the Spring JDBC Template batchUpdate method to do an insert with multiple VALUES specified per 'setValues' call, but you'd have to manually keep track of the index values as you iterate over the set of things being inserted. And you'd run into a nasty edge case at the end when the total number of things being inserted isn't a multiple of the number of VALUES lists you have in your prepared statement.

If you use the approach I outline, you could do the same thing (use a prepared statement with multiple VALUES lists) and then when you get to that edge case at the end, it's a little easier to deal with because you can build and execute one last statement with exactly the right number of VALUES lists. It's a bit hacky, but most optimized things are.
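
The index bookkeeping and the end-of-load edge case could be handled roughly as follows. This is a sketch under the assumption that one reusable statement holds a fixed number of VALUES lists; ValuesPlan and its members are hypothetical names.

```java
final class ValuesPlan {
    final int fullStatements; // executions of the reusable statement with rowsPerStatement VALUES lists
    final int remainderRows;  // rows left for one last, smaller statement (0 if none)

    private ValuesPlan(int fullStatements, int remainderRows) {
        this.fullStatements = fullStatements;
        this.remainderRows = remainderRows;
    }

    // Splits totalRows into full statements plus a remainder.
    static ValuesPlan of(int totalRows, int rowsPerStatement) {
        if (totalRows < 0 || rowsPerStatement < 1) {
            throw new IllegalArgumentException("invalid row counts");
        }
        return new ValuesPlan(totalRows / rowsPerStatement, totalRows % rowsPerStatement);
    }

    // 1-based JDBC parameter index for a column (0-based) of a row (0-based) in a multi-row statement.
    static int parameterIndex(int rowInStatement, int columnInRow, int columnsPerRow) {
        return rowInStatement * columnsPerRow + columnInRow + 1;
    }
}
```

For 2500 rows and 1000 VALUES lists per statement, this gives two full executions and one final statement built with exactly 500 VALUES lists.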

Answered by Rakesh Soni

I have also faced the same issue with Spring JDBC template. Probably with Spring Batch the statement was executed and committed on every insert or on chunks, which slowed things down.

I replaced the jdbcTemplate.batchUpdate() code with plain JDBC batch insertion code and saw a major performance improvement.

DataSource ds = jdbcTemplate.getDataSource();
String sql = "insert into employee (name, city, phone) values (?, ?, ?)";
final int batchSize = 1000;
int count = 0;

// try-with-resources closes the statement and the connection even if an insert fails
try (Connection connection = ds.getConnection();
     PreparedStatement ps = connection.prepareStatement(sql)) {
    connection.setAutoCommit(false);

    for (Employee employee : employees) {
        ps.setString(1, employee.getName());
        ps.setString(2, employee.getCity());
        ps.setString(3, employee.getPhone());
        ps.addBatch();

        ++count;

        // flush every batchSize rows, and once more for the final partial batch
        if (count % batchSize == 0 || count == employees.size()) {
            ps.executeBatch();
            ps.clearBatch();
        }
    }

    connection.commit();
}

Check this link as well: JDBC batch insert performance

Answered by teu

These parameters in the JDBC connection URL can make a big difference in the speed of batched statements. In my experience, they speed things up:

?useServerPrepStmts=false&rewriteBatchedStatements=true

See: JDBC batch insert performance
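
As an illustration, the parameters could be appended to an existing connection URL like this. It is a sketch only: BatchUrl is a hypothetical helper, and the host and database name in the usage note are placeholders.

```java
final class BatchUrl {
    // The two parameters quoted in the answer above.
    static final String BATCH_PARAMS = "useServerPrepStmts=false&rewriteBatchedStatements=true";

    // Appends the batch parameters, using '?' or '&' depending on whether the URL already has a query part.
    static String withBatchParams(String jdbcUrl) {
        String separator = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + separator + BATCH_PARAMS;
    }
}
```

For example, BatchUrl.withBatchParams("jdbc:mysql://localhost:3306/mydb") yields jdbc:mysql://localhost:3306/mydb?useServerPrepStmts=false&rewriteBatchedStatements=true.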

Answered by Mike

Simply use a transaction: add @Transactional to the method.

Be sure to declare the correct TX manager if you use several datasources: @Transactional("dsTxManager"). I have a case that inserts 60000 records. It takes about 15s, with no other tweak:

@Transactional("myDataSourceTxManager")
public void save(...) {
    ...
    jdbcTemplate.batchUpdate(query, new BatchPreparedStatementSetter() {

        @Override
        public void setValues(PreparedStatement ps, int i) throws SQLException {
            ...
        }

        @Override
        public int getBatchSize() {
            if (data == null) {
                return 0;
            }
            return data.size();
        }
    });
}

Answered by Carlos Cuesta

I found a major improvement by setting the argTypes array in the call.

In my case, with Spring 4.1.4 and Oracle 12c, for insertion of 5000 rows with 35 fields:

jdbcTemplate.batchUpdate(insert, parameters); // takes 7 seconds

jdbcTemplate.batchUpdate(insert, parameters, argTypes); // takes 0.08 seconds!!!

The argTypes param is an int array where you set each field in this way:

int[] argTypes = new int[35];
argTypes[0] = Types.VARCHAR;
argTypes[1] = Types.VARCHAR;
argTypes[2] = Types.VARCHAR;
argTypes[3] = Types.DECIMAL;
argTypes[4] = Types.TIMESTAMP;
.....

I debugged org\springframework\jdbc\core\JdbcTemplate.java and found that most of the time was spent trying to determine the type of each field, and this was done for each record.
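
With 35 fields, filling the array by hand is tedious. One option might be to derive it once from a representative row. The sketch below assumes a simple mapping from Java classes to java.sql.Types constants; the mapping and the ArgTypes name are illustrative assumptions, not part of Spring.

```java
import java.math.BigDecimal;
import java.sql.Timestamp;
import java.sql.Types;

final class ArgTypes {
    // Derives one java.sql.Types constant per column from a sample row with no null values.
    static int[] fromSampleRow(Object[] sampleRow) {
        int[] argTypes = new int[sampleRow.length];
        for (int i = 0; i < sampleRow.length; i++) {
            Object value = sampleRow[i];
            if (value instanceof String) {
                argTypes[i] = Types.VARCHAR;
            } else if (value instanceof BigDecimal) {
                argTypes[i] = Types.DECIMAL;
            } else if (value instanceof Timestamp) {
                argTypes[i] = Types.TIMESTAMP;
            } else {
                // Nulls and unmapped classes are rejected rather than guessed.
                throw new IllegalArgumentException("no type mapping for column " + i);
            }
        }
        return argTypes;
    }
}
```

The resulting array would be passed as the third argument of jdbcTemplate.batchUpdate(insert, parameters, argTypes), as in the call above.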

Hope this helps!

Answered by Pratidnya

The solution given by @Rakesh worked for me, with a significant improvement in performance. Earlier the run took 8 min; with this solution it takes less than 2 min.

DataSource ds = jdbcTemplate.getDataSource();
String sql = "insert into employee (name, city, phone) values (?, ?, ?)";
final int batchSize = 1000;
int count = 0;

// try-with-resources closes the statement and the connection even if an insert fails
try (Connection connection = ds.getConnection();
     PreparedStatement ps = connection.prepareStatement(sql)) {
    connection.setAutoCommit(false);

    for (Employee employee : employees) {
        ps.setString(1, employee.getName());
        ps.setString(2, employee.getCity());
        ps.setString(3, employee.getPhone());
        ps.addBatch();

        ++count;

        // flush every batchSize rows, and once more for the final partial batch
        if (count % batchSize == 0 || count == employees.size()) {
            ps.executeBatch();
            ps.clearBatch();
        }
    }

    connection.commit();
}

Answered by Jefferson Quesado

I also had a bad time with the Spring JDBC batch template. In my case it would be insane to use pure JDBC, so I used NamedParameterJdbcTemplate instead. This was a must-have in my project. But it was way too slow when inserting hundreds of thousands of rows into the database.

To see what was going on, I've sampled it with VisualVM during the batch update and, voilà:

[Image: VisualVM showing where it was slow]

What was slowing the process down was that, while setting the parameters, Spring JDBC was querying the database for the metadata of each parameter. It seemed to me that it was querying the database for each parameter of each row, every time. So I just taught Spring to ignore the parameter types:

    @Bean(name = "named-jdbc-tenant")
    public synchronized NamedParameterJdbcTemplate getNamedJdbcTemplate(@Autowired TenantRoutingDataSource tenantDataSource) {
        System.setProperty("spring.jdbc.getParameterType.ignore", "true");
        return new NamedParameterJdbcTemplate(tenantDataSource);
    }

Note: the system property must be set before creating the JDBC template object. It would probably be possible to set it in application.properties instead, but this solved the problem and I have never touched it again.