Java: Need to insert 100,000 rows in MySQL using Hibernate in under 5 seconds

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not the translator). Original question: http://stackoverflow.com/questions/44243608/



Tags: java, mysql, hibernate, jpa, batch-insert

Asked by Kumar Manish

I am trying to insert 100,000 rows into a MySQL table in under 5 seconds using Hibernate (JPA). I have tried every trick Hibernate offers and still cannot do better than 35 seconds.


1st optimisation: I started with the IDENTITY sequence generator, which resulted in 60 seconds to insert. I later abandoned the sequence generator and started assigning the @Id field myself, by reading MAX(id) and using AtomicInteger.incrementAndGet(). That reduced the insert time to 35 seconds.

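A minimal sketch of that id assignment (the class and the seeding step are illustrative, not the OP's exact code; this is only safe while this application is the sole writer to the table):

import java.util.concurrent.atomic.AtomicInteger;

// Hands out ids in memory after seeding once from the database,
// so Hibernate never has to wait for a generated key per row.
public class IdAssigner {

    private final AtomicInteger idSequence;

    // maxId is read once at startup, e.g. with "select max(id) from deal"
    public IdAssigner(int maxId) {
        this.idSequence = new AtomicInteger(maxId);
    }

    public int nextId() {
        return idSequence.incrementAndGet();
    }
}

Each entity then gets its @Id set before persist(), e.g. entity.setId(idAssigner.nextId()).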

2nd optimisation : I enabled batch inserts, by adding


<prop key="hibernate.jdbc.batch_size">30</prop>
<prop key="hibernate.order_inserts">true</prop>
<prop key="hibernate.current_session_context_class">thread</prop>
<prop key="hibernate.jdbc.batch_versioned_data">true</prop>


to the configuration. I was shocked to find that batch inserts did absolutely nothing to decrease the insert time. It was still 35 seconds!


Now I am thinking about trying to insert using multiple threads. Does anyone have any pointers? Should I have chosen MongoDB?


Below is my configuration:

1. Hibernate configuration:


<bean id="entityManagerFactoryBean" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
        <property name="dataSource" ref="dataSource" />
        <property name="packagesToScan" value="com.progresssoft.manishkr" />
        <property name="jpaVendorAdapter">
            <bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter" />
        </property>
        <property name="jpaProperties">
            <props>
                <prop key="hibernate.hbm2ddl.auto">${hibernate.hbm2ddl.auto}</prop>
                <prop key="hibernate.dialect">${hibernate.dialect}</prop>
                <prop key="hibernate.show_sql">${hibernate.show_sql}</prop>
                <prop key="hibernate.format_sql">${hibernate.format_sql}</prop>
                <prop key="hibernate.jdbc.batch_size">30</prop>
                <prop key="hibernate.order_inserts">true</prop>
                <prop key="hibernate.current_session_context_class">thread</prop>
                <prop key="hibernate.jdbc.batch_versioned_data">true</prop>
            </props>
        </property>
    </bean>

    <bean class="org.springframework.jdbc.datasource.DriverManagerDataSource"
          id="dataSource">
        <property name="driverClassName" value="${database.driver}"></property>
        <property name="url" value="${database.url}"></property>
        <property name="username" value="${database.username}"></property>
        <property name="password" value="${database.password}"></property>
    </bean>

    <bean id="transactionManager" class="org.springframework.orm.jpa.JpaTransactionManager">
        <property name="entityManagerFactory" ref="entityManagerFactoryBean" />
    </bean>



    <tx:annotation-driven transaction-manager="transactionManager" />

2. Entity configuration:

@Entity
@Table(name = "myEntity")
public class MyEntity {

    @Id
    private Integer id;

    @Column(name = "deal_id")
    private String dealId;

    ....
    ....

    @Temporal(TemporalType.TIMESTAMP)
    @Column(name = "timestamp")
    private Date timestamp;

    @Column(name = "amount")
    private BigDecimal amount;

    @OneToOne(cascade = CascadeType.ALL)
    @JoinColumn(name = "source_file")
    private MyFile sourceFile;

    public MyEntity(Integer id, String dealId, ....., Date timestamp, BigDecimal amount, MyFile sourceFile) {
        this.id = id;
        this.dealId = dealId;
        ...
        ...
        ...
        this.amount = amount;
        this.sourceFile = sourceFile;
    }


    public String getDealId() {
        return dealId;
    }

    public void setDealId(String dealId) {
        this.dealId = dealId;
    }

   ...

   ...


    ....

    public BigDecimal getAmount() {
        return amount;
    }

    public void setAmount(BigDecimal amount) {
        this.amount = amount;
    }

    ....


    public Integer getId() {
        return id;
    }

    public void setId(Integer id) {
        this.id = id;
    }

}

3. Persisting code (service):

@Service
@Transactional
public class ServiceImpl implements MyService{

    @Autowired
    private MyDao dao;
....

    void foo(){
        for(MyObject d : listOfObjects_100000){
            dao.persist(d);
        }
    }
}

4. DAO class:

@Repository
public class DaoImpl implements MyDao{

    @PersistenceContext
    private EntityManager em;

    public void persist(Deal deal){
        em.persist(deal);
    }
}

Logs:

DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2] 

...

DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 27
18:26:34.011 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - update deal_source_file set invalid_rows=?, source_file=?, valid_rows=? where id=?
18:26:34.015 [http-nio-8080-exec-2] DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 1
18:26:34.018 [http-nio-8080-exec-2] DEBUG o.h.e.t.i.jdbc.JdbcTransaction - committed JDBC Connection
18:26:34.018 [http-nio-8080-exec-2] DEBUG o.h.e.t.i.jdbc.JdbcTransaction - re-enabling autocommit
18:26:34.032 [http-nio-8080-exec-2] DEBUG o.s.orm.jpa.JpaTransactionManager - Closing JPA EntityManager [org.hibernate.jpa.internal.EntityManagerImpl@2354fb09] after transaction
18:26:34.032 [http-nio-8080-exec-2] DEBUG o.s.o.jpa.EntityManagerFactoryUtils - Closing JPA EntityManager
18:26:34.032 [http-nio-8080-exec-2] DEBUG o.h.e.j.internal.JdbcCoordinatorImpl - HHH000420: Closing un-released batch
18:26:34.032 [http-nio-8080-exec-2] DEBUG o.h.e.j.i.LogicalConnectionImpl - Releasing JDBC connection
18:26:34.033 [http-nio-8080-exec-2] DEBUG o.h.e.j.i.LogicalConnectionImpl - Released JDBC connection


Accepted answer by Kumar Manish

After trying all possible solutions, I finally found a solution to insert 100,000 rows in under 5 seconds!


Things I tried:


1) Replaced Hibernate's/the database's AUTO_INCREMENT/generated ids with self-generated ids using AtomicInteger.


2) Enabled batch inserts with batch_size=50.


3) Flushed the cache after every batch_size number of persist() calls.


4) Multithreading (did not attempt this one).


Finally, what worked was using a native multi-insert query, inserting 1000 rows in one SQL insert statement instead of calling persist() on every entity. To insert 100,000 entities, I build a native query like "INSERT INTO MyTable VALUES (x,x,x),(x,x,x).......(x,x,x)" [1000 rows per insert statement].


Now it takes around 3 seconds to insert 100,000 records! So the bottleneck was the ORM itself! For bulk inserts, the only thing that seems to work is native insert queries!

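A minimal sketch of that approach using plain JDBC (the column list is shortened, the class and chunking are illustrative, and the table/column names follow the "deal" table from the logs):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;
import javax.sql.DataSource;

public class BulkInserter {

    // Builds one "insert into deal (...) values (?,?,?),(?,?,?),..." statement
    // per chunk of 1000 rows instead of one statement per row.
    public void bulkInsert(DataSource ds, List<MyEntity> rows) throws Exception {
        final int chunkSize = 1000;
        try (Connection con = ds.getConnection()) {
            con.setAutoCommit(false);
            for (int start = 0; start < rows.size(); start += chunkSize) {
                List<MyEntity> chunk = rows.subList(start, Math.min(start + chunkSize, rows.size()));
                StringBuilder sql = new StringBuilder("insert into deal (id, deal_id, amount) values ");
                for (int i = 0; i < chunk.size(); i++) {
                    sql.append(i == 0 ? "(?,?,?)" : ",(?,?,?)");
                }
                try (PreparedStatement ps = con.prepareStatement(sql.toString())) {
                    int p = 1;
                    for (MyEntity e : chunk) {
                        ps.setInt(p++, e.getId());
                        ps.setString(p++, e.getDealId());
                        ps.setBigDecimal(p++, e.getAmount());
                    }
                    ps.executeUpdate();
                }
            }
            con.commit();
        }
    }
}

Worth noting: MySQL Connector/J can do a comparable rewrite transparently. Adding rewriteBatchedStatements=true to the JDBC URL makes the driver turn a JDBC batch into a multi-row INSERT, which is worth trying before hand-rolling SQL like this.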

Answer by M. Deinum

  1. You are using Spring to manage the transaction, but you break that by using thread as the current session context. When using Spring to manage your transactions, don't mess around with the hibernate.current_session_context_class property. Remove it.

  2. Don't use DriverManagerDataSource; use a proper connection pool like HikariCP (a configuration sketch follows the code example below).

  3. In your for loop you should flush and clear the EntityManager at regular intervals, preferably every batch_size persists. If you don't, each persist takes longer and longer, because on every flush Hibernate checks the first-level cache for dirty objects, and the more objects it holds, the more time that takes. With 10 or 100 objects this is acceptable, but checking tens of thousands of objects on each persist will take its toll.


@Service
@Transactional
public class ServiceImpl implements MyService{

    @Autowired
    private MyDao dao;

    @PersistenceContext
    private EntityManager em;


    void foo(){
        int count = 0;
        for(MyObject d : listOfObjects_100000){
            dao.persist(d);
            count++;
            // flush and clear at the same interval as hibernate.jdbc.batch_size
            if ( (count % 30) == 0) {
               em.flush();
               em.clear();
            }
        }
    }
}

For a more in-depth explanation, see this blog and this blog.

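For point 2, a minimal HikariCP setup might look like the sketch below (the URL, credentials, and pool size are placeholders; in the XML configuration above, the resulting HikariDataSource would replace the DriverManagerDataSource bean):

import javax.sql.DataSource;
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class DataSourceFactory {

    // A pooled DataSource: connections are reused instead of being
    // opened on demand as DriverManagerDataSource does.
    public DataSource dataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://localhost:3306/mydb"); // placeholder
        config.setUsername("user");                            // placeholder
        config.setPassword("password");                        // placeholder
        config.setMaximumPoolSize(10);
        return new HikariDataSource(config);
    }
}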

Answer by Justas

Another option to consider is StatelessSession:


A command-oriented API for performing bulk operations against a database.

A stateless session does not implement a first-level cache nor interact with any second-level cache, nor does it implement transactional write-behind or automatic dirty checking, nor do operations cascade to associated instances. Collections are ignored by a stateless session. Operations performed via a stateless session bypass Hibernate's event model and interceptors. Stateless sessions are vulnerable to data aliasing effects, due to the lack of a first-level cache.

For certain kinds of transactions, a stateless session may perform slightly faster than a stateful session.


Related discussion: Using StatelessSession for Batch processing

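A minimal sketch of bulk inserting through a StatelessSession (assuming a Hibernate SessionFactory is available, e.g. unwrapped from the EntityManagerFactory):

import org.hibernate.SessionFactory;
import org.hibernate.StatelessSession;
import org.hibernate.Transaction;

public class StatelessBulkInserter {

    // Each insert() issues the SQL directly: no first-level cache, no dirty
    // checking, and no cascades, so memory stays flat across 100,000 rows.
    public void bulkInsert(SessionFactory sessionFactory, Iterable<MyEntity> rows) {
        StatelessSession session = sessionFactory.openStatelessSession();
        Transaction tx = session.beginTransaction();
        try {
            for (MyEntity e : rows) {
                session.insert(e);
            }
            tx.commit();
        } catch (RuntimeException ex) {
            tx.rollback();
            throw ex;
        } finally {
            session.close();
        }
    }
}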

Answer by M46

Uff. You can do a lot of things to increase speed.


1.) Use @DynamicInsert and @DynamicUpdate so that Hibernate generates INSERT statements containing only the non-null columns and UPDATE statements containing only the changed columns.


2.) Try inserting the rows directly (without Hibernate) into your database to see whether Hibernate really is your bottleneck.


3.) Use a SessionFactory and commit your transaction only every e.g. 100 inserts. Or open and close the transaction only once and flush your data every 100 inserts.


4.) Use the ID generation strategy "sequence" and let Hibernate preallocate the IDs via the allocationSize parameter (see the mapping sketch after this list).


5.) Use caches.

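A sketch of such a mapping (the generator name and allocation size are illustrative; note that MySQL has no native sequences, so Hibernate falls back to an emulated, table-backed generator there):

import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

// allocationSize lets Hibernate hand out 100 ids per database round trip
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "deal_seq")
@SequenceGenerator(name = "deal_seq", sequenceName = "deal_seq", allocationSize = 100)
private Integer id;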

Some of these possible solutions can have timing disadvantages when not used correctly. But you have a lot of options.
