Java: Improve JDBC Performance

Note: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/25850569/

Date: 2020-08-11 01:16:36  Source: igfitidea

Improve JDBC Performance

Tags: java, sql, multithreading, jdbc

Asked by user2133404

I am executing the following set of statements in my Java application. It connects to an Oracle database.

stat = connection.createStatement();
stat1 = connection.createStatement();
ResultSet rs = stat.executeQuery(BIGQUERY);
while (rs.next()) {
    obj1.setAttr1(rs.getString(1));
    obj1.setAttr2(rs.getString(2));
    obj1.setAttr3(rs.getString(3));
    obj1.setAttr4(rs.getString(4));

    ResultSet rs1 = stat1.executeQuery(SMALLQ1);
    while (rs1.next()) {
        obj1.setAttr5(rs1.getString(1));
    }

    ResultSet rs2 = stat1.executeQuery(SMALLQ2);
    while (rs2.next()) {
        obj1.setAttr6(rs2.getString(1));
    }
         .
         .
         .
    linkedBlockingQueue.add(obj1);
}
// all statements and connections are closed afterwards

The BIGQUERY returns around 4.5 million records, and for each record I have to execute the 14 smaller queries. Each small query has 3 inner join statements.

My multi-threaded application can currently process 90,000 records in one hour. But I may have to run the code daily, so I want to process all the records in 20 hours. I am using about 200 threads, which run the above code and store the records in a LinkedBlockingQueue.

Does blindly increasing the thread count help performance, or is there some other way in which I can speed up processing of the result sets?

PS: I am unable to post the queries here, but I am assured that all queries are optimized.

Answered by przemek hertel

To improve JDBC performance in your scenario, you can apply several modifications.

As you will see, each of these modifications can significantly speed up your task.

1. Using batch operations.


You can read your big query and store the results in a buffer. Only when the buffer is full should you run the subqueries for all the data collected in it. This significantly reduces the number of SQL statements to execute.

static final int BATCH_SIZE = 1000;
List<MyData> buffer = new ArrayList<>(BATCH_SIZE);

while (rs.next()) {

  MyData record = new MyData( rs.getString(1), ..., rs.getString(4) );
  buffer.add( record );

  if (buffer.size() == BATCH_SIZE) {
    processBatch( buffer );
    buffer.clear();             // start collecting the next batch
  }
}
if (!buffer.isEmpty()) {
  processBatch( buffer );       // flush the final, partially filled batch
}

void processBatch( List<MyData> buffer ) {

  String sql = "select ... where X and id in (" + getIDs(buffer) + ")";
  ResultSet rs1 = stat1.executeQuery(sql);  // one query for all IDs in the buffer
  while (rs1.next()) { ... }
  ...
}

2. Using efficient maps to store content from many selects.


If your records are not too big, you can store them all in memory at once, even for a 4-million-row table.

I have used this approach many times for nightly processes (with no normal users online). Such an approach may need a lot of heap memory (i.e. 100 MB - 1 GB) - but it is much faster than approach 1).

To do that you need an efficient map implementation, e.g. gnu.trove.map.TIntObjectMap (etc.), which is much better than the Java standard library maps.

final TIntObjectMap<MyData> map = new TIntObjectHashMap<MyData>(10000, 0.8f);

// query 1
while (rs.next()) {
  MyData record = new MyData( rs.getInt(1), rs.getString(2), ..., rs.getString(4) );
  map.put(record.getId(), record);
}

// query 2
while (rs.next()) {
  int id = rs.getInt(1);   // my data id
  String x = rs.getString(...);
  int y = rs.getInt(...);

  MyData record = map.get(id);
  record.add( new MyDetail(x,y) );
}

// query 3
// same pattern as query 2 

After this you have a map filled with all the collected data, probably with a lot of memory allocated. This is why you can use this method only if you have such resources.

Another topic is how to make the MyData and MyDetail classes as small as possible. You can use some tricks:

  1. storing 3 integers (with a limited range) in 1 long variable (using a utility for bit shifting)
  2. storing Date objects as integers (yymmdd)
  3. calling str.intern() for each string fetched from the DB
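Tricks 1 and 2 above can be sketched as follows. The `PackedTriple` class name, the 20-bits-per-field layout, and the `dateToInt` helper are illustrative assumptions, not part of the original answer:

```java
public final class PackedTriple {

    private static final int BITS = 20;                 // each field must fit in 20 bits
    private static final long MASK = (1L << BITS) - 1;  // 0xFFFFF

    // Trick 1: pack three non-negative ints (each < 2^20) into one long field.
    public static long pack(int a, int b, int c) {
        return ((long) a << (2 * BITS)) | ((long) b << BITS) | (long) c;
    }

    public static int first(long p)  { return (int) ((p >>> (2 * BITS)) & MASK); }
    public static int second(long p) { return (int) ((p >>> BITS) & MASK); }
    public static int third(long p)  { return (int) (p & MASK); }

    // Trick 2: store a date as a yymmdd int instead of a Date object.
    public static int dateToInt(int year, int month, int day) {
        return (year % 100) * 10000 + month * 100 + day;
    }
}
```

For example, pack(123, 456, 789) keeps all three values in a single long field (8 bytes instead of three boxed Integers), and dateToInt(2020, 8, 11) yields 200811.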

3. Transactions


If you have to do some updates or inserts, then 4 million records is too much to handle in one transaction. This is too much for most database configurations. Use approach 1) and commit the transaction for each batch. On each newly inserted record you can set something like a RUN_ID, and if everything went well you can mark this RUN_ID as successful.

If your queries only read, there is no problem. However, you can mark the transaction as read-only to help your database.
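The batch-commit pattern described above might look like the following sketch. `BatchCommitSketch`, `target_table`, and the `run_id` column are hypothetical names; the pure helper `commitCount` just shows how many transactions a full run needs at a given batch size:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public final class BatchCommitSketch {

    // Pure helper: number of commits needed for 'total' rows at a given batch size.
    static int commitCount(int total, int batchSize) {
        return (total + batchSize - 1) / batchSize;
    }

    // Hypothetical sketch: insert rows in batches, committing one transaction per batch.
    static void insertAll(Connection conn, Iterable<String> values) throws SQLException {
        final int BATCH_SIZE = 1000;
        conn.setAutoCommit(false);                 // take manual control of transactions
        long runId = System.currentTimeMillis();   // simple RUN_ID marker for this run
        try (PreparedStatement ps = conn.prepareStatement(
                "insert into target_table (val, run_id) values (?, ?)")) {
            int pending = 0;
            for (String v : values) {
                ps.setString(1, v);
                ps.setLong(2, runId);
                ps.addBatch();
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch();
                    conn.commit();                 // one transaction per batch
                    pending = 0;
                }
            }
            if (pending > 0) {                     // flush the final partial batch
                ps.executeBatch();
                conn.commit();
            }
        } catch (SQLException e) {
            conn.rollback();                       // undo only the failed batch
            throw e;
        }
    }
}
```

With 4.5 million rows and a batch size of 1000, this means about 4500 small transactions instead of one enormous one, and a failure costs you at most one batch of work.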

4. JDBC fetch size.

When you load a lot of records from the database, it is very, very important to set a proper fetch size on your JDBC connection. This reduces the number of physical round trips to the database socket and speeds up your process.

Example:


// jdbc
statement.setFetchSize(500);

// spring     
JdbcTemplate jdbc = new JdbcTemplate(datasource);
jdbc.setFetchSize(500);

Here you can find some benchmarks and patterns for using fetch size:

http://makejavafaster.blogspot.com/2015/06/jdbc-fetch-size-performance.html

5. PreparedStatement


Use PreparedStatement rather than Statement: a prepared statement is parsed once by the database and can be re-executed with new bind variables, instead of being re-parsed on every call.
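A minimal sketch of this advice, combined with the batching idea from point 1, might look like the following. `detail_table` and the column names are hypothetical; the `placeholders` helper builds the `?` list for an IN clause so values are bound instead of concatenated into the SQL string:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public final class PreparedExample {

    // Pure helper: build "?, ?, ?" for an IN clause with n bound values.
    static String placeholders(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i == 0 ? "?" : ", ?");
        }
        return sb.toString();
    }

    // Hypothetical sketch: one parameterized statement instead of building
    // (and having the database parse) a fresh SQL string per lookup.
    static void lookup(Connection conn, long[] ids) throws SQLException {
        String sql = "select id, attr5 from detail_table where id in ("
                + placeholders(ids.length) + ")";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < ids.length; i++) {
                ps.setLong(i + 1, ids[i]);   // bind values, never concatenate them
            }
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // consume rs.getLong(1), rs.getString(2), ...
                }
            }
        }
    }
}
```

Binding parameters also protects against SQL injection, which string concatenation does not.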

6. Number of SQL statements.

Always try to minimize the number of SQL statements you send to the database.

Answered by Safwan Muhammed

Try this:

resultSet.setFetchSize(100);

while (resultSet.next()) { ... }

The parameter is the number of rows that should be retrieved from the database in each round trip.