Java: Updating multiple rows of a single table
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, include the original URL and author information, and attribute it to the original author (not me): StackOverflow
原文地址: http://stackoverflow.com/questions/20354684/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverflow
Updating multiple rows of single table
Asked by Mohammad Adil
I need to update every row of a table having more than 60k rows. Currently I'm doing it like this -
public void updateRank(Map<Integer, Double> map) {
    Iterator<Entry<Integer, Double>> it = map.entrySet().iterator();
    while (it.hasNext()) {
        Map.Entry<Integer, Double> pairs = it.next();
        String query = "update profile set rank = " + pairs.getValue()
                + " where profileId = " + pairs.getKey();
        DBUtil.update(query);
        it.remove();
    }
}
This method alone took around 20+ minutes to complete; hitting the database once for each of the 60k rows is, I think, the reason. (Though I'm using DBCP for connection pooling, with 50 maximum active connections.)
It'd be great if I were able to update the rows with a single database hit. Is that possible? How?
Or is there any other way to improve the timing here?
Accepted answer by Stefan Winkler
If every row should get a different value that cannot be derived from the existing data in the database, there is not much you can do to optimize the overall complexity. So do not expect wonders.
That said, you should start using prepared statements and batching:
public void updateRank(Map<Integer, Double> map) throws SQLException {
    Connection connection = getConnection(); // get the DB connection from somewhere
    PreparedStatement stmt = connection.prepareStatement(
            "update profile set rank = ? where profileId = ?");
    Iterator<Entry<Integer, Double>> it = map.entrySet().iterator();
    while (it.hasNext()) {
        Map.Entry<Integer, Double> pairs = it.next();
        stmt.setDouble(1, pairs.getValue()); // rank is the first placeholder (a double)
        stmt.setInt(2, pairs.getKey());      // profileId is the second (an int)
        stmt.addBatch(); // this will just collect the data values
        it.remove();
    }
    stmt.executeBatch(); // this will actually execute all the updates in one go
    stmt.close();
}
What this does:
- the prepared statement causes the SQL parser to only parse the SQL once
- the batching minimizes the client-server round-trips, so that there is not one for every update
- the communication between client and server is minimized, because the SQL is only transmitted once and the data values are collected and sent as one packet (or at least fewer packets)
In addition:
- Please check if the database column profileId is using an index, so that looking up the respective row is fast enough.
- You could check if your connection is set to auto-commit. If so, try to disable auto-commit and explicitly commit the transaction after all rows are updated. This way the single update operations could be faster as well.
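As a minimal sketch of how batching and the auto-commit tip could be combined (the class name RankUpdater, the method taking a Connection, and the returned update counts are illustrative assumptions; the table and column names profile, rank, and profileId come from the question):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

public class RankUpdater {

    // Sketch: prepared-statement batching wrapped in an explicit transaction.
    // With auto-commit disabled, all updates are committed once at the end
    // instead of once per statement.
    public static int[] updateRanks(Connection conn, Map<Integer, Double> ranks)
            throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement stmt = conn.prepareStatement(
                "update profile set rank = ? where profileId = ?")) {
            for (Map.Entry<Integer, Double> e : ranks.entrySet()) {
                stmt.setDouble(1, e.getValue()); // rank
                stmt.setInt(2, e.getKey());      // profileId
                stmt.addBatch();                 // collect, do not execute yet
            }
            int[] counts = stmt.executeBatch();  // send the whole batch at once
            conn.commit();                       // single explicit commit
            return counts;
        } catch (SQLException ex) {
            conn.rollback();                     // undo partial work on failure
            throw ex;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```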
Answered by kgautron
You could concatenate your queries (separated by a ;) and send them in batches of, for example, 100 queries.
public void updateRank(Map<Integer, Double> map) {
    Iterator<Entry<Integer, Double>> it = map.entrySet().iterator();
    String queries = "";
    int i = 0;
    while (it.hasNext()) {
        Map.Entry<Integer, Double> pairs = it.next();
        queries += "update profile set rank = " + pairs.getValue()
                + " where profileId = " + pairs.getKey() + ";";
        it.remove();
        if (i++ % 100 == 99) {
            DBUtil.update(queries);
            queries = "";
        }
    }
    if (!queries.isEmpty()) { // do not forget the last partial batch
        DBUtil.update(queries);
    }
}
Answered by str
Right now you execute each query independently, which causes a huge connection overhead (even when using connection pooling). Instead, use a batch mechanism to execute several queries together.
Using JDBC (which DBCP apparently is using) and prepared statements, you can do this very easily by using addBatch() and executeBatch(). I recently had to do this myself, and a batch size of around 1000 queries was the fastest, though this may be entirely different in your case.
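A sketch of what flushing the batch in chunks could look like (the class and method names are illustrative, the return value counts executeBatch() calls only to make the chunking observable, and the batch size of 1000 is just the value this answer found fastest):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

public class ChunkedRankUpdater {

    // Sketch: flush the JDBC batch every `batchSize` rows instead of building
    // one giant batch. Returns the number of executeBatch() calls made.
    public static int updateInChunks(Connection conn, Map<Integer, Double> ranks,
                                     int batchSize) throws SQLException {
        int flushes = 0;
        try (PreparedStatement stmt = conn.prepareStatement(
                "update profile set rank = ? where profileId = ?")) {
            int pending = 0;
            for (Map.Entry<Integer, Double> e : ranks.entrySet()) {
                stmt.setDouble(1, e.getValue()); // rank
                stmt.setInt(2, e.getKey());      // profileId
                stmt.addBatch();
                if (++pending == batchSize) {    // full chunk: send it now
                    stmt.executeBatch();
                    flushes++;
                    pending = 0;
                }
            }
            if (pending > 0) {                   // send the remainder
                stmt.executeBatch();
                flushes++;
            }
        }
        return flushes;
    }
}
```

With 60k rows and a batch size of 1000, this sends 60 batches rather than 60k single statements.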