Fastest way to iterate through a large table using JDBC in Java

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must attribute it to the original authors (not me) and link to the original: StackOverflow, http://stackoverflow.com/questions/1080852/

Date: 2020-08-11 23:18:48  Source: igfitidea

Fastest way to iterate through large table using JDBC

Tags: java, mysql, jdbc

Asked by Ish

I'm trying to create a Java program to clean up and merge rows in my table. The table is large, about 500k rows, and my current solution is running very slowly. The first thing I want to do is simply get an in-memory array of objects representing all the rows of my table. Here is what I'm doing:

  • pick an increment of, say, 1000 rows at a time
  • use JDBC to fetch a result set for the following SQL query: SELECT * FROM TABLE WHERE ID > 0 AND ID < 1000
  • add the resulting data to an in-memory array
  • continue querying all the way up to 500,000 in increments of 1000, adding results each time (see the sketch after this list)
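
A minimal sketch of that paging loop, assuming a hypothetical table my_table with a numeric primary-key column id (these names are not from the original post):

import java.sql.*;
import java.util.ArrayList;
import java.util.List;

// Hypothetical paging loop: fetches 1000 rows per query and buffers
// everything in an in-memory list, as described above.
static List<Object[]> loadAll(Connection con) throws SQLException {
    List<Object[]> rows = new ArrayList<>();
    try (PreparedStatement ps = con.prepareStatement(
            "SELECT * FROM my_table WHERE id >= ? AND id < ?")) {
        for (int lo = 0; lo < 500_000; lo += 1000) {
            ps.setInt(1, lo);
            ps.setInt(2, lo + 1000);
            try (ResultSet rs = ps.executeQuery()) {
                int cols = rs.getMetaData().getColumnCount();
                while (rs.next()) {
                    Object[] row = new Object[cols];
                    for (int i = 0; i < cols; i++) {
                        row[i] = rs.getObject(i + 1);
                    }
                    rows.add(row);
                }
            }
        }
    }
    return rows;
}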

This is taking way too long. In fact, it's not even getting past the second increment, from 1000 to 2000. The query takes forever to finish (although when I run the same thing directly through a MySQL browser it's decently fast). It's been a while since I've used JDBC directly. Is there a faster alternative?

Accepted answer by pablochan

First of all, are you sure you need the whole table in memory? Maybe you should consider (if possible) selecting only the rows that you want to update/merge/etc. If you really have to have the whole table, you could consider using a scrollable ResultSet. You can create it like this:

// make sure autocommit is off (postgres)
con.setAutoCommit(false);

Statement stmt = con.createStatement(
                   ResultSet.TYPE_SCROLL_INSENSITIVE, //or ResultSet.TYPE_FORWARD_ONLY
                   ResultSet.CONCUR_READ_ONLY);
ResultSet srs = stmt.executeQuery("select * from ...");

It enables you to move to any row you want by using the absolute() and relative() methods.
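
For example, a brief sketch continuing from the srs result set created above (the row numbers are illustrative):

srs.absolute(500);   // jump directly to row 500
srs.relative(-10);   // move back 10 rows, to row 490
while (srs.next()) {
    // process the remaining rows one by one
}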

Answer by Steve B.

Although it's probably not optimal, your solution seems like it ought to be fine for a one-off database cleanup routine. It shouldn't take that long to run a query like that and get the results (I'm assuming that since it's a one-off, a couple of seconds would be fine). Possible problems:

  • is your network (or at least your connection to MySQL) very slow? If so, you could try running the process locally on the MySQL box, or on something better connected.

  • is there something in the table structure that's causing it? Pulling down 10k of data for every row? 200 fields? Calculating the id values to fetch based on a non-indexed column? You could try finding a more DB-friendly way of pulling the data (e.g. selecting just the columns you need, having the DB aggregate values, etc.; see the sketch after this list).

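A hedged illustration of that last point, assuming the same hypothetical my_table with columns id and name (these names are not from the original answer):

// Fetch only the columns you actually need instead of SELECT *:
ResultSet rs = stmt.executeQuery(
        "SELECT id, name FROM my_table WHERE id >= 0 AND id < 1000");

// Or push work into the database, e.g. find duplicated names server-side
// instead of loading every row to compare them in Java:
ResultSet dupes = stmt.executeQuery(
        "SELECT name, COUNT(*) AS n FROM my_table " +
        "GROUP BY name HAVING COUNT(*) > 1");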

If you're not getting through the second increment, something is really wrong - efficient or not, you shouldn't have any problem dumping 2,000 or 20,000 rows into memory on a running JVM. Maybe you're storing the data redundantly or extremely inefficiently?

Answer by Shashikant Kore

One thing that helped me was Statement.setFetchSize(Integer.MIN_VALUE). I got this idea from Jason's blog. This cut execution time down by more than half. Memory consumption went down dramatically, as only one row is read at a time.

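A minimal sketch of that streaming setup for MySQL Connector/J, assuming con is an open connection to the database from the question:

// A forward-only, read-only statement combined with a fetch size of
// Integer.MIN_VALUE tells the MySQL driver to stream rows one at a
// time instead of buffering the entire result set in memory.
Statement stmt = con.createStatement(
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

try (ResultSet rs = stmt.executeQuery("SELECT * FROM my_table")) {
    while (rs.next()) {
        // process one row at a time; only the current row is buffered
    }
}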

This trick doesn't work for PreparedStatement, though.
