Java: how to read all rows from a huge table?

Disclaimer: this page is a Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/3682614/

Date: 2020-08-14 03:28:00  Source: igfitidea

How to read all rows from huge table?

Tags: java, postgresql, jdbc

Asked by marioosh

I have a problem with processing all rows from a database (PostgreSQL). I get an error: org.postgresql.util.PSQLException: Ran out of memory retrieving query results. I think that I need to read the rows in small chunks, but it doesn't work - it reads only 100 rows (code below). How can I do that?

    int i = 0;      
    Statement s = connection.createStatement();
    s.setMaxRows(100); // because of: org.postgresql.util.PSQLException: Ran out of memory retrieving query results.
    ResultSet rs = s.executeQuery("select * from " + tabName);      
    for (;;) {
        while (rs.next()) {
            i++;
            // do something...
        }
        if ((s.getMoreResults() == false) && (s.getUpdateCount() == -1)) {
            break;
        }           
    }

Accepted answer by Frank Heikens

Use a CURSOR in PostgreSQL, or let the JDBC driver handle this for you.

LIMIT and OFFSET will get slow when handling large datasets.
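To see the difference concretely, here is a hypothetical sketch (plain Java, no database) that counts how many rows the server effectively has to walk past: OFFSET paging re-scans every skipped row on each page, while keyset paging (a WHERE id > lastId condition on an indexed column) starts each page where the previous one ended. The table size and page size are made up.

```java
public class PagingCost {
    public static void main(String[] args) {
        int totalRows = 1000, pageSize = 100;

        // OFFSET paging: for each page the server walks past 'offset' rows
        // before it can return the next pageSize rows.
        long offsetScanned = 0;
        for (int offset = 0; offset < totalRows; offset += pageSize) {
            offsetScanned += offset + pageSize; // skipped rows + returned rows
        }

        // Keyset paging (WHERE id > lastId): each page only touches its own rows.
        long keysetScanned = 0;
        for (int page = 0; page < totalRows / pageSize; page++) {
            keysetScanned += pageSize;
        }

        System.out.println(offsetScanned); // 5500 rows touched in total
        System.out.println(keysetScanned); // 1000 rows touched in total
    }
}
```

The OFFSET cost grows quadratically with table size, which is why it "gets slow" on large datasets even though each individual query looks cheap.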

Answer by Benoit Courtine

I think your question is similar to this thread: JDBC Pagination, which contains solutions for your needs.

In particular, for PostgreSQL, you can use the LIMIT and OFFSET keywords in your query: http://www.petefreitag.com/item/451.cfm

PS: In Java code, I suggest you use PreparedStatement instead of simple Statement: http://download.oracle.com/javase/tutorial/jdbc/basics/prepared.html
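A minimal sketch of how the two suggestions combine (the table name mytable and the id column are made up): compute the OFFSET from the page number, and bind both values through a PreparedStatement rather than concatenating them into the SQL string.

```java
public class OffsetPaging {
    // SQL with placeholders; values are bound via PreparedStatement,
    // never spliced into the string. 'mytable' is a placeholder name.
    static final String PAGE_QUERY =
            "SELECT * FROM mytable ORDER BY id LIMIT ? OFFSET ?";

    // Offset of the first row on a given zero-based page.
    static int offsetFor(int pageIndex, int pageSize) {
        return pageIndex * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(offsetFor(0, 100)); // 0
        System.out.println(offsetFor(3, 100)); // 300
        // With a live connection, each page would be read roughly like:
        //   PreparedStatement ps = conn.prepareStatement(PAGE_QUERY);
        //   ps.setInt(1, pageSize);
        //   ps.setInt(2, offsetFor(pageIndex, pageSize));
        //   ResultSet rs = ps.executeQuery();
    }
}
```

PostgreSQL accepts bound parameters in LIMIT and OFFSET clauses, so the paging values never need to appear in the SQL text itself.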

Answer by marioosh

I did it like below. Not the best way, I think, but it works :)

    Connection c = DriverManager.getConnection("jdbc:postgresql://....");
    PreparedStatement s = c.prepareStatement("select * from " + tabName + " where id > ? order by id");
    s.setMaxRows(100); // read at most 100 rows per query
    int lastId = 0;
    for (;;) {
        s.setInt(1, lastId);
        ResultSet rs = s.executeQuery();

        int lastIdBefore = lastId;
        while (rs.next()) {
            lastId = rs.getInt(1); // assumes column 1 is the id
            // ...
        }
        rs.close();

        if (lastIdBefore == lastId) {
            break; // empty page: all rows have been read
        }
    }
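This is keyset pagination: each query resumes after the largest id seen so far, and the loop stops when a page adds nothing new. The termination logic can be sketched without a database (the 250-row table and page size of 100 are made-up data):

```java
import java.util.ArrayList;
import java.util.List;

public class KeysetLoop {
    public static void main(String[] args) {
        // Pretend table: sorted, unique ids 1..250.
        List<Integer> table = new ArrayList<>();
        for (int id = 1; id <= 250; id++) table.add(id);

        int pageSize = 100, lastId = 0, rowsSeen = 0;
        for (;;) {
            // Simulates: select ... where id > ? order by id, max 100 rows.
            List<Integer> page = new ArrayList<>();
            for (int id : table) {
                if (id > lastId) {
                    page.add(id);
                    if (page.size() == pageSize) break;
                }
            }

            int lastIdBefore = lastId;
            for (int id : page) {
                lastId = id;
                rowsSeen++;
            }
            if (lastIdBefore == lastId) break; // empty page: done
        }
        System.out.println(rowsSeen); // 250: every row was read exactly once
    }
}
```

Note this relies on the id column being unique and ordered; with duplicate keys a page boundary could skip or repeat rows.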

Answer by nos

The short version is: call stmt.setFetchSize(50); and conn.setAutoCommit(false); to avoid reading the entire ResultSet into memory.

Here's what the docs say:

Getting results based on a cursor

By default the driver collects all the results for the query at once. This can be inconvenient for large data sets so the JDBC driver provides a means of basing a ResultSet on a database cursor and only fetching a small number of rows.

A small number of rows are cached on the client side of the connection and when exhausted the next block of rows is retrieved by repositioning the cursor.

Note:

  • Cursor based ResultSets cannot be used in all situations. There are a number of restrictions which will make the driver silently fall back to fetching the whole ResultSet at once.

  • The connection to the server must be using the V3 protocol. This is the default for (and is only supported by) server versions 7.4 and later.

  • The Connection must not be in autocommit mode. The backend closes cursors at the end of transactions, so in autocommit mode the backend will have closed the cursor before anything can be fetched from it.

  • The Statement must be created with a ResultSet type of ResultSet.TYPE_FORWARD_ONLY. This is the default, so no code will need to be rewritten to take advantage of this, but it also means that you cannot scroll backwards or otherwise jump around in the ResultSet.

  • The query given must be a single statement, not multiple statements strung together with semicolons.

Example 5.2. Setting fetch size to turn cursors on and off.

Changing code to cursor mode is as simple as setting the fetch size of the Statement to the appropriate size. Setting the fetch size back to 0 will cause all rows to be cached (the default behaviour).

// make sure autocommit is off
conn.setAutoCommit(false);
Statement st = conn.createStatement();

// Turn use of the cursor on.
st.setFetchSize(50);
ResultSet rs = st.executeQuery("SELECT * FROM mytable");
while (rs.next()) {
   System.out.print("a row was returned.");
}
rs.close();

// Turn the cursor off.
st.setFetchSize(0);
rs = st.executeQuery("SELECT * FROM mytable");
while (rs.next()) {
   System.out.print("many rows were returned.");
}
rs.close();

// Close the statement.
st.close();


Answer by ntg

At least in my case, the problem was on the client that tried to fetch the results.

Wanted to get a .csv with ALL the results.

I found the solution by using

psql -U postgres -d dbname  -c "COPY (SELECT * FROM T) TO STDOUT WITH DELIMITER ','"

(where dbname is the name of the db...) and redirecting the output to a file.
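Spelled out, the full command with CSV formatting and the redirection might look like this (dbname, the table T, and the output filename are placeholders from the answer above):

```shell
# Stream the whole table as CSV (with a header row) straight to a file;
# COPY ... TO STDOUT streams rows, so the client never holds the full
# result set in memory.
psql -U postgres -d dbname -c "COPY (SELECT * FROM T) TO STDOUT WITH CSV HEADER" > T.csv
```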

Answer by rogerdpack

So it turns out that the crux of the problem is that, by default, Postgres starts in "autoCommit" mode, and it needs/uses cursors to be able to "page" through data (e.g. read the first 10K results, then the next, then the next); however, cursors can only exist within a transaction. So the default is to always read all rows into RAM, and then allow your program to start processing "the first result row, then the second" after it has all arrived, for two reasons: it's not in a transaction (so cursors don't work), and a fetch size hasn't been set.

So the way the psql command-line tool achieves batched responses for queries (its FETCH_COUNT setting) is to "wrap" its select queries within a short-term transaction (if a transaction isn't yet open), so that cursors can work. You can do something like that with JDBC as well:

  static void readLargeQueryInChunksJdbcWay(Connection conn, String originalQuery, int fetchCount, ConsumerWithException<ResultSet, SQLException> consumer) throws SQLException {
    boolean originalAutoCommit = conn.getAutoCommit();
    if (originalAutoCommit) {
      conn.setAutoCommit(false); // start temp transaction
    }
    try (Statement statement = conn.createStatement()) {
      statement.setFetchSize(fetchCount);
      ResultSet rs = statement.executeQuery(originalQuery);
      while (rs.next()) {
        consumer.accept(rs); // or just do your work here
      }
    } finally {
      if (originalAutoCommit) {
        conn.setAutoCommit(true); // reset it, also ends (commits) temp transaction
      }
    }
  }
  @FunctionalInterface
  public interface ConsumerWithException<T, E extends Exception> {
    void accept(T t) throws E;
  }

This gives the benefit of requiring less RAM and, in my results, it seemed to run faster overall, even if you don't need the RAM savings. Weird. It also gives the benefit that your processing of the first row "starts faster" (since it processes a page at a time).

And here's how to do it the "raw postgres cursor" way, along with full demo code, though in my experiments the JDBC way above seemed slightly faster for whatever reason.

Another option would be to have autoCommit mode off everywhere, though you still have to manually specify a fetchSize for each new Statement (or you can set a default fetch size in the URL string).
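For the URL option, the PostgreSQL JDBC driver exposes a defaultRowFetchSize connection parameter; a sketch (host, port, and database name are placeholders):

```java
// Connection URL with a default fetch size of 1000 rows per round trip.
// Every Statement created from this connection then pages through results
// in 1000-row chunks, as long as autocommit is off (see the restrictions
// quoted in the answer above).
String url = "jdbc:postgresql://localhost:5432/mydb?defaultRowFetchSize=1000";
// Connection conn = DriverManager.getConnection(url, user, password);
// conn.setAutoCommit(false);
System.out.println(url);
```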