Java 从大表中检索所有记录时如何避免 OOM(内存不足)错误?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1107732/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-11 23:50:36  来源:igfitidea点击:

How to avoid OOM (Out of memory) error when retrieving all records from huge table?

javasqlout-of-memorylarge-data-volumes

提问by janetsmith

I am given a task to convert a huge table to custom XML file. I will be using Java for this job.

我的任务是将一个巨大的表格转换为自定义 XML 文件。我将使用 Java 来完成这项工作。

If I simply issue a "SELECT * FROM customer", it may return huge amount of data that eventually causing OOM. I wonder, is there a way i can process the record immediately once it become available, and remove the record from memory after that during sql retrieving process?

如果我只是发出“SELECT * FROM customer”,它可能会返回大量数据,最终导致 OOM。我想知道,有没有办法在记录可用后立即处理记录,然后在 sql 检索过程中从内存中删除记录?

--- edited on 13 Jul 2009

--- 2009 年 7 月 13 日编辑

Let me elaborate my question. I have 1 db server and 1 application server. When I issue a select query in application, the data will travel from db server to app server.

让我详细说明我的问题。我有 1 个数据库服务器和 1 个应用程序服务器。当我在应用程序中发出选择查询时,数据将从数据库服务器传输到应用程序服务器。

I believe (correct me if I am wrong) ResultSet will need to wait until receiving all records in the query. Even if we set fetch size as 4, for a 1000-record table, we still end up having 1000 records in heap memory of app server, is it correct? Fetch size only affect the number of round trip from/to db server.

我相信(如果我错了,请纠正我) ResultSet 需要等到收到查询中的所有记录。即使我们将 fetch size 设置为 4,对于 1000 条记录的表,我们最终在应用服务器的堆内存中仍然有 1000 条记录,对吗?获取大小仅影响往返数据库服务器的次数。

My question is, how to start processing on that 4 (or any number) records immediately after it's arrival to app server, and dispose it to free up memory in app server?

我的问题是,如何在到达应用服务器后立即开始处理这 4 条(或任意数量)记录,并对其进行处理以释放应用服务器中的内存?

采纳答案by javamonkey79

I think you could use the same solution as this one. A scrollable resultset.

我想你可以使用相同的解决方案,这一项。可滚动的结果集。

回答by Clint

With a little more information I can get a more helpful answer.

有了更多信息,我可以获得更有帮助的答案。

If you are using MySQL:

如果您使用的是 MySQL:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
       java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

from http://www.oracle.com/technology/tech/java/sqlj_jdbc/htdocs/jdbc_faq.html:

来自http://www.oracle.com/technology/tech/java/sqlj_jdbc/htdocs/jdbc_faq.html

java.util.Properties info = new java.util.Properties();
info.put ("user", "scott");
info.put ("password","tiger");
info.put ("defaultRowPrefetch","15");
getConnection ("jdbc:oracle:oci:@",info);

回答by Dean Povey

If you are using JDBC you can use a ResultSet with a cursor which you iterate through one record at a time. You need to makes sure then that you write your XML out to a file one record at a time rather than using DOM to build the XML.

如果您使用的是 JDBC,您可以使用带有游标的 ResultSet,您可以一次遍历一条记录。然后,您需要确保将 XML 一次一条记录写入文件,而不是使用 DOM 来构建 XML。

回答by Kirtan

One rule of thumb that I've learnt from my experience is that you NEVER bring ALL the data from the database to your application server. One thing you can do is implement a procedure to page your data.

我从我的经验中学到的一条经验法则是,永远不要将数据库中的所有数据都带到应用程序服务器。您可以做的一件事是实现一个过程来分页您的数据。

You can bring one page of data containing around 1000-5000 records, process them, then again fetch the data for the next page.

您可以带上一页包含大约 1000-5000 条记录的数据,处理它们,然后再次获取下一页的数据。

回答by n002213f

At which stage is the OOM error occurring, is it on data retrieval or processing data to XML file?

OOM错误发生在哪个阶段,是数据检索还是处理数据到XML文件?

If its data retrieval, get the data in batches. Get the total number of rows first, order the selects by the primary key and limit the rows selected to chewable sizes.

如果其数据检索,则批量获取数据。首先获取总行数,按主键排序选择并将选择的行限制为可咀嚼的大小。

If its at creating the XML file, send the XML node of each customer to System.out.println, don't hold it in memory. Launch the program via commad line and redirect all output to a file;

如果是在创建 XML 文件时,将每个客户的 XML 节点发送到 System.out.println,不要将其保存在内存中。通过命令行启动程序并将所有输出重定向到一个文件;

java MyConverter > results.txt

As you loop through the record all is saved in the file.

当您遍历记录时,所有内容都保存在文件中。

回答by akarnokd

A concept for exporting the entire table. (Note to experts: I'm aware of its shortcomings.)

导出整个表的概念。(专家提示:我知道它的缺点。)

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
public class FullTableExport {
    public static String toXML(String s) {
        if (s != null) {
            StringBuilder b = new StringBuilder(s.length());
            for (int i = 0, count = s.length(); i < count; i++) {
                char c = s.charAt(i);
                switch (c) {
                case '<':
                    b.append("&lt;");
                    break;
                case '>':
                    b.append("&gt;");
                    break;
                case '\'':
                    b.append("&#39;");
                    break;
                case '"':
                    b.append("&quot;");
                    break;
                case '&':
                    b.append("&amp;");
                    break;
                default:
                    b.append(c);
                }
            }
            return b.toString();
        }
        return "";
    }
    public static void main(String[] args) throws Exception {
        String table = "CUSTOMER";
        int batch = 100;

        Class.forName("oracle.jdbc.driver.OracleDriver");
        Connection conn = DriverManager.getConnection(
            "jdbc:oracle:thin:@server:orcl", "user", "pass");
        PreparedStatement pstmt = conn.prepareStatement(
            "SELECT /*+FIRST_ROWS(" + batch + ") */ * FROM " + table);
        ResultSet rs = pstmt.executeQuery();
        rs.setFetchSize(batch);
        ResultSetMetaData rsm = rs.getMetaData();
        File output = new File("result.xml");
        PrintWriter out = new PrintWriter(new BufferedWriter(
            new OutputStreamWriter(
            new FileOutputStream(output), "UTF-8")), false);
        out.printf("<?xml version='1.0' encoding='UTF-8'?>%n");
        out.printf("<table name='%s'>%n", toXML(table));
        int j = 1;
        while (rs.next()) {
            out.printf("\t<row id='%d'>%n", j++);
            for (int i = 1; i <= rsm.getColumnCount(); i++) {
                out.printf("\t\t<col name='%s'>%s</col>%n", 
                    toXML(rsm.getColumnName(i)), 
                    toXML(rs.getString(i)));
            }
            out.printf("\t</row>%n");
        }
        out.printf("</table>%n", table);
        out.flush();
    }
}

EditThe shortcomings (thanks @J.S.):

编辑缺点(感谢@JS):

  • No external libraries used beyond the ojdbc
  • Nothing is closed
  • A generic Exception is thrown
  • It is a main method
  • Usage of print for XML generation
  • Oracle specific SQL
  • Plain text password
  • Some columns look awkward in string representation
  • UTF-8 is too international
  • XML structure footprint is large
  • 除了 ojdbc 之外没有使用外部库
  • 什么都没有关闭
  • 抛出一个通用异常
  • 这是一个主要的方法
  • 使用打印生成 XML
  • Oracle 特定的 SQL
  • 纯文本密码
  • 某些列在字符串表示中看起来很尴尬
  • UTF-8 太国际化了
  • XML 结构占用空间大