Java: how to read an XLSX file larger than 40 MB

Disclaimer: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/11345146/

Time: 2020-10-31 04:42:45  Source: igfitidea

How to read XLSX file of size >40MB

Tags: java, out-of-memory, xlsx

Asked by Avinash

I am using XSSF from Apache POI to read an XLSX file, and I was getting the error java.lang.OutOfMemoryError: Java heap space. I then increased the heap size with -Xmx1024m when running the Java class, but the same error still occurs.


Code:


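// The backslash must be escaped in a Java string literal ("\f" alone would be a form feed)
String filename = "D:\\filename.xlsx";
try (FileInputStream fis = new FileInputStream(filename)) {
   // Loads the whole workbook into memory; the OutOfMemoryError is thrown here
   XSSFWorkbook workbook = new XSSFWorkbook(fis);
}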

In the code segment above, execution stops at the XSSFWorkbook constructor and throws the error described. Can someone suggest a better approach for reading large XLSX files?


Answered by waxwing

POI allows you to read Excel files in a streaming manner. The API is pretty much a wrapper around SAX. Make sure you open the OPC package the right way, using the OPCPackage.open overload that takes a String path (rather than an InputStream). Otherwise you could run out of memory immediately.

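To give a feel for what "streaming" means here, the following is a minimal JDK-only sketch of a SAX handler that counts rows and cells in a worksheet XML stream without keeping any of them in memory. It is not POI's code; the class name SheetRowCounter and its methods are made up for illustration, and it assumes the usual SpreadsheetML element names (row for a row, c for a cell).

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: counts rows and cells while the parser pushes events at it. */
class SheetRowCounter extends DefaultHandler {
    private long rows;
    private long cells;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // Assumed SpreadsheetML names: <row> starts a row, <c> starts a cell
        if ("row".equals(localName)) {
            rows++;
        } else if ("c".equals(localName)) {
            cells++;
        }
    }

    long rows() { return rows; }

    long cells() { return cells; }

    /** Parses the stream once; only the two counters are retained. */
    static SheetRowCounter count(InputStream sheetXml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed so localName is populated
            XMLReader parser = factory.newSAXParser().getXMLReader();
            SheetRowCounter counter = new SheetRowCounter();
            parser.setContentHandler(counter);
            parser.parse(new InputSource(sheetXml));
            return counter;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse sheet XML", e);
        }
    }
}
```

Because the handler only reacts to events as they arrive, memory use should stay roughly flat regardless of how many rows the sheet has.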

OPCPackage pkg = OPCPackage.open(file.getPath());
XSSFReader reader = new XSSFReader(pkg);

Now, reader will let you get InputStreams for the different parts. If you want to do the XML parsing yourself (using SAX or StAX), you can use these. But that requires being very familiar with the format.

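If you do go the do-it-yourself route, a StAX loop over one sheet stream could look roughly like the sketch below. It uses only the JDK; RawSheetReader and readCells are hypothetical names, and the input is assumed to be a worksheet part such as one returned by reader.getSheetsData(). It reports raw values only: for cells of type s the value is an index into the shared strings table, not the text itself, and inline strings are not handled.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: pulls cells out of one worksheet part with StAX. */
class RawSheetReader {

    /** Returns one "ref[type]=rawValue" entry per cell that has a <v> element. */
    static List<String> readCells(InputStream sheetXml) {
        List<String> cells = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // sheets need no DTDs
            XMLStreamReader xml = factory.createXMLStreamReader(sheetXml);
            String ref = null;   // cell reference, e.g. "B7"
            String type = null;  // cell type attribute; absent means numeric
            while (xml.hasNext()) {
                if (xml.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = xml.getLocalName();
                if ("c".equals(name)) {
                    ref = xml.getAttributeValue(null, "r");
                    type = xml.getAttributeValue(null, "t");
                } else if ("v".equals(name)) {
                    // Raw value: for type "s" this is a shared-strings index
                    String raw = xml.getElementText();
                    cells.add(ref + "[" + (type == null ? "n" : type) + "]=" + raw);
                }
            }
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse sheet XML", e);
        }
        return cells;
    }
}
```

For a really large sheet you would process each cell inside the loop rather than collecting everything into a list; the list is only there to keep the sketch easy to try out.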

An easier option is to use XSSFSheetXMLHandler. Here is an example that reads the first sheet:


StylesTable styles = reader.getStylesTable();
ReadOnlySharedStringsTable sharedStrings = new ReadOnlySharedStringsTable(pkg);
ContentHandler handler = new XSSFSheetXMLHandler(styles, sharedStrings, mySheetContentsHandler, true);

XMLReader parser = XMLReaderFactory.createXMLReader();
parser.setContentHandler(handler);
parser.parse(new InputSource(reader.getSheetsData().next()));

Here mySheetContentsHandler should be your own implementation of XSSFSheetXMLHandler.SheetContentsHandler. This class will be fed rows and cells.


Note, however, that this can consume a moderate amount of memory if your shared strings table is huge (which happens when your huge sheets contain no duplicate strings). If memory is still a problem, I recommend using the raw XML streams (also provided by XSSFReader).

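One way to judge whether the shared strings table is likely to be the problem is to look at the uncompressed size of each part before parsing anything. An XLSX file is a ZIP archive, so the JDK can list its entries directly. The sketch below is an assumption-laden illustration, not part of POI: XlsxPartSizes and list are made-up names, and it assumes the conventional part name xl/sharedStrings.xml for the shared strings table.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: reports how large each part of an XLSX archive is. */
class XlsxPartSizes {

    /** Maps each entry name to its uncompressed size in bytes (-1 if unknown). */
    static Map<String, Long> list(String xlsxPath) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        // ZipFile reads the archive's directory; entry contents are not loaded
        try (ZipFile zip = new ZipFile(xlsxPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                sizes.put(entry.getName(), entry.getSize());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }
}
```

If the entry for xl/sharedStrings.xml turns out to be a large share of the total, that would support switching to the raw XML streams; if the worksheet parts dominate instead, the handler-based approach above is probably fine.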