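Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow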
Original URL: http://stackoverflow.com/questions/18147585/
Excel Sheet POI Validation: Out Of Memory Error
Asked by Abhishek Singh
I am trying to validate an Excel file using Java before loading it into a database.
Here is the code snippet that causes the error:
try {
    fis = new FileInputStream(file);
    wb = new XSSFWorkbook(fis); // the OutOfMemoryError below is thrown from this constructor
    XSSFSheet sh = wb.getSheet("Sheet1");
    for (int i = 0; i < 44; i++) {
        XSSFCell a1 = sh.getRow(1).getCell(i);
        printXSSFCellType(a1);
    }
} catch (FileNotFoundException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
Here is the error I get:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.util.ArrayList.<init>(Unknown Source)
	at java.util.ArrayList.<init>(Unknown Source)
	at org.apache.xmlbeans.impl.values.NamespaceContext$NamespaceContextStack.<init>(NamespaceContext.java:78)
	at org.apache.xmlbeans.impl.values.NamespaceContext$NamespaceContextStack.<init>(NamespaceContext.java:75)
	at org.apache.xmlbeans.impl.values.NamespaceContext.getNamespaceContextStack(NamespaceContext.java:98)
	at org.apache.xmlbeans.impl.values.NamespaceContext.push(NamespaceContext.java:106)
	at org.apache.xmlbeans.impl.values.XmlObjectBase.check_dated(XmlObjectBase.java:1273)
	at org.apache.xmlbeans.impl.values.XmlObjectBase.stringValue(XmlObjectBase.java:1484)
	at org.apache.xmlbeans.impl.values.XmlObjectBase.getStringValue(XmlObjectBase.java:1492)
	at org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTCellImpl.getR(Unknown Source)
	at org.apache.poi.xssf.usermodel.XSSFCell.<init>(XSSFCell.java:105)
	at org.apache.poi.xssf.usermodel.XSSFRow.<init>(XSSFRow.java:70)
	at org.apache.poi.xssf.usermodel.XSSFSheet.initRows(XSSFSheet.java:179)
	at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:143)
	at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:130)
	at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:286)
	at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
	at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:207)
	at com.xls.validate.ExcelValidator.main(ExcelValidator.java:79)
This works fine when the xlsx file is smaller than 1 MB.
I understand this is because my xlsx file is around 5-10 MB and POI tries to load the entire sheet into JVM memory at once.
What can be a possible workaround?
Please help.
Thanks in advance!
Accepted answer by Gagravarr
There are two options available to you. Option #1: increase the size of your JVM heap, so that Java has more memory available to it. Processing Excel files in POI with the UserModel code is DOM-based, so the whole file (including its parsed form) has to be buffered in memory. Existing Stack Overflow questions on increasing the Java heap size explain how to do this.
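A minimal sketch of Option #1, assuming a HotSpot-style JVM: the heap limit is fixed when the JVM starts, using the `-Xmx` flag (for example `java -Xmx2g com.xls.validate.ExcelValidator`, where 2 GB is an arbitrary example rather than a recommendation for this file). The helper below just reports the limit the running JVM actually received, via `Runtime.maxMemory()`, so you can confirm the flag took effect. `HeapInfo` is a hypothetical class name.

```java
/** Hypothetical helper that reports the heap limit of the running JVM. */
final class HeapInfo {

    /** Maximum heap the JVM will try to use, in MiB (derived from Runtime.maxMemory()). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Start the JVM with a larger heap, e.g.: java -Xmx2g com.xls.validate.ExcelValidator
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

If the printed value does not change after adding `-Xmx`, the flag is probably being passed to the wrong process (for example an IDE launcher instead of the program itself).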
Option #2, which is more work: switch to event-based (SAX) processing. This processes only part of the file at a time, so it needs much less memory. However, it requires more work from you, which is why you might be better off throwing a few more GB of memory at the problem: memory is cheap, while programmers aren't! The spreadsheet HOWTO page has instructions on how to do SAX parsing of .xlsx files, and POI provides various example files you can look at for advice.
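A minimal JDK-only sketch of the event-based idea, assuming the sheet is stored as `xl/worksheets/sheet1.xml` inside the .xlsx zip and that rows and cells carry the usual `r` attributes (for example `<row r="2">` and `<c r="A2" t="s">`). It is not POI's own event API (that lives in `org.apache.poi.xssf.eventusermodel`, starting from `XSSFReader`); it simply streams the sheet XML through the JDK's SAX parser and records each cell's `t` (type) attribute for one row, roughly what the question's validation loop prints, without ever building a DOM. `SheetTypeScanner` and `cellTypes` are hypothetical names, and shared-string values are not resolved against `xl/sharedStrings.xml`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX-based scanner for one worksheet XML part of an .xlsx file. */
final class SheetTypeScanner {

    /** Returns entries like "A2=s" or "B2=n" for every cell in the given 1-based row. */
    static List<String> cellTypes(InputStream sheetXml, int rowNumber)
            throws IOException, SAXException, ParserConfigurationException {
        List<String> result = new ArrayList<>();
        String wantedRow = Integer.toString(rowNumber);
        DefaultHandler handler = new DefaultHandler() {
            private boolean inWantedRow;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // The parser is not namespace-aware, so unprefixed element names arrive in qName
                if (qName.equals("row")) {
                    inWantedRow = wantedRow.equals(atts.getValue("r"));
                } else if (qName.equals("c") && inWantedRow) {
                    String type = atts.getValue("t");
                    // No "t" attribute means a number; otherwise e.g. s, str, inlineStr, b, e
                    result.add(atts.getValue("r") + "=" + (type == null ? "n" : type));
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(sheetXml, handler);
        return result;
    }

    public static void main(String[] args) throws Exception {
        try (ZipFile zip = new ZipFile(args[0])) {
            // Assumption: the sheet to validate is stored as sheet1.xml
            ZipEntry entry = zip.getEntry("xl/worksheets/sheet1.xml");
            if (entry == null) {
                throw new IllegalArgumentException("no xl/worksheets/sheet1.xml in " + args[0]);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                // POI's getRow(1) is the second row, i.e. r="2" in the sheet XML
                System.out.println(cellTypes(in, 2));
            }
        }
    }
}
```

Memory use stays roughly constant as the file grows, because each element is handled and discarded as it is parsed.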
Also, another thing: you seem to be loading a File via a stream, which is bad because it means even more data has to be buffered in memory. See the POI documentation for more on this, including instructions on how to work with the File directly.
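POI's File-based entry points include `WorkbookFactory.create(new File(...))` and `OPCPackage.open(new File(...))`. As I understand it, the saving comes from the fact that an .xlsx file is a zip archive: opened from a File, it can be read with random access, whereas an InputStream can only be consumed front to back. The sketch below illustrates that difference using only the JDK, not POI: `java.util.zip.ZipFile` reads the archive's central directory and decompresses only the entry you ask for. `ZipPartReader` and `readPart` are hypothetical names; the helper is meant for small parts such as `xl/workbook.xml` (which lists the sheet names), and it needs Java 9+ for `InputStream.readAllBytes()`.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper that reads one part of a zip-based file straight from disk. */
final class ZipPartReader {

    /** Returns the text of one entry, or null if the archive has no such entry. */
    static String readPart(File archive, String entryName) throws IOException {
        // ZipFile seeks to the requested entry via the central directory;
        // no other entry is decompressed or buffered in memory.
        try (ZipFile zip = new ZipFile(archive)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                // Fine for small parts; stream large sheet parts instead of reading them whole
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }
}
```

For example, `ZipPartReader.readPart(new File("C:/test.xlsx"), "xl/workbook.xml")` returns the workbook XML without touching the (much larger) worksheet parts.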
Answer by Abhishek Singh
Well, here's a link with some detailed info about your error and how to fix it: http://javarevisited.blogspot.com/2011/09/javalangoutofmemoryerror-permgen-space.html?m=1
Well, let me try to explain your error:
The java.lang.OutOfMemoryError has two common variants: one in the Java heap space, and the other in the PermGen space.
Your error could be caused by a memory leak, a low amount of system RAM, or very little RAM allocated to the Java Virtual Machine.
The difference between the two variants is where the memory runs out. PermGen space holds class metadata (class and method definitions) and, before Java 7, the pool of interned Strings; ordinary objects, such as a parsed workbook, live in the Java heap. So if your project loads a lot of classes or interned strings and PermGen is too small, you get "OutOfMemoryError: PermGen space"; if your live objects outgrow the heap, you get "OutOfMemoryError: Java heap space", which is the variant in the stack trace in the question. The default PermGen size is quite small (64 MB on many older JVMs), and PermGen was removed in Java 8, where class metadata moved to Metaspace. The linked article explains much more about this error and provides detailed information about how to fix it.
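To see which pools exist on your own JVM, and which of them count as heap or non-heap memory, the standard `java.lang.management` API can list them. A minimal sketch; `MemoryPools` is a hypothetical class name, and the exact pool names depend on the JVM version and garbage collector (on Java 8 and later the non-heap list shows `Metaspace` rather than a PermGen pool).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists the JVM's memory pools by type. */
final class MemoryPools {

    /** Names of all memory pools of the given type (HEAP or NON_HEAP). */
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // "Java heap space" errors concern the HEAP pools (e.g. "G1 Old Gen" or "PS Old Gen");
        // "PermGen space" errors concern a non-heap pool whose name contains "Perm Gen" (Java 7 and earlier)
        System.out.println("Heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("Non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```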
Hope this helps!
Answer by Luca Basso Ricci
The event API is newer than the User API. It is intended for intermediate developers who are willing to learn a little about the low-level API structures. It's relatively simple to use, but requires a basic understanding of the parts of an Excel file (or a willingness to learn). The advantage is that you can read an XLS file with a relatively small memory footprint.
Answer by Meer Nasirudeen
I too faced the same OOM issue while parsing an xlsx file. After two days of struggling, I finally found the code below, which works really well:
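This code is based on sjxlsx. It reads the xlsx file row by row and stores the values in an HSSF sheet. Note that the resulting HSSF workbook is still held entirely in memory and, being in the .xls format, is limited to 65,536 rows per sheet.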
[code=java]
// Read the xlsx file with sjxlsx: SimpleXLSXWorkbook, Sheet, SheetRowReader and Cell
// are sjxlsx classes, while the POI types are written with their full package names
SimpleXLSXWorkbook workbook = new SimpleXLSXWorkbook(new File("C:/test.xlsx"));
HSSFWorkbook hsfWorkbook = new HSSFWorkbook();
org.apache.poi.ss.usermodel.Sheet hsfSheet = hsfWorkbook.createSheet();
Sheet sheetToRead = workbook.getSheet(0, false);
SheetRowReader reader = sheetToRead.newReader(); // reads the sheet one row at a time
Cell[] row;
int rowPos = 0;
while ((row = reader.readRow()) != null) {
    org.apache.poi.ss.usermodel.Row hfsRow = hsfSheet.createRow(rowPos);
    int cellPos = 0;
    for (Cell cell : row) {
        if (cell != null) {
            org.apache.poi.ss.usermodel.Cell hfsCell = hfsRow.createCell(cellPos);
            // setCellValue(String) stores the value as a string cell
            hfsCell.setCellValue(cell.getValue());
        }
        cellPos++; // advance even for empty cells so columns stay aligned
    }
    rowPos++;
}
return hsfSheet;
[/code]
Answer by Gautam Tadigoppula
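You can use POI's SXSSF workbook (SXSSFWorkbook) for memory-related issues. SXSSF is a streaming extension of XSSF for writing large .xlsx files: it keeps only a window of rows in memory and flushes older rows to temporary files on disk, so it helps when producing big workbooks rather than when reading them. See the SXSSF section of the POI documentation.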
I faced a similar issue while reading multiple CSV files and merging them into a single XLSX file. I had three CSV files, each with 30k rows, 90k rows in total.
It was resolved by using SXSSF as follows:
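import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

import com.opencsv.CSVReader; // opencsv; older releases use au.com.bytecode.opencsv.CSVReader

// "logger" is assumed to be an SLF4J-style logger field of the enclosing class
public static void mergeCSVsToXLSX(Long jobExecutionId, Map<String, String> csvSheetNameAndFile, String xlsxFile) {
    // keep 100 rows in memory; older rows are flushed to temporary files on disk
    try (SXSSFWorkbook wb = new SXSSFWorkbook(100)) {
        wb.setCompressTempFiles(true); // gzip the temporary files; set once, before creating sheets
        csvSheetNameAndFile.forEach((sheetName, csv) -> {
            try (CSVReader reader = new CSVReader(new FileReader(csv))) {
                SXSSFSheet sheet = wb.createSheet(sheetName);
                sheet.setRandomAccessWindowSize(100);
                String[] nextLine;
                int r = 0;
                // in opencsv 5.x, readNext() also throws CsvValidationException
                while ((nextLine = reader.readNext()) != null) {
                    // pass an int index: a (short) cast would overflow after 32,767 rows
                    Row row = sheet.createRow(r++);
                    for (int i = 0; i < nextLine.length; i++) {
                        Cell cell = row.createCell(i);
                        cell.setCellValue(nextLine[i]);
                    }
                }
            } catch (IOException ioException) {
                logger.error("Error in reading CSV file {} for jobId {} with exception {}", csv, jobExecutionId,
                        ioException.getMessage());
            }
        });
        // close the output stream even if write() fails
        try (FileOutputStream out = new FileOutputStream(xlsxFile)) {
            wb.write(out);
        }
        wb.dispose(); // delete the temporary files
    } catch (IOException ioException) {
        logger.error("Error in creating workbook for jobId {} with exception {}", jobExecutionId,
                ioException.getMessage());
    }
}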