Notice: This page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/1486120/

Time: 2020-08-12 13:21:05  Source: igfitidea

API to write huge Excel files using Java

Tags: java, excel, apache-poi

Asked by Jaskirat

I am looking to write to an Excel file (.xls, MS Excel 2003 format) programmatically using Java. The Excel output files may contain ~200,000 rows, which I plan to split over a number of sheets (64k rows per sheet, due to the Excel limit).

I have tried using the Apache POI APIs, but POI seems to be a memory hog because of its object model. I am forced to add cells and sheets to the workbook object in memory, and only once all the data has been added can I write the workbook to a file! Here is a sample of how Apache recommends writing Excel files with their API:

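import java.io.FileOutputStream;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

Workbook wb = new HSSFWorkbook();
Sheet sheet = wb.createSheet("new sheet");

// Create a row and put some cells in it. Row indexes are 0-based.
Row row = sheet.createRow(0);

// Create a cell and put a value in it.
Cell cell = row.createCell(0);
cell.setCellValue(1);

// Write the output to a file. The whole workbook lives in memory until this point.
try (FileOutputStream fileOut = new FileOutputStream("workbook.xls")) {
    wb.write(fileOut);
}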

Clearly, writing ~20k rows (with some 10-20 columns in each row) gives me the dreaded "java.lang.OutOfMemoryError: Java heap space".

I have tried increasing the JVM initial heap size and max heap size using the Xms and Xmx parameters (-Xms512m and -Xmx1024m). I still can't write more than 150k rows to the file.

I am looking for a way to stream to an Excel file instead of building the entire file in memory before writing it to disk, which will hopefully save a lot of memory. Any alternative API or solution would be appreciated, but I am restricted to using Java. Thanks! :)
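
The split over sheets described in the question comes down to integer arithmetic. A minimal sketch, assuming the 64k limit means the .xls maximum of 65,536 rows per sheet and that no header rows are reserved; the class and method names are made up for illustration:

```java
/** Maps a running row number onto (sheet, row-in-sheet) under the .xls row limit. */
final class SheetSplit {
    // Assumed limit: the .xls format allows 2^16 rows per sheet.
    static final int MAX_ROWS_PER_SHEET = 65_536;

    /** Number of sheets needed to hold totalRows rows (ceiling division). */
    static int sheetCount(int totalRows) {
        return (totalRows + MAX_ROWS_PER_SHEET - 1) / MAX_ROWS_PER_SHEET;
    }

    /** 0-based index of the sheet that receives the given 0-based global row. */
    static int sheetIndex(int globalRow) {
        return globalRow / MAX_ROWS_PER_SHEET;
    }

    /** 0-based row index inside that sheet. */
    static int rowInSheet(int globalRow) {
        return globalRow % MAX_ROWS_PER_SHEET;
    }

    public static void main(String[] args) {
        System.out.println(sheetCount(200_000));  // 4 sheets for ~200,000 rows
        System.out.println(sheetIndex(199_999));  // last row lands on sheet index 3
        System.out.println(rowInSheet(199_999));  // at row index 3391 of that sheet
    }
}
```

If each sheet repeats a header row, the usable rows per sheet drop by one and the constant would need adjusting.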

Accepted answer by Aaron Digulla

All existing Java APIs try to build the whole document in RAM at once. Try to write an XML file which conforms to the new xlsx file format instead. To get you started, I suggest building a small file in the desired form in Excel and saving it. Then open it, examine the structure, and replace the parts you want.

Wikipedia has a good article about the overall format.
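
As a rough illustration of this idea, here is a sketch that writes a one-sheet .xlsx package with only the JDK, streaming each row straight into the zip so no rows are kept in memory. The class name, the placeholder cell values, and the choice of parts are assumptions; the part list is what I believe to be a minimal package, so compare it against a file saved by Excel before relying on it.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Streams a one-sheet .xlsx: each row is written and then forgotten. */
final class StreamingXlsxSketch {
    private static final String DECL = "<?xml version='1.0' encoding='UTF-8'?>";
    private static final String MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
    private static final String REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    private static final String PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships";
    private static final String CT = "application/vnd.openxmlformats-officedocument.spreadsheetml.";

    /** Writes rows x cols numeric cells (placeholder data r * c) as an .xlsx package. */
    static void write(OutputStream out, int rows, int cols) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            Writer w = new OutputStreamWriter(zip, StandardCharsets.UTF_8);
            // Small fixed parts that describe the package layout.
            part(zip, w, "[Content_Types].xml",
                "<Types xmlns='http://schemas.openxmlformats.org/package/2006/content-types'>"
                + "<Default Extension='rels' ContentType='application/vnd.openxmlformats-package.relationships+xml'/>"
                + "<Default Extension='xml' ContentType='application/xml'/>"
                + "<Override PartName='/xl/workbook.xml' ContentType='" + CT + "sheet.main+xml'/>"
                + "<Override PartName='/xl/worksheets/sheet1.xml' ContentType='" + CT + "worksheet+xml'/>"
                + "</Types>");
            part(zip, w, "_rels/.rels", "<Relationships xmlns='" + PKG_REL_NS + "'>"
                + "<Relationship Id='rId1' Type='" + REL_NS + "/officeDocument' Target='xl/workbook.xml'/>"
                + "</Relationships>");
            part(zip, w, "xl/workbook.xml", "<workbook xmlns='" + MAIN_NS + "' xmlns:r='" + REL_NS + "'>"
                + "<sheets><sheet name='Sheet1' sheetId='1' r:id='rId1'/></sheets></workbook>");
            part(zip, w, "xl/_rels/workbook.xml.rels", "<Relationships xmlns='" + PKG_REL_NS + "'>"
                + "<Relationship Id='rId1' Type='" + REL_NS + "/worksheet' Target='worksheets/sheet1.xml'/>"
                + "</Relationships>");

            // The sheet itself: one <row> element at a time, never the whole sheet.
            zip.putNextEntry(new ZipEntry("xl/worksheets/sheet1.xml"));
            w.write(DECL + "<worksheet xmlns='" + MAIN_NS + "'><sheetData>");
            for (int r = 1; r <= rows; r++) {
                w.write("<row r='" + r + "'>");
                for (int c = 1; c <= cols; c++) {
                    w.write("<c><v>" + (long) r * c + "</v></c>");
                }
                w.write("</row>");
            }
            w.write("</sheetData></worksheet>");
            w.flush(); // push buffered characters into the zip entry before closing it
            zip.closeEntry();
        }
    }

    private static void part(ZipOutputStream zip, Writer w, String name, String xml) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        w.write(DECL + xml);
        w.flush();
        zip.closeEntry();
    }

    public static void main(String[] args) throws IOException {
        try (OutputStream out = Files.newOutputStream(Paths.get("big.xlsx"))) {
            write(out, 200_000, 15);
        }
    }
}
```

Only numeric cells are written here; text cells would need XML escaping and either inline strings or a shared-strings part. Note that this produces .xlsx, not the .xls format the question asks for.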

Answer by i need help

Does this memory issue happen when you insert data into cells, or when you perform data computation/generation?

If you are going to load data into an Excel file that has a predefined static template format, it is better to save a template and reuse it multiple times. Template cases normally come up when you generate something like a daily sales report.

Otherwise, you need to create every new row, border, column, etc. from scratch each time.

So far, Apache POI is the only choice I found.

"Clearly, writing ~20k rows(with some 10-20 columns in each row) gives me the dreaded "java.lang.OutOfMemoryError: Java heap space"."

"Enterprise IT"

What YOU CAN DO is perform batch data insertion. Create a queue task table; every time you have generated one page, rest for a few seconds, then continue with the next portion. If you are worried about the data changing during your queue task, you can first write the primary keys into the Excel file (hiding and locking that column from the user's view). The first run inserts the primary keys; from the second queue run onwards, the task reads them out from notepad and does the work portion by portion.

Answer by IAdapter

There is also JExcelApi, but it uses more memory. I think you should create a .csv file and open it in Excel. It allows you to pass a lot of data, but you won't be able to do any "Excel magic".

Answer by fvu

We did something quite similar, with the same amount of data, and we had to switch to JExcelApi because POI is so heavy on resources. Try JExcelApi, you won't regret it when you have to manipulate big Excel files!

Answer by pgras

Have a look at the HSSF serializer from the Cocoon project.

"The HSSF serializer catches SAX events and creates a spreadsheet in the XLS format used by Microsoft Excel."

Answer by BalusC

Consider using the CSV format. This way you aren't limited by memory anymore. Well, maybe only while preparing the data for the CSV, but this can be done efficiently as well, for example by querying subsets of rows from the DB using LIMIT/OFFSET and immediately writing them to the file, instead of hauling the entire DB table contents into Java's memory before writing any line. The Excel limit on the number of rows in one "sheet" will increase to about one million.

That said, if the data actually comes from a DB, I would seriously reconsider whether Java is the right tool for this. Most decent DBs have an export-to-CSV function which can undoubtedly do this task much more efficiently. In the case of MySQL, for example, you can use the SELECT ... INTO OUTFILE statement for this (LOAD DATA INFILE is its counterpart for importing).
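
The write-as-you-go CSV approach can be sketched with the JDK alone. This is a minimal illustration, not a full CSV library: the class name, the file name export.csv, and the generated rows are placeholders, and in a real export each row would come from a paged query such as the LIMIT/OFFSET one mentioned above.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Writes CSV rows one at a time so that only the current row is held in memory. */
final class CsvStreamSketch {

    /** Quotes a field when it contains a comma, a double quote, or a line break. */
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Writes one comma-separated row, terminated by CRLF. */
    static void writeRow(Writer out, List<String> fields) throws IOException {
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                out.write(',');
            }
            out.write(escape(fields.get(i)));
        }
        out.write("\r\n");
    }

    public static void main(String[] args) throws IOException {
        try (BufferedWriter out =
                Files.newBufferedWriter(Paths.get("export.csv"), StandardCharsets.UTF_8)) {
            // Placeholder data; replace with rows fetched page by page from the DB.
            for (int i = 0; i < 200_000; i++) {
                writeRow(out, Arrays.asList(String.valueOf(i), "name " + i, "a \"quoted\", value"));
            }
        }
    }
}
```

Because each row is written as soon as it is produced, heap usage stays flat regardless of the row count.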

Answer by Chris Dale

I had to split my output into several Excel files in order to overcome the heap space exception. I figured that around 5k rows with 22 columns was about the limit, so I wrote my logic so that after every 5k rows I would end the file, start a new one, and number the files accordingly.

In the cases where I had 20k+ rows to be written, I would have 4+ different files representing the data.
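
The rollover logic described above might look roughly like this. It is only a sketch of the counting and file numbering: plain text files stand in for the Excel workbooks, and the class name and the part-N.csv naming are invented for the example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Starts a new numbered output file every rowsPerFile rows. */
final class RollingFileWriter implements AutoCloseable {
    private final Path dir;
    private final int rowsPerFile;
    private BufferedWriter current;
    private int rowsInCurrent;
    private int fileCount;

    RollingFileWriter(Path dir, int rowsPerFile) {
        this.dir = dir;
        this.rowsPerFile = rowsPerFile;
    }

    void writeRow(String line) throws IOException {
        if (current == null || rowsInCurrent == rowsPerFile) {
            close(); // finish the full file, if there is one
            fileCount++;
            current = Files.newBufferedWriter(
                    dir.resolve("part-" + fileCount + ".csv"), StandardCharsets.UTF_8);
            rowsInCurrent = 0;
        }
        current.write(line);
        current.newLine();
        rowsInCurrent++;
    }

    /** Number of files started so far. */
    int fileCount() {
        return fileCount;
    }

    @Override
    public void close() throws IOException {
        if (current != null) {
            current.close();
            current = null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rolling");
        try (RollingFileWriter w = new RollingFileWriter(dir, 5_000)) {
            for (int i = 0; i < 20_001; i++) {
                w.writeRow("row " + i);
            }
            System.out.println(w.fileCount() + " files in " + dir);
        }
    }
}
```

With 5,000 rows per file, 20,001 rows end up in five files, the last one holding a single row. With a workbook library, the same counter would decide when to write out one workbook and create the next.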

Answer by Serhii Bohutskyi

Try SXSSFWorkbook. It is a great thing for huge Excel documents: it builds the document without eating much RAM at all, because it uses NIO.

Answer by jbaliuka

We developed a Java library for this purpose, and it is currently available as an open source project: https://github.com/jbaliuka/x4j-analytic. We use it for operational reporting. We generate huge Excel files; ~200,000 rows should work without problems, and Excel manages to open such files too. Our code uses POI to load the template, but the generated content is streamed directly to the file without an XML or object model layer in memory.