Java: How to read a large amount of data from an Excel (xlsx) file

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/33887699/

Date: 2020-11-02 22:14:33  Source: igfitidea

How to read a large amount of data from an Excel file (xlsx) using Java

Tags: java, excel

Asked by Raghul Varun

This code can read a small Excel file, but it fails on large Excel files. How should the code be modified?

import java.io.File;
import java.io.FileInputStream;
import java.util.Iterator;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

/**
 *
 * @author Administrator
 */
public class ReadExcelNdArray {


    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();

        File myFile = new File("D://Raghulpr/Transaction Data.xlsx");
        FileInputStream fis = new FileInputStream(myFile);

        // Creates the workbook instance for the XLSX file.
        // The whole file is loaded into memory here; this is the call in the stack trace.
        XSSFWorkbook myWorkBook = new XSSFWorkbook(fis);

        // Return first sheet from the XLSX workbook
        XSSFSheet mySheet = myWorkBook.getSheetAt(0);

        // Get iterator to all the rows in current sheet
        Iterator<Row> rowIterator = mySheet.iterator();

        // Traversing over each row of XLSX file
        while (rowIterator.hasNext()) {
            Row row = rowIterator.next();

            // For each row, iterate through each column
            Iterator<Cell> cellIterator = row.cellIterator();
            while (cellIterator.hasNext()) {
                Cell cell = cellIterator.next();

                // CELL_TYPE_* are the int constants of POI 3.x; newer releases use the CellType enum
                switch (cell.getCellType()) {
                case Cell.CELL_TYPE_STRING:
                    System.out.print(cell.getStringCellValue() + "\t");
                    break;
                case Cell.CELL_TYPE_NUMERIC:
                    System.out.print(cell.getNumericCellValue() + "\t");
                    break;
                case Cell.CELL_TYPE_BOOLEAN:
                    System.out.print(cell.getBooleanCellValue() + "\t");
                    break;
                default:
                    // Other cell types (blank, formula, error) are skipped
                    break;
                }
            }
            System.out.println("");
        }

        // Report the elapsed time after the sheet has been read
        System.out.println("Time taken: " + (System.currentTimeMillis() - start) + " ms");
    }
}

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

at java.io.ByteArrayOutputStream.<init>(ByteArrayOutputStream.java:77)
at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource$FakeZipEntry.<init>(ZipInputStreamZipEntrySource.java:121)
at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:55)
at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:88)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:272)
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:37)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:254)
at readexcelndarray.ReadExcelNdArray.main(ReadExcelNdArray.java:36)

Answered by Harmeet Singh Taara

First, close all input/output stream objects in your code, such as FileInputStream. Second, you can also increase the JVM heap space, as described in this link: Increase heap size in Java
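
The first suggestion can be sketched with try-with-resources, which closes a stream even when reading fails. This is a minimal sketch using only JDK classes; the class name CloseStreamsSketch and the helper countBytes are hypothetical, and an in-memory stream stands in for the FileInputStream of the question. Closing streams avoids leaking file handles, but it is unlikely to remove the OutOfMemoryError shown in the question on its own, because that error is thrown while the workbook is being loaded.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class CloseStreamsSketch {

    // Hypothetical helper: counts the bytes of a stream and always closes it.
    static long countBytes(InputStream source) throws IOException {
        // try-with-resources calls close() on 'in' even if read() throws
        try (InputStream in = source) {
            byte[] buffer = new byte[4096];
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        // The in-memory stream is a stand-in for a FileInputStream on the .xlsx file
        InputStream demo = new ByteArrayInputStream(new byte[10_000]);
        System.out.println("bytes read: " + countBytes(demo));
    }
}
```

The same try (...) form works with a FileInputStream, so no explicit close() call is needed.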

Answered by aatif

I don't know if you still need an answer to this, but I was searching for the same thing and struggling to read a large file. After spending a lot of time searching the internet, I found one solution. You can check out Excel Streaming Reader.

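import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.poi.ss.usermodel.Workbook;
import com.monitorjbl.xlsx.StreamingReader;

// The backslash in the Windows path must be escaped
InputStream is = new FileInputStream(new File("G:\\Book1.xlsx"));
Workbook workbook = StreamingReader.builder()
        .rowCacheSize(100)    // number of rows to keep in memory at a time
        .bufferSize(4096)     // buffer size (bytes) used when reading the InputStream
        .open(is);            // InputStream of the XLSX file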

Now you can use workbook to process your file further.

I was able to process an xlsx file with more than 4 lakh (400,000) records.

Answered by harshavmb

There is the jxl API for reading and writing Excel files. The problem with this API is that the highest row index you can read or write is 65535, with rows indexed from 0. But it's really flexible.

Since the number of rows in your case is more than 65535, I would suggest using Apache POI instead. There is virtually no limit with this API.

Answered by Viktor Mellgren

I've had the same problem. If you switch to the much lower-level SAX parsing instead, you will save a lot of memory. http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api

I think I reduced memory usage from about 4.5 GB(!) (for a file of about 11 MB with a lot of formulas) down to something more manageable. I don't remember the exact figure, but it was so low that it didn't matter anymore; it was reduced by at least a factor of 10.

It is harder to implement, but worth the time if you need to reduce the memory footprint.
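
The idea behind SAX parsing can be illustrated without POI. The following is a minimal sketch that uses only JDK classes and is not POI's XSSF SAX API, which additionally gives access to shared strings and styles. It assumes the sheet is stored in the archive as xl/worksheets/sheet1.xml (typical, but not guaranteed) and that its elements are unprefixed. The names SaxSheetSketch, RowCounter, and demoArchive are hypothetical, and the demo data is generated in memory rather than read from a real workbook.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxSheetSketch {

    // Assumed location of the first sheet inside the .xlsx zip archive
    static final String SHEET_PART = "xl/worksheets/sheet1.xml";

    // Receives one callback per XML element; only two counters stay in memory.
    static class RowCounter extends DefaultHandler {
        long rows = 0;
        long cells = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("row".equals(qName)) {
                rows++;
            } else if ("c".equals(qName)) {
                cells++;
            }
        }
    }

    // Streams the sheet part through a SAX parser instead of building an object model.
    static RowCounter count(InputStream xlsx) throws Exception {
        RowCounter handler = new RowCounter();
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (SHEET_PART.equals(entry.getName())) {
                    SAXParserFactory.newInstance().newSAXParser().parse(zip, handler);
                    break; // the parser may close the stream, so stop after this part
                }
            }
        }
        return handler;
    }

    // Builds a tiny in-memory archive that mimics the sheet part of an .xlsx file (demo data only).
    static byte[] demoArchive(int rowCount) throws Exception {
        StringBuilder xml = new StringBuilder("<worksheet><sheetData>");
        for (int r = 1; r <= rowCount; r++) {
            xml.append("<row r=\"").append(r).append("\"><c r=\"A").append(r)
               .append("\"><v>").append(r * 10).append("</v></c></row>");
        }
        xml.append("</sheetData></worksheet>");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry(SHEET_PART));
            zip.write(xml.toString().getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        RowCounter result = count(new ByteArrayInputStream(demoArchive(1000)));
        System.out.println(result.rows + " rows, " + result.cells + " cells");
    }
}
```

Because the handler sees one element at a time, memory use does not grow with the number of rows, which is the property the answer relies on.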

Answered by Abhishek Dalakoti

You need to increase the heap size in order to read large files. I suggest using a 64-bit machine.
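
Before raising the heap size, it can help to check what the JVM is currently allowed to use. This is a minimal sketch with JDK classes only; the class name HeapInfoSketch and the helper toMiB are hypothetical.

```java
class HeapInfoSketch {

    // Converts a byte count to whole mebibytes
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most memory the JVM will attempt to use (controlled by -Xmx)
        System.out.println("Max heap:  " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory(): memory currently reserved by the JVM
        System.out.println("Allocated: " + toMiB(rt.totalMemory()) + " MiB");
        // freeMemory(): unused part of the currently reserved memory
        System.out.println("Free:      " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

Running it as java -Xmx4g HeapInfoSketch should report a larger maximum; the 4g value is only an illustration and depends on the memory available on the machine.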