PHPExcel runs out of 256, 512 and also 1024MB of RAM

Note: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors on StackOverflow. Original question: http://stackoverflow.com/questions/4817651/

Tags: php, phpexcel

Asked by Richard Knop

I don't understand it. The XLSX file is only about 3MB, yet even 1024MB of RAM is not enough for PHPExcel to load it into memory?

I might be doing something horribly wrong here:

function ReadXlsxTableIntoArray($theFilePath)
{
    require_once('PHPExcel/Classes/PHPExcel.php');
    $inputFileType = 'Excel2007';
    $objReader = PHPExcel_IOFactory::createReader($inputFileType);
    $objReader->setReadDataOnly(true);
    $objPHPExcel = $objReader->load($theFilePath);
    $rowIterator = $objPHPExcel->getActiveSheet()->getRowIterator();
    $arrayData = $arrayOriginalColumnNames = $arrayColumnNames = array();
    foreach($rowIterator as $row){
        $cellIterator = $row->getCellIterator();
        $cellIterator->setIterateOnlyExistingCells(false); // Loop all cells, even if it is not set
        if(1 == $row->getRowIndex ()) {
            foreach ($cellIterator as $cell) {
                $value = $cell->getCalculatedValue();
                $arrayOriginalColumnNames[] = $value;
                // let's remove the diacritique
                $value = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $value);
                // and white spaces
                $valueExploded = explode(' ', $value);
                $value = '';
                // capitalize the first letter of each word
                foreach ($valueExploded as $word) {
                    $value .= ucfirst($word);
                }
                $arrayColumnNames[] = $value;
            }
            continue;
        } else {
            $rowIndex = $row->getRowIndex();
            reset($arrayColumnNames);
            foreach ($cellIterator as $cell) {
                $arrayData[$rowIndex][current($arrayColumnNames)] = $cell->getCalculatedValue();
                next($arrayColumnNames);
            }
        }
    }
    return array($arrayOriginalColumnNames, $arrayColumnNames, $arrayData);
}

The function above reads data from an excel table to an array.

Any suggestions?

At first, I allowed PHP to use 256MB of RAM. It was not enough. I then doubled the amount and then also tried 1024MB. It still runs out of memory with this error:

Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 50331648 bytes) in D:\data\o\WebLibThirdParty\src\PHPExcel\Classes\PHPExcel\Reader\Excel2007.php on line 688

Fatal error (shutdown): Allowed memory size of 1073741824 bytes exhausted (tried to allocate 50331648 bytes) in D:\data\o\WebLibThirdParty\src\PHPExcel\Classes\PHPExcel\Reader\Excel2007.php on line 688

Answered by Mark Baker

Plenty has been written about the memory usage of PHPExcel on the PHPExcel forum, so reading through some of those previous discussions might give you a few ideas. PHPExcel holds an "in memory" representation of a spreadsheet, and is susceptible to PHP memory limitations.

The physical size of the file is largely irrelevant... it's much more important to know how many cells (rows*columns on each worksheet) it contains.

The "rule of thumb" that I've always used is an average of about 1KB per cell, so a workbook with 5 million cells is going to require about 5GB of memory. However, there are a number of ways that you can reduce that requirement. These can be combined, depending on exactly what information you need to access within your workbook, and what you want to do with it.
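
To make the arithmetic concrete, here is a minimal sketch of that estimate. The helper names `estimateWorkbookMemory()` and `estimateFromFile()` are hypothetical (they are not part of PHPExcel), and the 1024 bytes per cell default is simply the rule of thumb above, not a measured value. The second helper assumes your PHPExcel version provides the reader method `listWorksheetInfo()`, which reports row and column counts without loading the cells.

```php
<?php
// Rough estimate: cells * bytes-per-cell (rule of thumb, not a measurement).
function estimateWorkbookMemory($rows, $columns, $bytesPerCell = 1024)
{
    return $rows * $columns * $bytesPerCell;
}

// Hypothetical helper: sums the estimate over every worksheet in a file.
// Assumes PHPExcel is already loaded and the reader supports listWorksheetInfo().
function estimateFromFile($inputFileType, $inputFileName, $bytesPerCell = 1024)
{
    $objReader = PHPExcel_IOFactory::createReader($inputFileType);
    $total = 0;
    foreach ($objReader->listWorksheetInfo($inputFileName) as $info) {
        $total += estimateWorkbookMemory($info['totalRows'], $info['totalColumns'], $bytesPerCell);
    }
    return $total;
}

// Example: 100,000 rows x 50 columns = 5 million cells
$bytes = estimateWorkbookMemory(100000, 50);
echo round($bytes / (1024 * 1024 * 1024), 2) . " GB\n";
```

If the estimate is far above your `memory_limit`, that is a hint to apply one or more of the reductions described next before attempting a full load.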

If you have multiple worksheets, but don't need to load all of them, then you can limit the worksheets that the Reader will load using the setLoadSheetsOnly() method. To load a single named worksheet:

$inputFileType = 'Excel5'; 
$inputFileName = './sampleData/example1.xls';
$sheetname = 'Data Sheet #2'; 
/**  Create a new Reader of the type defined in $inputFileType  **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/**  Advise the Reader of which WorkSheets we want to load  **/ 
$objReader->setLoadSheetsOnly($sheetname); 
/**  Load $inputFileName to a PHPExcel Object  **/
$objPHPExcel = $objReader->load($inputFileName);

Or you can specify several worksheets with one call to setLoadSheetsOnly() by passing an array of names:

$inputFileType = 'Excel5'; 
$inputFileName = './sampleData/example1.xls';
$sheetnames = array('Data Sheet #1','Data Sheet #3'); 
/** Create a new Reader of the type defined in $inputFileType **/ 
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/** Advise the Reader of which WorkSheets we want to load **/ 
$objReader->setLoadSheetsOnly($sheetnames); 
/**  Load $inputFileName to a PHPExcel Object  **/
$objPHPExcel = $objReader->load($inputFileName);

If you only need to access part of a worksheet, then you can define a Read Filter to identify just which cells you actually want to load:

$inputFileType = 'Excel5'; 
$inputFileName = './sampleData/example1.xls';
$sheetname = 'Data Sheet #3'; 

/**  Define a Read Filter class implementing PHPExcel_Reader_IReadFilter  */ 
class MyReadFilter implements PHPExcel_Reader_IReadFilter {
    public function readCell($column, $row, $worksheetName = '') {
        //  Read rows 1 to 7 and columns A to E only 
        if ($row >= 1 && $row <= 7) {
           if (in_array($column,range('A','E'))) { 
              return true;
           }
        } 
        return false;
    }
}

/**  Create an Instance of our Read Filter  **/ 
$filterSubset = new MyReadFilter(); 
/** Create a new Reader of the type defined in $inputFileType **/ 
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/**  Advise the Reader of which WorkSheets we want to load 
     It's more efficient to limit sheet loading in this manner rather than coding it into a Read Filter  **/ 
$objReader->setLoadSheetsOnly($sheetname); 
echo 'Loading Sheet using filter';
/**  Tell the Reader that we want to use the Read Filter that we've Instantiated  **/ 
$objReader->setReadFilter($filterSubset); 
/**  Load only the rows and columns that match our filter from $inputFileName to a PHPExcel Object  **/
$objPHPExcel = $objReader->load($inputFileName);

Using read filters, you can also read a workbook in "chunks", so that only a single chunk is memory-resident at any one time:

$inputFileType = 'Excel5'; 
$inputFileName = './sampleData/example2.xls';

/**  Define a Read Filter class implementing PHPExcel_Reader_IReadFilter  */ 
class chunkReadFilter implements PHPExcel_Reader_IReadFilter {
    private $_startRow = 0;
    private $_endRow = 0;

    /**  Set the list of rows that we want to read  */ 
    public function setRows($startRow, $chunkSize) { 
        $this->_startRow    = $startRow; 
        $this->_endRow      = $startRow + $chunkSize;
    } 

    public function readCell($column, $row, $worksheetName = '') {
        //  Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow 
        if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) { 
           return true;
        }
        return false;
    } 
}

/**  Create a new Reader of the type defined in $inputFileType  **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/**  Define how many rows we want to read for each "chunk"  **/ 
$chunkSize = 20;
/**  Create a new Instance of our Read Filter  **/ 
$chunkFilter = new chunkReadFilter(); 
/**  Tell the Reader that we want to use the Read Filter that we've Instantiated  **/ 
$objReader->setReadFilter($chunkFilter); 

/**  Loop to read our worksheet in "chunk size" blocks  **/ 
/**  $startRow is set to 2 initially because we always read the headings in row #1  **/
for ($startRow = 2; $startRow <= 65536; $startRow += $chunkSize) { 
    /**  Tell the Read Filter, the limits on which rows we want to read this iteration  **/ 
    $chunkFilter->setRows($startRow,$chunkSize); 
    /**  Load only the rows that match our filter from $inputFileName to a PHPExcel Object  **/ 
    $objPHPExcel = $objReader->load($inputFileName); 
    //    Do some processing here 

    //    Free up some of the memory 
    $objPHPExcel->disconnectWorksheets(); 
    unset($objPHPExcel); 
}

If you don't need to load formatting information, but only the worksheet data, then the setReadDataOnly() method will tell the reader only to load cell values, ignoring any cell formatting:

$inputFileType = 'Excel5';
$inputFileName = './sampleData/example1.xls';
/** Create a new Reader of the type defined in $inputFileType **/ 
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/** Advise the Reader that we only want to load cell data, not formatting **/ 
$objReader->setReadDataOnly(true);
/**  Load $inputFileName to a PHPExcel Object  **/
$objPHPExcel = $objReader->load($inputFileName);

Use cell caching. This is a method for reducing the PHP memory that is required for each cell, but at a cost in speed. It works by storing the cell objects in a compressed format, or outside of PHP's memory (e.g. disk, APC, memcache), but the more memory you save, the slower your scripts will execute. You can, however, reduce the memory required by each cell to about 300 bytes, so the hypothetical 5M cells would require about 1.4GB of PHP memory.
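
As a hedged sketch, cell caching is switched on through `PHPExcel_Settings::setCacheStorageMethod()` before the workbook is loaded. The wrapper name `enableCellCaching()` and the 8MB threshold are illustrative assumptions; which cache methods are available depends on your PHPExcel version and installed extensions.

```php
<?php
// Hypothetical wrapper; assumes PHPExcel is already loaded
// (require_once 'PHPExcel/Classes/PHPExcel.php').
// Must be called BEFORE creating a reader or a PHPExcel object.
function enableCellCaching($memoryCacheSize = '8MB')
{
    // cache_to_phpTemp keeps cells in php://temp, spilling to disk above the threshold
    $cacheMethod   = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
    $cacheSettings = array('memoryCacheSize' => $memoryCacheSize);

    // Returns false if the chosen method is not available
    return PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);
}

// Arithmetic behind the figure quoted above: 5 million cells at ~300 bytes each
$cachedBytes = 5000000 * 300;
echo round($cachedBytes / (1024 * 1024 * 1024), 1) . " GB\n";
```

Checking the return value matters: if the chosen method cannot be initialised, PHPExcel keeps its default in-memory storage and you get no saving.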

Cell caching is described in section 4.2.1 of the Developer Documentation.

EDIT

Looking at your code, you're using the iterators, which aren't particularly efficient, and building up an array of cell data. You might want to look at the toArray() method, which is already built into PHPExcel and does this for you. Also take a look at this recent discussion on SO about the new variant method rangeToArray() to build an associative array of row data.
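
A minimal sketch of that suggestion, under stated assumptions: `rowsToAssoc()` and `readSheetAsAssoc()` are hypothetical helper names, the first row is assumed to hold unique, non-empty headings, and PHPExcel is assumed to be loaded already. Note that this replaces the hand-written iterator loop; it does not by itself reduce the memory needed to load the workbook.

```php
<?php
// Pure helper: turn a list of rows, whose first row holds the headings,
// into associative arrays keyed by heading.
function rowsToAssoc(array $rows)
{
    $headings = array_shift($rows);
    if ($headings === null) {
        return array(); // no rows at all
    }
    $result = array();
    foreach ($rows as $row) {
        $result[] = array_combine($headings, $row);
    }
    return $result;
}

// Assumes PHPExcel is already loaded. toArray() arguments, in order:
// value for empty cells, calculate formulas, format data, key by cell reference.
function readSheetAsAssoc($inputFileName, $inputFileType = 'Excel2007')
{
    $objReader = PHPExcel_IOFactory::createReader($inputFileType);
    $objReader->setReadDataOnly(true);
    $objPHPExcel = $objReader->load($inputFileName);
    $rows = $objPHPExcel->getActiveSheet()->toArray(null, true, false, false);
    $objPHPExcel->disconnectWorksheets();
    return rowsToAssoc($rows);
}
```

If only part of the sheet is needed, `rangeToArray()` takes a range such as `'A1:E100'` as its first argument, followed by the same options.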

Answered by Adrien

I had the same memory problem with PHPExcel, and actually with all the other libraries too. Reading the data in chunks, as Mark Baker suggested, could fix the issue (caching works too), but it turned out that the memory issue then became a time issue. The reading and writing time was exponential, so it was not a good fit for large spreadsheets.

PHPExcel and others are not meant to handle large files so I created a library that solves this problem. You can check it out here: https://github.com/box/spout

Hope that helps!

Answered by pancy1

There are plenty of measures you can take to use less memory when working with PHPExcel. I recommend taking the following actions to optimize memory usage before modifying your server's memory limit in Apache.

/* Use setReadDataOnly(true) */
$objReader->setReadDataOnly(true);

/* Load only specific sheets */
$objReader->setLoadSheetsOnly(array("1", "6", "6-1", "6-2", "6-3", "6-4", "6-5", "6-6", "6-7", "6-8"));

/* Free memory when you are done with a file */
$objPHPExcel->disconnectWorksheets();
unset($objPHPExcel);

Avoid using very large Excel files; remember, it is the file size that makes the process run slowly and crash.

Avoid using the getCalculatedValue() function when reading cells.
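
A small sketch of the alternative, assuming the cell objects expose `getValue()` as `PHPExcel_Cell` does. `getCalculatedValue()` runs formulas through PHPExcel's calculation engine, whereas `getValue()` returns what is stored in the cell; for a formula cell that is the formula text itself. The function name `readStoredValues()` is hypothetical.

```php
<?php
// Collect stored values from any iterable of cell objects without
// invoking formula calculation. Formula cells yield their formula text.
function readStoredValues($cells)
{
    $values = array();
    foreach ($cells as $cell) {
        $values[] = $cell->getValue();
    }
    return $values;
}
```

In the question's code this would be called with a row's cell iterator, for example `readStoredValues($row->getCellIterator())`. It only helps when you do not actually need the computed results of formulas.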

Answered by osm

You can try PHP Excel: http://ilia.ws/archives/237-PHP-Excel-Extension-0.9.1.html It's a C extension for PHP and it's very fast. (It also uses less memory than PHP implementations.)

Answered by Robert

In my case, PHPExcel always iterated through 19999 rows, no matter how many rows were actually filled. So even 100 rows of data always ended up in a memory error.

Perhaps you just have to check whether the cells in the current row are empty, and then "continue" or break the loop that iterates over the rows.
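
The check could look roughly like this. Both function names are hypothetical, and the sketch works on plain arrays of cell values (for instance values collected from a cell iterator) rather than on PHPExcel objects.

```php
<?php
// True when every value in the row is null or an empty/whitespace-only string.
function isRowEmpty(array $rowValues)
{
    foreach ($rowValues as $value) {
        if ($value !== null && trim((string) $value) !== '') {
            return false;
        }
    }
    return true;
}

// Keep rows up to (not including) the first completely empty one.
function rowsUntilFirstEmpty(array $rows)
{
    $kept = array();
    foreach ($rows as $index => $rowValues) {
        if (isRowEmpty($rowValues)) {
            break; // stop instead of walking thousands of blank rows
        }
        $kept[$index] = $rowValues;
    }
    return $kept;
}
```

Use `break` only if data never follows a blank row; otherwise `continue` past empty rows instead. Either way this limits the array you build, but the workbook has already been loaded by the time the loop runs.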

Answered by bazinac

I'm just reposting my answer from another thread. It describes a different approach to server-side generation or editing of Excel spreadsheets that is worth considering. For large amounts of data I would not recommend tools like PHPExcel or Apache POI (for Java) because of their memory requirements.

There is another quite convenient (although maybe a little fiddly) way to inject data into spreadsheets: server-side generation or updating of Excel spreadsheets can be achieved through simple XML editing. You keep an XLSX spreadsheet on the server and, every time data is gathered from the database, you unzip it using PHP. Then you access the specific XML files that hold the contents of the worksheets that need to be updated, and insert the data manually. Afterwards, you compress the spreadsheet folder again in order to distribute it as a regular XLSX file.

The whole process is quite fast and reliable. Obviously, there are a few issues and glitches related to the inner organisation of the XLSX/Open XML file (e.g. Excel tends to store all strings in a separate table and use references to this table in the worksheet files). But when injecting only data like numbers and strings, it is not that hard. If anyone is interested, I can provide some code.
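
This is not the answerer's code, only a rough sketch of the idea under several assumptions. All three function names are hypothetical. Instead of unpacking to a folder, it edits the worksheet entry inside the archive with PHP's `ZipArchive`, and it writes text as inline strings so that the shared string table does not have to be touched. It also assumes the sheet lives at `xl/worksheets/sheet1.xml`, already contains a closing `</sheetData>` tag, and that the new row numbers come after the existing rows.

```php
<?php
// Convert a 0-based column index to letters: 0 => A, 25 => Z, 26 => AA.
function columnLetters($index)
{
    $letters = '';
    for ($n = $index + 1; $n > 0; $n = (int) (($n - 1) / 26)) {
        $letters = chr(65 + ($n - 1) % 26) . $letters;
    }
    return $letters;
}

// Build one <row> element. Numbers become plain <v> cells; everything else is
// written as an inline string (an assumption made here for simplicity).
function buildRowXml($rowNumber, array $values)
{
    $cells = '';
    foreach (array_values($values) as $i => $value) {
        $ref = columnLetters($i) . $rowNumber;
        if (is_int($value) || is_float($value)) {
            $cells .= '<c r="' . $ref . '"><v>' . $value . '</v></c>';
        } else {
            $text = htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
            $cells .= '<c r="' . $ref . '" t="inlineStr"><is><t>' . $text . '</t></is></c>';
        }
    }
    return '<row r="' . $rowNumber . '">' . $cells . '</row>';
}

// Append pre-built <row> elements to a worksheet inside an existing .xlsx file.
// Requires the zip extension; returns false if the file or sheet is not usable.
function appendRowsToXlsx($xlsxPath, array $rowsXml, $sheetEntry = 'xl/worksheets/sheet1.xml')
{
    $zip = new ZipArchive();
    if ($zip->open($xlsxPath) !== true) {
        return false;
    }
    $xml = $zip->getFromName($sheetEntry);
    if ($xml === false || strpos($xml, '</sheetData>') === false) {
        $zip->close();
        return false;
    }
    $xml = str_replace('</sheetData>', implode('', $rowsXml) . '</sheetData>', $xml);
    $zip->addFromString($sheetEntry, $xml);
    return $zip->close();
}
```

Because the worksheet XML is handled as a plain string, no cell objects are created for the new rows. For very large sheets a streaming XML writer would be the next step, which this sketch does not attempt.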

Answered by Edward

I ran into this problem and unfortunately none of the suggested solutions could help me. I need the functionality that PHPExcel provides (formulas, conditional styling, etc) so using a different library was not an option.

What I eventually did was write each worksheet to an individual (temporary) file, and then combine these separate files with some special software I wrote. This reduced my memory consumption from >512 MB to well under 100 MB. See https://github.com/infostreams/excel-merge if you have the same problem.