Note: this page reproduces a popular Stack Overflow question. Original: http://stackoverflow.com/questions/4666746/
The content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors on Stack Overflow.

Date: 2020-08-25 13:54:44

How to read large worksheets from large Excel files (27MB+) with PHPExcel?

Tags: php, phpexcel

Asked by Edward Tanguay

I have large Excel worksheets that I want to be able to read into MySQL using PHPExcel.

I am using the recent patch, which allows you to read in worksheets without opening the whole file. This way I can read one worksheet at a time.

However, one Excel file is 27 MB. I can successfully read in the first worksheet since it is small, but the second worksheet is so large that the cron job that started the process at 22:00 had not finished by 8:00 AM. The worksheet is simply too big.

Is there any way to read in a worksheet line by line, e.g. something like this:

$inputFileType = 'Excel2007';
$inputFileName = 'big_file.xlsx';
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$worksheetNames = $objReader->listWorksheetNames($inputFileName);

foreach ($worksheetNames as $sheetName) {
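    // BELOW IS "WISH CODE": getWorksheetWithRows() is a hypothetical method
    // and $max_rows stands for the number of rows in the sheet
    for ($row = 1; $row <= $max_rows; $row += 100) {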
        $dataset = $objReader->getWorksheetWithRows($row, $row+100);
        save_dataset_to_database($dataset);
    }
}


Addendum

@mark, I used the code you posted to create the following example:

function readRowsFromWorksheet() {

    $file_name = htmlentities($_POST['file_name']);
    $file_type = htmlentities($_POST['file_type']);

    echo 'Read rows from worksheet:<br />';
    debug_log('----------start');
    $objReader = PHPExcel_IOFactory::createReader($file_type);
    $chunkSize = 20;
    $chunkFilter = new ChunkReadFilter();
    $objReader->setReadFilter($chunkFilter);

    for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
        $chunkFilter->setRows($startRow, $chunkSize);
        $objPHPExcel = $objReader->load('data/' . $file_name);
        debug_log('reading chunk starting at row '.$startRow);
        $sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);
        var_dump($sheetData);
        echo '<hr />';
    }
    debug_log('end');
}

As the following log shows, it runs fine on a small 8K Excel file, but when I run it on a 3 MB Excel file it never gets past the first chunk. Is there any way I can optimize this code for performance? Otherwise it doesn't look performant enough to get chunks out of a large Excel file:

2011-01-12 11:07:15: ----------start
2011-01-12 11:07:15: reading chunk starting at row 2
2011-01-12 11:07:15: reading chunk starting at row 22
2011-01-12 11:07:15: reading chunk starting at row 42
2011-01-12 11:07:15: reading chunk starting at row 62
2011-01-12 11:07:15: reading chunk starting at row 82
2011-01-12 11:07:15: reading chunk starting at row 102
2011-01-12 11:07:15: reading chunk starting at row 122
2011-01-12 11:07:15: reading chunk starting at row 142
2011-01-12 11:07:15: reading chunk starting at row 162
2011-01-12 11:07:15: reading chunk starting at row 182
2011-01-12 11:07:15: reading chunk starting at row 202
2011-01-12 11:07:15: reading chunk starting at row 222
2011-01-12 11:07:15: end
2011-01-12 11:07:52: ----------start
2011-01-12 11:08:01: reading chunk starting at row 2
(...at 11:18, CPU usage at 93% still running...)


Addendum 2

When I comment out:

//$sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);
//var_dump($sheetData);

Then it parses at an acceptable speed (about 2 rows per second). Is there any way to increase the performance of toArray()?

2011-01-12 11:40:51: ----------start
2011-01-12 11:40:59: reading chunk starting at row 2
2011-01-12 11:41:07: reading chunk starting at row 22
2011-01-12 11:41:14: reading chunk starting at row 42
2011-01-12 11:41:22: reading chunk starting at row 62
2011-01-12 11:41:29: reading chunk starting at row 82
2011-01-12 11:41:37: reading chunk starting at row 102
2011-01-12 11:41:45: reading chunk starting at row 122
2011-01-12 11:41:52: reading chunk starting at row 142
2011-01-12 11:42:00: reading chunk starting at row 162
2011-01-12 11:42:07: reading chunk starting at row 182
2011-01-12 11:42:15: reading chunk starting at row 202
2011-01-12 11:42:22: reading chunk starting at row 222
2011-01-12 11:42:22: end


Addendum 3

This, for instance, seems to work adequately, at least on the 3 MB file:

for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
    echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ', $startRow, ' to ', ($startRow + $chunkSize - 1), '<br />';
    $chunkFilter->setRows($startRow, $chunkSize);
    $objPHPExcel = $objReader->load('data/' . $file_name);
    debug_log('reading chunk starting at row ' . $startRow);
    foreach ($objPHPExcel->getActiveSheet()->getRowIterator() as $row) {
        $cellIterator = $row->getCellIterator();
        $cellIterator->setIterateOnlyExistingCells(false);
        echo '<tr>';
        foreach ($cellIterator as $cell) {
            if (!is_null($cell)) {
                //$value = $cell->getCalculatedValue();
                $rawValue = $cell->getValue();
                debug_log($rawValue);
            }
        }
    }
}

Answered by Mark Baker

It is possible to read a worksheet in "chunks" using Read Filters, although I can make no guarantees about efficiency.

$inputFileType = 'Excel5';
$inputFileName = './sampleData/example2.xls';


/**  Define a Read Filter class implementing PHPExcel_Reader_IReadFilter  */
class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $_startRow = 0;

    private $_endRow = 0;

    /**  Set the list of rows that we want to read  */
    public function setRows($startRow, $chunkSize) {
        $this->_startRow    = $startRow;
        $this->_endRow        = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '') {
        //  Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow
        if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {
            return true;
        }
        return false;
    }
}


echo 'Loading file ',pathinfo($inputFileName,PATHINFO_BASENAME),' using IOFactory with a defined reader type of ',$inputFileType,'<br />';
/**  Create a new Reader of the type defined in $inputFileType  **/

$objReader = PHPExcel_IOFactory::createReader($inputFileType);



echo '<hr />';


/**  Define how many rows we want to read for each "chunk"  **/
$chunkSize = 20;
/**  Create a new Instance of our Read Filter  **/
$chunkFilter = new chunkReadFilter();

/**  Tell the Reader that we want to use the Read Filter that we've Instantiated  **/
$objReader->setReadFilter($chunkFilter);

/**  Loop to read our worksheet in "chunk size" blocks  **/
/**  $startRow is set to 2 initially because we always read the headings in row #1  **/

for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
    echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ',$startRow,' to ',($startRow+$chunkSize-1),'<br />';
    /**  Tell the Read Filter, the limits on which rows we want to read this iteration  **/
    $chunkFilter->setRows($startRow,$chunkSize);
    /**  Load only the rows that match our filter from $inputFileName to a PHPExcel Object  **/
    $objPHPExcel = $objReader->load($inputFileName);

    //    Do some processing here

    $sheetData = $objPHPExcel->getActiveSheet()->toArray(null,true,true,true);
    var_dump($sheetData);
    echo '<br /><br />';
}

Note that this Read Filter will always read the first row of the worksheet, as well as the rows defined by the chunk rule.
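
To make the row selection concrete, here is a minimal sketch that runs without PHPExcel installed. It assumes a stand-in declaration of the interface, and the names SketchChunkFilter and acceptedRows() are made up for illustration; only the readCell() condition is taken from the filter above.

```php
<?php
// Stand-in for the real PHPExcel interface, declared only when PHPExcel is not loaded,
// so that this sketch runs on its own.
if (!interface_exists('PHPExcel_Reader_IReadFilter')) {
    interface PHPExcel_Reader_IReadFilter
    {
        public function readCell($column, $row, $worksheetName = '');
    }
}

// Same row logic as the chunkReadFilter in the answer, under a hypothetical name.
class SketchChunkFilter implements PHPExcel_Reader_IReadFilter
{
    private $startRow = 0;
    private $endRow = 0;

    public function setRows($startRow, $chunkSize)
    {
        $this->startRow = $startRow;
        $this->endRow = $startRow + $chunkSize; // exclusive upper bound
    }

    public function readCell($column, $row, $worksheetName = '')
    {
        // Heading row, or any row inside the current chunk
        return $row == 1 || ($row >= $this->startRow && $row < $this->endRow);
    }
}

// Hypothetical helper: which rows in 1..$lastRow would the filter let through?
function acceptedRows(PHPExcel_Reader_IReadFilter $filter, $lastRow)
{
    $rows = array();
    for ($row = 1; $row <= $lastRow; $row++) {
        if ($filter->readCell('A', $row)) {
            $rows[] = $row;
        }
    }
    return $rows;
}

$filter = new SketchChunkFilter();
$filter->setRows(22, 20);
// Prints row 1 followed by rows 22 through 41
echo implode(',', acceptedRows($filter, 60)), "\n";
```

Because the upper bound is exclusive, a chunk that starts at row 22 with a size of 20 ends at row 41, and the next chunk can start at row 42 without overlap.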

When using a read filter, PHPExcel still parses the entire file, but only loads those cells that match the defined read filter, so it only uses the memory required by that number of cells. However, it will parse the file multiple times, once for each chunk, so it will be slower. This example reads 20 rows at a time: to read line by line, simply set $chunkSize to 1.
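
The cost of re-parsing can be estimated by counting the chunks, since each chunk means one full load() of the file. The sketch below is plain PHP and does not call PHPExcel; chunkRanges() is a hypothetical helper, and it assumes headings in row 1 and data starting at row 2, as in the loop above.

```php
<?php
/**
 * Hypothetical helper (not part of PHPExcel): list the first and last data row
 * of every chunk. Each returned pair corresponds to one full parse of the file.
 */
function chunkRanges($lastRow, $chunkSize, $firstDataRow = 2)
{
    if ($chunkSize < 1) {
        throw new InvalidArgumentException('chunkSize must be at least 1');
    }
    $ranges = array();
    for ($startRow = $firstDataRow; $startRow <= $lastRow; $startRow += $chunkSize) {
        // Clamp the last chunk to the final row of the sheet
        $ranges[] = array($startRow, min($startRow + $chunkSize - 1, $lastRow));
    }
    return $ranges;
}

// 240 rows in chunks of 20: prints 12
echo count(chunkRanges(240, 20)), " full parses with chunks of 20 rows\n";
// 240 rows read one at a time: prints 239
echo count(chunkRanges(240, 1)), " full parses when reading row by row\n";
```

So a smaller chunk size lowers peak memory but multiplies the number of times the whole file is parsed; reading row by row is the slowest setting, not the fastest.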

This can also cause problems if you have formulae that reference cells in different "chunks", because the data simply isn't available for cells outside of the current "chunk".

Answered by Leonardo Delfino

Currently, the best option for reading .xlsx, .csv and .ods files is spreadsheet-reader (https://github.com/nuovo/spreadsheet-reader), because it can read the files without loading them entirely into memory. For the .xls extension it has limitations, because it uses PHPExcel for reading.
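
To illustrate what "without loading it all into memory" means, here is a minimal sketch that uses only PHP built-ins (fgetcsv() and a generator) on a CSV file. It is not spreadsheet-reader's API, and streamCsvRows() is a made-up name; it only shows the row-at-a-time idea.

```php
<?php
/**
 * Hypothetical helper: yield one CSV row at a time, so only the current
 * row is held in memory instead of the whole file.
 */
function streamCsvRows($path)
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException('Cannot open ' . $path);
    }
    try {
        // Delimiter, enclosure and escape are passed explicitly
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            if ($row === array(null)) {
                continue; // skip blank lines
            }
            yield $row;
        }
    } finally {
        fclose($handle);
    }
}

// Demo on a small temporary file
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "name,address\nAnn,1 Main St\nBob,2 High St\n");

$rows = array();
foreach (streamCsvRows($path) as $row) {
    $rows[] = $row; // in real use, each row would be written to the database here
}
unlink($path);

echo count($rows), " rows read\n"; // prints "3 rows read"
```

If the Excel data can be exported to CSV first, this kind of loop avoids the repeated full parses that chunked read filters require.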

Answered by Andres Paladines

This is ChunkReadFilter.php:

<?php
Class ChunkReadFilter implements PHPExcel_Reader_IReadFilter {

    private $_startRow = 0;
    private $_endRow = 0;

    /**  Set the list of rows that we want to read  */
    public function setRows($startRow, $chunkSize) {
        $this->_startRow = $startRow;
        $this->_endRow = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '') {

        //  Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow 
        if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {

            return true;
        }
        return false;
    }

}
?>

And this is index.php, with an imperfect but basic usage example at the end of the file.

<?php

require_once './Classes/PHPExcel/IOFactory.php';
require_once 'ChunkReadFilter.php';

class Excelreader {

    /**
     * This function is used to read data from excel file in chunks and insert into database
     * @param string $filePath
     * @param integer $chunkSize
     */
    public function readFileAndDumpInDB($filePath, $chunkSize) {
        echo("Loading file " . $filePath . " ....." . PHP_EOL);
        /**  Create a new Reader of the type that has been identified  * */
        $objReader = PHPExcel_IOFactory::createReader(PHPExcel_IOFactory::identify($filePath));

        $spreadsheetInfo = $objReader->listWorksheetInfo($filePath);

        /**  Create a new Instance of our Read Filter  * */
        $chunkFilter = new ChunkReadFilter();

        /**  Tell the Reader that we want to use the Read Filter that we've Instantiated  * */
        $objReader->setReadFilter($chunkFilter);
        $objReader->setReadDataOnly(true);
        //$objReader->setLoadSheetsOnly("Sheet1");
        //get header column name
        $chunkFilter->setRows(0, 1);
        echo("Reading file " . $filePath . PHP_EOL . "<br>");
        $totalRows = $spreadsheetInfo[0]['totalRows'];
        echo("Total rows in file " . $totalRows . " " . PHP_EOL . "<br>");

        /**  Loop to read our worksheet in "chunk size" blocks  * */
        /**  $startRow is set to 1 initially because we always read the headings in row #1  * */
        for ($startRow = 1; $startRow <= $totalRows; $startRow += $chunkSize) {
            echo("Loading WorkSheet for rows " . $startRow . " to " . ($startRow + $chunkSize - 1) . PHP_EOL . "<br>");
            $i = 0;
            /**  Tell the Read Filter, the limits on which rows we want to read this iteration  * */
            $chunkFilter->setRows($startRow, $chunkSize);
            /**  Load only the rows that match our filter from $inputFileName to a PHPExcel Object  * */
            $objPHPExcel = $objReader->load($filePath);
            $sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, false);

            $startIndex = ($startRow == 1) ? $startRow : $startRow - 1;
            //dumping in database
            if (!empty($sheetData) && $startRow < $totalRows) {
                /**
                 * $this->dumpInDb(array_slice($sheetData, $startIndex, $chunkSize));
                 */

                echo "<table border='1'>";
                foreach ($sheetData as $key => $value) {
                    $i++;
                    if ($value[0] != null) {
                        echo "<tr><td>id:$i</td><td>{$value[0]} </td><td>{$value[1]} </td><td>{$value[2]} </td><td>{$value[3]} </td></tr>";
                    }
                }
                echo "</table><br/><br/>";
            }
            $objPHPExcel->disconnectWorksheets();
            unset($objPHPExcel, $sheetData);
        }
        echo("File " . $filePath . " has been uploaded successfully in database" . PHP_EOL . "<br>");
    }

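    /**
     * Insert data into database table
     * @param Array $sheetData
     * @return boolean
     * @throws Exception
     * NOTE: the author reported that this database method was not working, only the public method above.
     * The version below fixes the visible bugs (mixed mysql_/mysqli_ calls, malformed SQL) but is untested.
     */
    protected function dumpInDb($sheetData) {

        // DbAdapter is not shown in the answer; it is assumed to return a mysqli connection
        $con = DbAdapter::getDBConnection();
        $values = array();

        for ($i = 1; $i < count($sheetData); $i++) {
            $values[] = "('" . mysqli_real_escape_string($con, $sheetData[$i][0]) . "',"
                    . "'" . mysqli_real_escape_string($con, $sheetData[$i][1]) . "')";
        }

        // Join the value tuples with commas and update existing rows on key collisions
        $query = "INSERT INTO employe(name,address) VALUES " . implode(",", $values)
                . " ON DUPLICATE KEY UPDATE name=VALUES(name), address=VALUES(address)";
        if (mysqli_query($con, $query)) {
            mysqli_close($con);
            return true;
        } else {
            // Read the error before closing the connection
            $error = mysqli_error($con);
            mysqli_close($con);
            throw new Exception($error);
        }
    }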

    /**
     * This function returns list of files corresponding to given directory path
     * @param String $dataFolderPath
     * @return Array list of file
     */
    protected function getFileList($dataFolderPath) {
        if (!is_dir($dataFolderPath)) {
            throw new Exception("Directory " . $dataFolderPath . " does not exist");
        }
        $root = scandir($dataFolderPath);
        $fileList = array();
        foreach ($root as $value) {
            if ($value === '.' || $value === '..') {
                continue;
            }
            if (is_file("$dataFolderPath/$value")) {
                $fileList[] = "$dataFolderPath/$value";
                continue;
            }
        }
        return $fileList;
    }

}

$inputFileName = './prueba_para_batch.xls';
$excelReader = new Excelreader();
$excelReader->readFileAndDumpInDB($inputFileName, 500);
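
As an alternative to escaping each value by hand in dumpInDb(), a chunk could be inserted with one parameterized multi-row statement. The sketch below only builds the SQL string and the flat parameter list, so it runs without a database; buildMultiRowInsert() is a hypothetical helper, and the table and column names are taken from the code above.

```php
<?php
/**
 * Hypothetical helper: build a multi-row INSERT with "?" placeholders.
 * $table and $columns are inserted as-is, so they must come from trusted code,
 * never from user input.
 */
function buildMultiRowInsert($table, array $columns, array $rows)
{
    if (empty($columns) || empty($rows)) {
        throw new InvalidArgumentException('columns and rows must not be empty');
    }
    $rowPlaceholder = '(' . implode(',', array_fill(0, count($columns), '?')) . ')';
    $params = array();
    foreach ($rows as $row) {
        if (count($row) !== count($columns)) {
            throw new InvalidArgumentException('row width does not match column count');
        }
        foreach ($row as $value) {
            $params[] = $value; // flatten values in row order
        }
    }
    $sql = 'INSERT INTO ' . $table . ' (' . implode(',', $columns) . ') VALUES '
         . implode(',', array_fill(0, count($rows), $rowPlaceholder));
    return array($sql, $params);
}

list($sql, $params) = buildMultiRowInsert(
    'employe',
    array('name', 'address'),
    array(array('Ann', '1 Main St'), array('Bob', '2 High St'))
);
echo $sql, "\n"; // INSERT INTO employe (name,address) VALUES (?,?),(?,?)
```

With PDO, the pair could then be passed to $pdo->prepare($sql)->execute($params), which leaves the quoting of values to the driver.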

Answered by Andrew Novikov

If it's possible to use another library, I recommend Spout for reading .xlsx, .ods and .csv files.
It reads huge files with low memory use and works fast.

From the documentation:

use Box\Spout\Reader\ReaderFactory;
use Box\Spout\Common\Type;

$reader = ReaderFactory::create(Type::XLSX); // for XLSX files
//$reader = ReaderFactory::create(Type::CSV); // for CSV files
//$reader = ReaderFactory::create(Type::ODS); // for ODS files

$reader->open($filePath);

foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        // do stuff with the row
    }
}

$reader->close();