php: process a very large CSV file without timeouts or memory errors
Disclaimer: the content below comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/7318768/
Process very big csv file without timeout and memory error
Asked by Julian
At the moment I'm writing an import script for a very big CSV file. The problem is that most of the time it stops after a while because of a timeout, or it throws a memory error.
My idea was to parse the CSV file in steps of 100 lines and, after each 100 lines, have the script call itself automatically. I tried to achieve this with header('Location: ...') and passing the current line via GET, but it didn't work out the way I wanted.
Is there a better way to do this, or does someone have an idea how to get rid of the memory error and the timeout?
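For reference, a minimal sketch of the chunked, self-redirecting approach described in the question might look like the following. Everything here is an assumption for illustration: the script name import.php, the file name big.csv, the chunk size of 100 and the offset parameter are all made up, and the answers below suggest better alternatives.

<?php
// import.php - hypothetical sketch of the "100 lines per request" idea
$file   = 'big.csv';
$chunk  = 100;
$offset = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;

$handle = fopen($file, 'r');

// skip the lines that were already processed by earlier requests
for ($i = 0; $i < $offset && fgetcsv($handle) !== false; $i++);

$processed = 0;
while ($processed < $chunk && ($data = fgetcsv($handle)) !== false) {
    // ... insert $data into the database here ...
    $processed++;
}
fclose($handle);

if ($processed === $chunk) {
    // more lines may remain: call the script again with the next offset
    header('Location: import.php?offset=' . ($offset + $processed));
    exit;
}
echo 'Import finished';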
Answered by feeela
I've used fgetcsv() to read a 120 MB CSV in a streaming manner. It reads the file line by line, and I then inserted every line into a database. That way only one line is held in memory on each iteration. The script still needed 20 minutes to run. Maybe I'll try Python next time... Don't try to load a huge CSV file into an array; that really would consume a lot of memory.
// WDI_GDF_Data.csv (120.4MB) are the World Bank collection of development indicators:
// http://data.worldbank.org/data-catalog/world-development-indicators
if (($handle = fopen('WDI_GDF_Data.csv', 'r')) !== false)
{
    // get the first row, which contains the column-titles (if necessary)
    $header = fgetcsv($handle);

    // loop through the file line-by-line
    while (($data = fgetcsv($handle)) !== false)
    {
        // resort/rewrite data and insert into DB here
        // try to use conditions sparingly here, as those will cause slow-performance

        // I don't know if this is really necessary, but it couldn't harm;
        // see also: http://php.net/manual/en/features.gc.php
        unset($data);
    }

    fclose($handle);
}
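As an illustration of the "insert into DB here" step above, a line-by-line insert with a PDO prepared statement could look roughly like this; the DSN, credentials, table name (indicators) and column indices are assumptions for the example, not part of the original answer:

$pdo  = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');
$stmt = $pdo->prepare('INSERT INTO indicators (country, indicator, value) VALUES (?, ?, ?)');

if (($handle = fopen('WDI_GDF_Data.csv', 'r')) !== false) {
    $header = fgetcsv($handle); // skip the title row

    while (($data = fgetcsv($handle)) !== false) {
        // which columns to pick depends on the actual CSV layout
        $stmt->execute(array($data[0], $data[2], $data[4]));
        unset($data);
    }

    fclose($handle);
}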
Answered by Craigo
I find that uploading the file and inserting it with MySQL's LOAD DATA LOCAL query is a fast solution, e.g.:
$sql = "LOAD DATA LOCAL INFILE '/path/to/file.csv'
        REPLACE INTO TABLE table_name
        FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
        LINES TERMINATED BY '\r\n'
        IGNORE 1 LINES";
$result = $mysqli->query($sql);
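A slightly fuller sketch of the same idea, assuming a mysqli connection where LOAD DATA LOCAL has to be enabled on the client first (host, credentials and table name are placeholders):

$mysqli = mysqli_init();
// allow LOAD DATA LOCAL on the client side; the server must permit it as well
$mysqli->options(MYSQLI_OPT_LOCAL_INFILE, true);
$mysqli->real_connect('localhost', 'user', 'password', 'dbname');

$sql = "LOAD DATA LOCAL INFILE '/path/to/file.csv'
        REPLACE INTO TABLE table_name
        FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
        LINES TERMINATED BY '\r\n'
        IGNORE 1 LINES";

if (!$mysqli->query($sql)) {
    echo 'Import failed: ' . $mysqli->error;
}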
Answered by 2ndkauboy
If you don't care about how long it takes and how much memory it needs, you can simply increase these limits for this script. Just add the following lines to the top of your script:
ini_set('memory_limit', '512M');
ini_set('max_execution_time', '180');
With the function memory_get_usage() you can find out how much memory your script actually needs, so you can pick a good value for memory_limit.
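A small sketch of how that check might look while tuning memory_limit; the file name and the 10,000-row reporting interval are arbitrary choices for the example:

$handle = fopen('file.csv', 'r');
$rows   = 0;

while (($data = fgetcsv($handle)) !== false) {
    // ... process $data here ...
    if (++$rows % 10000 === 0) {
        echo 'current: ' . memory_get_usage() . ' bytes, peak: '
            . memory_get_peak_usage() . " bytes\n";
    }
}

fclose($handle);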
You might also want to have a look at fgets(), which allows you to read a file line by line. I am not sure if that takes less memory, but I really think this will work. But even in this case you have to increase max_execution_time to a higher value.
Answered by rawdesk.be
There seems to be an enormous difference between fgetcsv() and fgets() when it comes to memory consumption. A simple CSV with only one column exceeded my 512M memory limit after just 50,000 records with fgetcsv(), and it took 8 minutes to report that.
With fgets() it took only 3 minutes to successfully process 649,175 records, and my local server wasn't even gasping for additional air.
So my advice is to use fgets() if the number of columns in your CSV is limited. In my case fgets() directly returned the string inside column 1. For more than one column, you might use explode() into a disposable array that you unset() after each record operation. Thumbs up for answer 3, @2ndkauboy!
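A minimal sketch of that fgets()/explode() approach, assuming a plain comma-separated file with no quoted fields (values that contain commas or quotes would still need fgetcsv()):

if (($handle = fopen('file.csv', 'r')) !== false) {
    fgets($handle); // skip the header line, if there is one

    while (($line = fgets($handle)) !== false) {
        $fields = explode(',', rtrim($line, "\r\n"));
        // ... use $fields[0], $fields[1], ... here ...
        unset($fields);
    }

    fclose($handle);
}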
Answered by Your Common Sense
Oh. Just run this script from the CLI instead of via a silly web interface, so that no execution time limit will affect it.
And do not keep the parsed results around forever, but write them out immediately - that way you won't be affected by the memory limit either.
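A hedged sketch of that setup: a script run as "php import.php input.csv" (the script, input and output file names are placeholders) that writes every parsed row out immediately instead of collecting results in memory:

<?php
// run from the command line: php import.php input.csv
set_time_limit(0); // the CLI usually has no time limit anyway, but this makes it explicit

$in  = fopen($argv[1], 'r');
$out = fopen('import.log', 'w');

while (($data = fgetcsv($in)) !== false) {
    // write (or insert into the DB) right away instead of keeping rows around
    fwrite($out, implode("\t", $data) . "\n");
}

fclose($in);
fclose($out);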