PHP: reading large data from a CSV file

Question

Asked by Andy Martin

I am reading a CSV file in PHP and checking against MySQL whether each record is present in my table or not.

The CSV has about 25,000 records, and when I run my code it displays a "Service Unavailable" error after 2m 10s (onload: 2m 10s).

Here is the code:

// for set memory limit & execution time
ini_set('memory_limit', '512M');
ini_set('max_execution_time', '180');

//function to read csv file
function readCSV($csvFile)
{
    $file_handle = fopen($csvFile, 'r');
    while (!feof($file_handle) ) {

       set_time_limit(60); // you can enable this if you have lot of data

       $line_of_text[] = fgetcsv($file_handle, 1024);
   }
   fclose($file_handle);
   return $line_of_text;
 }

// Set path to CSV file
$csvFile = 'my_records.csv';

$csv = readCSV($csvFile);

for($i=1;$i<count($csv);$i++)
{
   $user_email= $csv[$i][1];

   $qry = "SELECT u.user_id, u.user_email_id FROM tbl_user as u WHERE u.user_email_id = '".$user_email."'";

   $result = @mysql_query($qry) or die("Couldn't execute query:".mysql_error().''.mysql_errno());

   $rec = @mysql_fetch_row($result);

   if($rec)
   {
      echo "Record exist";
   }
   else
   {
      echo "Record not exist"; 
   }
}

Note: I just want to list the records that do not exist in my table.

Please suggest a solution for this.

Answer 1

Answered by Abela

An excellent method for dealing with large files is described at: https://stackoverflow.com/a/5249971/797620

This method was used at ~~http://www.cuddlycactus.com/knownpasswords/~~ (page has been taken down) to search through 170+ million passwords in just a few milliseconds.

Answer 2

Answered by Raza Ahmed

After a lot of struggle, I finally found a good solution; maybe it will help others too. When I tried a 2,367 KB CSV file containing 18,226 rows, the fastest of the different PHP scripts I tested were (1) the CsvImporter class from the php.net fgetcsv documentation, and (2) file_get_contents => PHP Fatal error: Allowed memory exhausted

(1) took 0.92574405670166; (2) took 0.12543702125549 (string form) and 0.52903485298157 (split into an array). Note: these figures do not include inserting into MySQL.
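Timings like these can be collected with a small harness. The following is a minimal sketch, not the benchmark used above: the helper names `timeIt`, `countRowsWithFgetcsv` and `countRowsWithFileGetContents`, and the generated 1,000-row sample file, are assumptions made for illustration.

```php
<?php
// Hypothetical helper: run $fn and return [result, elapsed seconds].
function timeIt(callable $fn): array
{
    $start = microtime(true);
    $result = $fn();
    return [$result, microtime(true) - $start];
}

// Stream the file row by row, so only one row is held in memory at a time.
function countRowsWithFgetcsv(string $path): int
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $rows = 0;
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row !== [null]) { // fgetcsv returns [null] for a blank line
            $rows++;
        }
    }
    fclose($handle);
    return $rows;
}

// Load the whole file as one string, then split it into lines.
// Note: this naive split ignores newlines inside quoted fields.
function countRowsWithFileGetContents(string $path): int
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $lines = preg_split('/\r\n|\n|\r/', rtrim($contents, "\r\n"));
    return count($lines);
}

// Build a throwaway sample file (assumed data, not the file from the answer).
$path = tempnam(sys_get_temp_dir(), 'csv');
$out = fopen($path, 'w');
for ($i = 0; $i < 1000; $i++) {
    fputcsv($out, [$i, "user$i@example.com"], ',', '"', '\\');
}
fclose($out);

[$streamed, $streamTime] = timeIt(function () use ($path) {
    return countRowsWithFgetcsv($path);
});
[$slurped, $slurpTime] = timeIt(function () use ($path) {
    return countRowsWithFileGetContents($path);
});
unlink($path);

printf("fgetcsv: %d rows in %.6f s\n", $streamed, $streamTime);
printf("file_get_contents: %d rows in %.6f s\n", $slurped, $slurpTime);
```

The string-based variant is usually quick on small files but needs the whole file in memory, which is consistent with the "Allowed memory exhausted" error mentioned above for larger inputs.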

The best solution I found takes 3.0644409656525 in total, including adding to the database and some conditional checks. It took 11 seconds to process an 8 MB file. The solution is:

$csvInfo = analyse_file($file, 5);
    $lineSeperator = $csvInfo['line_ending']['value'];
    $fieldSeperator = $csvInfo['delimiter']['value'];
    $columns = getColumns($file);
    echo '<br>========Details========<br>';
    echo 'Line Sep: \t '.$lineSeperator;
    echo '<br>Field Sep:\t '.$fieldSeperator;
    echo '<br>Columns: ';print_r($columns);
    echo '<br>========Details========<br>';
    $ext = pathinfo($file, PATHINFO_EXTENSION);
    $table = str_replace(' ', '_', basename($file, "." . $ext));
    $rslt = table_insert($table, $columns);
    if($rslt){
        $query = "LOAD DATA LOCAL INFILE '".$file."' INTO TABLE $table FIELDS TERMINATED BY '$fieldSeperator' ";

        var_dump(addToDb($query, false));
    }


function addToDb($query, $getRec = true){
    //echo '<br>Query : '.$query;
    $con = @mysql_connect('localhost', 'root', '');
    @mysql_select_db('rtest', $con);
    $result = mysql_query($query, $con);
    if($result){
        if($getRec){
            $data = array();
            while ($row = mysql_fetch_assoc($result)) {
                $data[] = $row;
            }
            return $data;
        }else return true;
    }else{
        var_dump(mysql_error());
        return false;
    }
}


function table_insert($table_name, $table_columns) {
    $queryString = "CREATE TABLE " . $table_name . " (";
    $columns = '';
    $values = '';

    foreach ($table_columns as $column) {
        $values .= (strtolower(str_replace(' ', '_', $column))) . " VARCHAR(2048), ";
    }
    $values = substr($values, 0, strlen($values) - 2);

    $queryString .= $values . ") ";

    //// echo $queryString;

    return addToDb($queryString, false);
}


function getColumns($file){
    $cols = array();
    if (($handle = fopen($file, 'r')) !== FALSE)
    {
        while (($row = fgetcsv($handle)) !== FALSE) 
        {
           $cols = $row;
           if(count($cols)>0){
                break;
           }
        }
        fclose($handle); // release the file handle before returning
        return $cols;
    }else return false;
}

function analyse_file($file, $capture_limit_in_kb = 10) {
// capture starting memory usage
$output['peak_mem']['start']    = memory_get_peak_usage(true);

// log the limit how much of the file was sampled (in Kb)
$output['read_kb']                 = $capture_limit_in_kb;

// read in file
$fh = fopen($file, 'r');
    $contents = fread($fh, ($capture_limit_in_kb * 1024)); // in KB
fclose($fh);

// specify allowed field delimiters
$delimiters = array(
    'comma'     => ',',
    'semicolon' => ';',
    'tab'         => "\t",
    'pipe'         => '|',
    'colon'     => ':'
);

// specify allowed line endings
$line_endings = array(
    'rn'         => "\r\n",
    'n'         => "\n",
    'r'         => "\r",
    'nr'         => "\n\r"
);

// loop and count each line ending instance
foreach ($line_endings as $key => $value) {
    $line_result[$key] = substr_count($contents, $value);
}

// sort ascending by count, so the most frequent line ending is last (read with end() below)
asort($line_result);

// log to output array
$output['line_ending']['results']     = $line_result;
$output['line_ending']['count']     = end($line_result);
$output['line_ending']['key']         = key($line_result);
$output['line_ending']['value']     = $line_endings[$output['line_ending']['key']];
$lines = explode($output['line_ending']['value'], $contents);

// remove the last line of the array, as it may be incomplete (the sample is cut at a byte limit)
array_pop($lines);

// create a string from the legal lines
$complete_lines = implode(' ', $lines);

// log statistics to output array
$output['lines']['count']     = count($lines);
$output['lines']['length']     = strlen($complete_lines);

// loop and count each delimiter instance
foreach ($delimiters as $delimiter_key => $delimiter) {
    $delimiter_result[$delimiter_key] = substr_count($complete_lines, $delimiter);
}

// sort ascending by count, so the most frequent delimiter is last (read with end() below)
asort($delimiter_result);

// log statistics to output array with largest counts as the value
$output['delimiter']['results']     = $delimiter_result;
$output['delimiter']['count']         = end($delimiter_result);
$output['delimiter']['key']         = key($delimiter_result);
$output['delimiter']['value']         = $delimiters[$output['delimiter']['key']];

// capture ending memory usage
$output['peak_mem']['end'] = memory_get_peak_usage(true);
return $output;
}
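Once the CSV has been loaded into its own table this way, the rows that are missing from the user table (what the question asks for) could be listed with a single anti-join instead of one SELECT per CSV row. This is only a sketch under assumptions: `buildMissingRowsQuery`, the staging table name `my_records` and its column `email` are hypothetical, while `tbl_user`, `user_id` and `user_email_id` are taken from the question. It also assumes `user_id` is never NULL for an existing user.

```php
<?php
// Hypothetical helper: builds SQL that lists staging rows with no match in tbl_user.
function buildMissingRowsQuery(string $stagingTable, string $emailColumn): string
{
    // Identifiers cannot be bound as query parameters, so accept only simple names.
    foreach ([$stagingTable, $emailColumn] as $identifier) {
        if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $identifier)) {
            throw new InvalidArgumentException("Unsafe identifier: $identifier");
        }
    }

    // LEFT JOIN keeps every staging row; rows without a matching user
    // have NULL in u.user_id and are the ones we want.
    return "SELECT s.`$emailColumn` FROM `$stagingTable` AS s "
         . "LEFT JOIN tbl_user AS u ON u.user_email_id = s.`$emailColumn` "
         . "WHERE u.user_id IS NULL";
}

// Assumed names: a staging table "my_records" with an "email" column.
$sql = buildMissingRowsQuery('my_records', 'email');
echo $sql, PHP_EOL;
```

The resulting string can be passed to whichever query function is in use. With an index on `user_email_id`, the join is likely to be much cheaper than 25,000 separate round trips.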

Answer 3

Answered by Hearaman

Normally, a "Service Unavailable" error appears when a server-side (5xx) error occurs. I think it is happening here because of insufficient execution time. Please check your server log or browser console; you may see a 500 error there.

First of all, keep set_time_limit(60) out of the loop.
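One way to apply this is sketched below: the time limit is set once before the loop, and rows are streamed with a generator instead of being collected into one large array. `readCsvRows` is a hypothetical helper name, the 180-second limit simply mirrors the question's setting, and the small temporary file stands in for my_records.csv.

```php
<?php
// Hypothetical helper: yields one parsed CSV row at a time,
// so the whole file is never held in memory.
function readCsvRows(string $csvFile): Generator
{
    $handle = fopen($csvFile, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $csvFile");
    }
    try {
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            if ($row === [null]) {
                continue; // fgetcsv returns [null] for a blank line
            }
            yield $row;
        }
    } finally {
        fclose($handle); // runs even if the caller stops iterating early
    }
}

set_time_limit(180); // set once, outside the loop

// Demo data standing in for the real CSV (assumed layout: id, email).
$csvFile = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($csvFile, "id,email\n1,a@example.com\n\n2,b@example.com\n");

$emails = [];
foreach (readCsvRows($csvFile) as $index => $row) {
    if ($index === 0) {
        continue; // skip the header row, as the question's loop does
    }
    $emails[] = $row[1];
}
unlink($csvFile);

print_r($emails);
```

Unlike the `while (!feof(...))` loop in the question, testing the return value of `fgetcsv` directly avoids appending a trailing `false` entry at end of file.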

Make some changes, such as:

Add an INDEX on the user_email_id column, so your SELECT query can find the rows faster.
Do not echo messages; keep the output buffer free.
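In addition to the index, the number of queries can be reduced by checking emails in batches and printing only the missing ones at the end. The following is a sketch under assumptions, not the code from the question: `findMissingEmails`, `makePdoLookup`, the batch size and the index name `idx_user_email_id` are hypothetical, and PDO is swapped in for the `mysql_*` functions, which were removed in PHP 7. The demo uses a stub lookup so it runs without a database.

```php
<?php
// Index suggested above (hypothetical index name), run once in MySQL:
//   CREATE INDEX idx_user_email_id ON tbl_user (user_email_id);

// Hypothetical helper: returns the emails for which $fetchExisting finds no row.
// $fetchExisting receives a list of emails and returns those present in the table.
function findMissingEmails(array $emails, callable $fetchExisting, int $batchSize = 500): array
{
    $missing = [];
    $unique = array_values(array_unique($emails));
    foreach (array_chunk($unique, $batchSize) as $batch) {
        // MySQL's default collations compare case-insensitively,
        // so compare in lowercase on the PHP side as well.
        $existing = array_flip(array_map('strtolower', $fetchExisting($batch)));
        foreach ($batch as $email) {
            if (!isset($existing[strtolower($email)])) {
                $missing[] = $email;
            }
        }
    }
    return $missing;
}

// Hypothetical PDO-backed lookup: one parameterized IN (...) query per batch.
function makePdoLookup(PDO $pdo): callable
{
    return function (array $batch) use ($pdo): array {
        $placeholders = implode(',', array_fill(0, count($batch), '?'));
        $stmt = $pdo->prepare(
            "SELECT user_email_id FROM tbl_user WHERE user_email_id IN ($placeholders)"
        );
        $stmt->execute(array_values($batch));
        return $stmt->fetchAll(PDO::FETCH_COLUMN);
    };
}

// Demo with a stub lookup in place of a database connection.
$known = ['a@example.com', 'c@example.com'];
$stubLookup = function (array $batch) use ($known): array {
    return array_values(array_intersect($batch, $known));
};

$missing = findMissingEmails(
    ['a@example.com', 'b@example.com', 'c@example.com', 'd@example.com'],
    $stubLookup,
    2
);
print_r($missing);
```

With a real connection, `makePdoLookup($pdo)` would be passed in place of `$stubLookup`. For 25,000 rows and a batch size of 500 that is 50 queries instead of 25,000.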

Also, I have done this kind of task using an open source program. You can get it here: http://sourceforge.net/projects/phpexcelreader/

Try this.
