file_get_contents => PHP Fatal error: Allowed memory exhausted

php

Asked by Chris

I have no experience when dealing with large files so I am not sure what to do about this. I have attempted to read several large files using file_get_contents; the task is to clean and munge them using preg_replace().

My code runs fine on small files; however, large files (40 MB) trigger a memory-exhausted error:

PHP Fatal error:  Allowed memory size of 16777216 bytes exhausted (tried to allocate 41390283 bytes)

I was thinking of using fread() instead, but I am not sure that'll work either. Is there a workaround for this problem?

Thanks for your input.

This is my code:

<?php
error_reporting(E_ALL);

##get find() results and remove DOS carriage returns.
##The error is thrown on the next line for large files!
$myData = file_get_contents("tmp11");
$newData = str_replace("^M", "", $myData);

##cleanup Model-Manufacturer field.
$pattern = '/(Model-Manufacturer:)(\n)(\w+)/i';
$replacement = '';
$newData = preg_replace($pattern, $replacement, $newData);

##cleanup Test_Version field and create comma delimited layout.
$pattern = '/(Test_Version=)(\d).(\d).(\d)(\n+)/';
$replacement = '..      ';
$newData = preg_replace($pattern, $replacement, $newData);

##cleanup occasional empty Model-Manufacturer field.
$pattern = '/(Test_Version=)(\d).(\d).(\d)      (Test_Version=)/';
$replacement = '..      Model-Manufacturer:N/A--';
$newData = preg_replace($pattern, $replacement, $newData);

##fix occasional Model-Manufacturer being incorrectly wrapped.
$newData = str_replace("--","\n",$newData);

##fix 'Binary file' message when find() utility cannot id file.
$pattern = '/(Binary file).*/';
$replacement = '';
$newData = preg_replace($pattern, $replacement, $newData);
$newData = removeEmptyLines($newData);

##replace colon with equal sign
$newData = str_replace("Model-Manufacturer:","Model-Manufacturer=",$newData);

##file stuff
$fh2 = fopen("tmp2","w");
fwrite($fh2, $newData);
fclose($fh2);

### Functions.

##Data cleanup
function removeEmptyLines($string)
{
        return preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);
}
?>

Answered by RobertPitt

Firstly, you should understand that when using file_get_contents you fetch the entire file into a string, and that string is stored in the host's memory.

If that string is larger than the memory dedicated to the PHP process, then PHP will halt and display the error message above.

The way around this is to open the file as a pointer and then take a chunk at a time. That way, if you had a 500 MB file you could read the first 1 MB of data, do what you will with it, drop that 1 MB from the system's memory and replace it with the next MB. This allows you to manage how much data you put in memory.

An example of this can be seen below; I will create a function that reads the file in chunks and hands each chunk to a callback, much like a node.js stream:

function file_get_contents_chunked($file, $chunk_size, $callback)
{
    // fopen() does not throw exceptions; it returns false on failure.
    $handle = fopen($file, "r");
    if ($handle === false) {
        trigger_error("file_get_contents_chunked::cannot open $file", E_USER_NOTICE);
        return false;
    }

    $i = 0;
    while (!feof($handle)) {
        // Hand the chunk, the handle (by reference, for seeking) and the
        // iteration counter to the callback.
        call_user_func_array($callback, array(fread($handle, $chunk_size), &$handle, $i));
        $i++;
    }

    fclose($handle);
    return true;
}

and then use it like so:

$success = file_get_contents_chunked("my/large/file", 4096, function($chunk, &$handle, $iteration) {
    /*
     * Do what you will with the {$chunk} here.
     * {$handle} is passed in case you want to seek
     * to different parts of the file.
     * {$iteration} is the number of chunks read so far, so
     * ({$iteration} * 4096) is your current offset within the file.
     */
});

if(!$success)
{
    //It Failed
}

One of the problems you will find is that you're trying to run several regexes over an extremely large block of data. Not only that, but your regexes are built to match the entire file.

With the above method your regexes could become useless, as you may be matching only a partial set of data whenever a pattern spans a chunk boundary. What you should do is fall back to the native string functions such as

  • strpos
  • substr
  • trim
  • explode

for matching the strings. I have added support in the callback so that the handle and current iteration are passed; this allows you to work with the file directly within your callback, using functions like fseek, ftruncate and fwrite, for instance.

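For instance, here is a rough sketch of such a callback (the $buffer carry-over and the per-line cleanup shown are illustrative assumptions, not part of the original answer): it keeps the incomplete tail of each chunk so that lines split across chunk boundaries are reassembled before any string processing.

$buffer = '';
$success = file_get_contents_chunked("my/large/file", 4096, function($chunk, &$handle, $iteration) use (&$buffer) {
    // Prepend whatever was left over from the previous chunk.
    $data = $buffer . $chunk;

    // Keep the (possibly incomplete) last line for the next iteration.
    $cut = strrpos($data, "\n");
    if ($cut === false) {
        $buffer = $data; // no complete line in this chunk yet
        return;
    }
    $buffer = substr($data, $cut + 1);

    foreach (explode("\n", substr($data, 0, $cut)) as $line) {
        $line = trim($line, "\r"); // strip DOS carriage returns
        // ...process or write out each complete $line here...
    }
});
// Note: after the call, $buffer may still hold a final unterminated line.
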
The way you're building your string manipulation is not efficient at all, and the proposed method above is by far a better way.

Hope this helps.

Answered by vbence

A pretty ugly solution is to adjust your memory limit depending on the file size:

$filename = "yourfile.txt";
ini_set('memory_limit', filesize($filename) + 4000000);
$contents = file_get_contents($filename);

The right solution would be to think about whether you can process the file in smaller chunks, or use command-line tools from PHP.

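As a hedged illustration of the command-line route (assuming a Unix-like host with GNU sed available; the file name is only an example), you can let an external tool stream the file so PHP's memory_limit never comes into play:

// Strip DOS carriage returns from the file in place; sed streams the
// file itself, so PHP never loads the contents into memory.
shell_exec("sed -i 's/\\r$//' " . escapeshellarg("tmp11"));
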
If your file is line-based, you can also use fgets to process it line by line.

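A minimal sketch of that approach (the file names are illustrative) reads one line at a time and writes the cleaned line straight to the output file, so only a single line is ever held in memory:

$in  = fopen("tmp11", "r");
$out = fopen("tmp2", "w");
while (($line = fgets($in)) !== false) {
    // Only the current line is in memory at any point.
    fwrite($out, str_replace("\r", "", $line));
}
fclose($in);
fclose($out);
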
Answered by haltabush

My advice would be to use fread. It may be a little slower, but you won't have to use all your memory... For instance:

// This uses filesize($oldFile) bytes of memory:
file_put_contents($newFile, file_get_contents($oldFile));

// And this only 8192 bytes at a time:
$pNew = fopen($newFile, 'w');
$pOld = fopen($oldFile, 'r');
while (!feof($pOld)) {
    fwrite($pNew, fread($pOld, 8192));
}
fclose($pNew);
fclose($pOld);