PHP: read and parse the contents of a very large file

Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/14848933/

Date: 2020-08-25 08:04:22  Source: igfitidea

Read and parse contents of very large file

Tags: php, file, file-get-contents

Asked by imperium2335

I am trying to parse a tab delimited file that is ~1GB in size.

When I run the script I get:

Fatal error: Allowed memory size of 1895825408 bytes exhausted  (tried to allocate 1029206974 bytes) ...

My script at the moment is just:

$file = file_get_contents('allCountries.txt') ;

$file = str_replace(array("\r\n", "\t"), array("[NEW*LINE]", "[tAbul*Ator]"), $file) ;

I have set the memory limit in php.ini to -1, which then gives me:

Fatal error: Out of memory (allocated 1029963776) (tried to allocate 1029206974 bytes)

Is there any way to open the file partially and then move on to the next part, so that less memory is used at one time?

Answered by Ranty

Yes, you can read it line by line:

$handle = @fopen("/tmp/inputfile.txt", "r");
if ($handle) {
    while (($buffer = fgets($handle, 4096)) !== false) {
        echo $buffer;
    }
    fclose($handle);
}
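
Since the question concerns a tab-delimited file, the line-by-line loop above could be extended to split each line into fields. The following is a minimal sketch, not the answerer's code: the function name readTsvRows is hypothetical, and it assumes each record sits on its own line with no embedded tabs or newlines inside fields.

```php
<?php
// Hypothetical helper (not from the original answer): yields one array of
// fields per line, so only a single line is held in memory at a time.
function readTsvRows(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // No length argument: fgets() reads up to the end of the line.
        while (($line = fgets($handle)) !== false) {
            // Strip the trailing "\r\n" or "\n", then split on tabs.
            yield explode("\t", rtrim($line, "\r\n"));
        }
    } finally {
        fclose($handle);
    }
}
```

It would be consumed with a foreach loop, for example over readTsvRows('allCountries.txt'), handling one row per iteration instead of building the whole file into a string.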

Answered by Jordi Kroon

You have to read the file in blocks. See the answer to this question: https://stackoverflow.com/a/6564818/1572528

For smaller files, you can also try setting the memory limit:

ini_set('memory_limit', '32M'); //max size 32m

Answered by Gregor Walter

Yes, use fopen and fread / fgets for this:

http://www.php.net/manual/en/function.fread.php

string fread ( resource $handle , int $length )

Set $length to the number of bytes you want to read. The $handle keeps track of the current position for subsequent reads; with fseek you can also set the position explicitly later.
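
One possible way to apply fread to a line-oriented file is sketched here. The function name processInChunks, the callback parameter, and the 1 MiB default chunk size are illustrative assumptions, not part of the original answer. Because a chunk can end in the middle of a line, the incomplete tail is carried over to the next read.

```php
<?php
// Hypothetical helper: reads the file in fixed-size chunks and calls
// $onLine once for every complete line.
function processInChunks(string $path, callable $onLine, int $chunkSize = 1048576): void
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $carry = ''; // incomplete line left over from the previous chunk
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            break; // read error
        }
        $lines = explode("\n", $carry . $chunk);
        $carry = array_pop($lines); // last piece may be cut off mid-line
        foreach ($lines as $line) {
            $onLine(rtrim($line, "\r")); // tolerate "\r\n" line endings
        }
    }
    fclose($handle);
    if ($carry !== '') {
        $onLine(rtrim($carry, "\r")); // final line without a trailing newline
    }
}
```

Memory use is bounded by roughly the chunk size plus the longest line, rather than by the size of the file.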

Answered by Gregor Walter

Are you sure that it's fopen that's failing and not your script's timeout setting? The default is usually around 30 seconds or so, and if your file is taking longer than that to read in, it may be tripping that up.

Another thing to consider may be the memory limit on your script - reading the file into an array may trip over this, so check your error log for memory warnings.
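
To check the two settings mentioned so far, the current values can be read at runtime. This is a small sketch; the function name reportLimits and the array keys are hypothetical, while ini_get and memory_get_peak_usage are standard PHP functions.

```php
<?php
// Hypothetical helper: collects the limits that commonly interrupt
// long-running file processing, plus the peak memory used so far.
function reportLimits(): array
{
    return [
        // Seconds; "0" means no time limit.
        'max_execution_time' => ini_get('max_execution_time'),
        // For example "128M"; "-1" means no memory limit.
        'memory_limit'       => ini_get('memory_limit'),
        // Peak memory allocated from the system, in bytes.
        'peak_memory_bytes'  => memory_get_peak_usage(true),
    ];
}
```

Printing the result with print_r(reportLimits()) before and after the read makes it easier to tell whether the script is near either limit.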

If neither of the above is your problem, you might look into using fgets to read the file line by line, processing as you go.

$handle = fopen("/tmp/uploadfile.txt", "r") or die("Couldn't get handle");
if ($handle) {
    // fgets() returns false at end of file (or on error), which ends the loop.
    while (($buffer = fgets($handle, 4096)) !== false) {
        // Process buffer here..
    }
    fclose($handle);
}

Edit

PHP doesn't seem to throw an error, it just returns false.

Is the path to $rawfile correct relative to where the script is running? Perhaps try setting an absolute path here for the filename.
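
A quick way to rule out a path problem is to resolve the name to an absolute path and test readability before opening it. The helper name resolveReadable and its parameters are hypothetical; in a script, __DIR__ could be passed as the base directory so the path no longer depends on the current working directory.

```php
<?php
// Hypothetical helper: resolves $name against $baseDir and returns the
// absolute path, or null if the file does not exist or is not readable.
function resolveReadable(string $baseDir, string $name): ?string
{
    // realpath() returns false when the path does not exist.
    $path = realpath($baseDir . DIRECTORY_SEPARATOR . $name);
    return ($path !== false && is_readable($path)) ? $path : null;
}
```

A null result points to a wrong path or missing permissions rather than to a problem with fopen itself.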
