Reading very large files in PHP
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/162176/
Asked by ConroyP
fopen is failing when I try to read in a very moderately sized file in PHP. A 6 MB file makes it choke, though smaller files around 100 KB are just fine. I've read that it is sometimes necessary to recompile PHP with the -D_FILE_OFFSET_BITS=64 flag in order to read files over 20 GB or something ridiculous, but shouldn't I have no problems with a 6 MB file? Eventually we'll want to read in files that are around 100 MB, and it would be nice to be able to open them and then read through them line by line with fgets, as I'm able to do with smaller files.
What are your tricks/solutions for reading and doing operations on very large files in PHP?
Update: Here's an example of a simple code block that fails on my 6 MB file - PHP doesn't seem to throw an error, it just returns false. Maybe I'm doing something extremely dumb?
$rawfile = "mediumfile.csv";

if ($file = fopen($rawfile, "r")) {
    fclose($file);
} else {
    echo "fail!";
}
Another update: Thanks all for your help, it did turn out to be something incredibly dumb - a permissions issue. My small file inexplicably had read permissions when the larger file didn't. Doh!
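A quick way to surface this class of failure is to check readability up front and report the actual error instead of a bare "fail!". A minimal sketch, reusing the example above:

$rawfile = "mediumfile.csv";

// is_readable() covers missing files and permission problems alike
if (!is_readable($rawfile)) {
    die("Cannot read '$rawfile' - check that it exists and that PHP has read permission.");
}

if ($file = fopen($rawfile, "r")) {
    fclose($file);
} else {
    // fopen() emits a warning rather than throwing; fetch it explicitly
    $err = error_get_last();
    echo "fopen failed: " . ($err ? $err['message'] : 'unknown error');
}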
Answer by ConroyP
Are you sure that it's fopen that's failing and not your script's timeout setting? The default is usually around 30 seconds or so, and if your file is taking longer than that to read in, it may be tripping that up.
Another thing to consider may be the memory limit on your script - reading the file into an array may trip over this, so check your error log for memory warnings.
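To check or relax both of those limits for just this script, something like the following works (the values shown are illustrative, not recommendations):

// Current limits - e.g. "30" (seconds) and "128M"
echo ini_get('max_execution_time') . "\n";
echo ini_get('memory_limit') . "\n";

// Relax them for this request only
set_time_limit(300);             // allow up to 5 minutes
ini_set('memory_limit', '256M'); // may be a no-op if the host forbids raising it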
If neither of the above is your problem, you might look into using fgets to read the file line by line, processing as you go.
$handle = fopen("/tmp/uploadfile.txt", "r") or die("Couldn't get handle");
if ($handle) {
    while (!feof($handle)) {
        $buffer = fgets($handle, 4096);
        // Process buffer here..
    }
    fclose($handle);
}
Edit
PHP doesn't seem to throw an error, it just returns false.
Is the path to $rawfile correct relative to where the script is running? Perhaps try setting an absolute path here for the filename.
Answer by Al-Punk
Did two tests, with a 1.3 GB file and a 9.5 GB file.
1.3 GB

Using fopen():
This process used 15555 ms for its computations.
It spent 169 ms in system calls.

Using file():
This process used 6983 ms for its computations.
It spent 4469 ms in system calls.

9.5 GB

Using fopen():
This process used 113559 ms for its computations.
It spent 2532 ms in system calls.

Using file():
This process used 8221 ms for its computations.
It spent 7998 ms in system calls.

Seems file() is faster.
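One way to reproduce this kind of measurement is microtime() for wall-clock time plus getrusage() for time spent in system calls (a sketch; the path is a placeholder, and getrusage() is unavailable on some platforms):

$t0 = microtime(true);
$r0 = getrusage();

$lines = file('/path/to/bigfile.txt'); // or an fgets() loop for the fopen() case

$r1 = getrusage();
printf("computation: %.0f ms\n", (microtime(true) - $t0) * 1000);
printf("system calls: %.0f ms\n",
    ($r1['ru_stime.tv_sec'] - $r0['ru_stime.tv_sec']) * 1000 +
    ($r1['ru_stime.tv_usec'] - $r0['ru_stime.tv_usec']) / 1000);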
Answer by Tinel Barb
- The fgets() function is fine until the text files pass 20 MB, at which point parsing speed is greatly reduced.
- The file_get_contents() function gives good results up to 40 MB and acceptable results up to 100 MB, but file_get_contents() loads the entire file into memory, so it's not scalable.
- The file() function is disastrous with large text files, because it creates an array containing each line of text; that array is held in memory, and the memory used is even larger.
Actually, I could only manage to parse a 200 MB file with memory_limit set to 2 GB, which was inappropriate for the 1+ GB files I intended to parse.
When you have to parse files larger than 1 GB, parsing time exceeds 15 seconds, and you want to avoid loading the entire file into memory, you have to find another way.
My solution was to parse the data in small chunks of arbitrary size. The code is:
$filesize = get_file_size($file);
$fp = @fopen($file, "r");
$chunk_size = (1 << 24); // 16 MB, arbitrary
$position = 0;

// if handle $fp to file was created, go ahead
if ($fp) {
    // loop on the byte position rather than feof(), so the loop
    // terminates even when the file has no trailing newline
    while ($position < $filesize) {
        // move pointer to $position in file
        fseek($fp, $position);
        // take a slice of $chunk_size bytes
        $chunk = fread($fp, $chunk_size);
        // search for the end of the last full text line
        $last_lf_pos = strrpos($chunk, "\n");
        if ($last_lf_pos === false) {
            // no newline in this chunk (final chunk, or a line longer
            // than $chunk_size): take the whole slice so $position
            // still advances
            $buffer = $chunk;
            $position += strlen($chunk);
        } else {
            // $buffer will contain full lines of text, from $position
            // to $last_lf_pos (substr, not mb_substr: we slice bytes,
            // not characters)
            $buffer = substr($chunk, 0, $last_lf_pos);
            // move $position past the newline itself
            $position += $last_lf_pos + 1;
        }
        ////////////////////////////////////////////////////
        //// ... DO SOMETHING WITH THIS BUFFER HERE ... ////
        ////////////////////////////////////////////////////
        // if less than $chunk_size remains, read only the remainder next time
        if (($position + $chunk_size) > $filesize) $chunk_size = $filesize - $position;
        $buffer = NULL;
    }
    fclose($fp);
}
The memory used is only the $chunk_size, and the speed is slightly less than the one obtained with file_get_contents(). I think the PHP group should use my approach in order to optimize its parsing functions.
*) Find the get_file_size() function here.
Answer by Fionn
Well, you could try using the readfile function if you just want to output the file.
If this is not the case - maybe you should think about the design of the application: why do you want to open such large files on web requests?
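A minimal sketch of that approach (the path and headers are illustrative); readfile() streams the file to the output buffer in chunks, so it shouldn't need to hold the whole file in memory:

$path = '/tmp/bigfile.csv'; // illustrative path
header('Content-Type: text/csv');
header('Content-Length: ' . filesize($path));
readfile($path); // writes the file straight to output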
Answer by Enrico Murru
I used fopen to open video files for streaming, using a PHP script as a video streaming server, and I had no problem with files larger than 50/60 MB.
Answer by Juan Pablo Califano
If the problem is caused by hitting the memory limit, you can try setting it to a higher value (this could work or not, depending on PHP's configuration).
This sets the memory limit to 12 MB:

ini_set("memory_limit", "12M");
Answer by RightClick
For me, fopen() has been very slow with files over 1 MB; file() is much faster.
Just trying to read 100 lines at a time and create batch inserts (sketched below), fopen() takes 37 seconds vs. file(), which takes 4 seconds. Must be that string->array step built into file().
I'd try all of the file handling options to see which will work best in your application.
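For reference, a sketch of the batching pattern described above (the table and column names are hypothetical, and the database call is left commented out so the snippet stands alone):

$handle = fopen('data.csv', 'r') or die("Couldn't open data.csv");
$batch = array();
while (($line = fgets($handle)) !== false) {
    // one quoted value per row; escaping kept primitive for the sketch
    $batch[] = "('" . addslashes(trim($line)) . "')";
    if (count($batch) === 100) {
        $sql = "INSERT INTO lines (content) VALUES " . implode(',', $batch);
        // $db->exec($sql); // hypothetical PDO-style handle
        $batch = array();
    }
}
if ($batch) {
    $sql = "INSERT INTO lines (content) VALUES " . implode(',', $batch);
    // $db->exec($sql); // flush the final partial batch
}
fclose($handle);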
Answer by ólafur Waage
Have you tried file()?
http://is2.php.net/manual/en/function.file.php
Or file_get_contents()
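Both calls slurp the entire file into memory, which is fine at moderate sizes but won't scale to the 100 MB+ files mentioned above. A minimal sketch (the path is a placeholder):

$lines = file('/tmp/uploadfile.txt', FILE_IGNORE_NEW_LINES); // array of lines
$text  = file_get_contents('/tmp/uploadfile.txt');           // one string

foreach ($lines as $line) {
    // process $line here
}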

