原文地址: http://stackoverflow.com/questions/2162497/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Efficiently counting the number of lines of a text file (200 MB+)
Question by Abs
I have just found out that my script gives me a fatal error:
Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 440 bytes) in C:\process_txt.php on line 109
That line is this:
$lines = count(file($path)) - 1;
So I think it is having difficulty loading the file into memory and counting the number of lines. Is there a more efficient way to do this without running into memory issues?
The text files whose lines I need to count range from 2 MB to 500 MB, sometimes maybe a gigabyte.
Thanks all for any help.
Answer by Dominic Rodger
This will use less memory, since it doesn't load the whole file into memory:
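$file = "largefile.txt";
$linecount = 0;
$handle = fopen($file, "r");
while (!feof($handle)) {
    // fgets() holds only one line in memory at a time. On the last pass it can
    // return false, so a file ending in a newline is counted one line too high.
    $line = fgets($handle);
    $linecount++;
}
fclose($handle);
echo $linecount;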
fgets loads a single line into memory (if the second argument $length is omitted, it keeps reading from the stream until it reaches the end of the line, which is what we want). If you care about wall time as well as memory usage, this is still unlikely to be as quick as using something other than PHP.
The only danger is if any lines are particularly long (what if you encounter a 2 GB file without line breaks?). In that case you're better off reading it in chunks and counting end-of-line characters:
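$file = "largefile.txt";
$linecount = 0;
$handle = fopen($file, "r");
while (!feof($handle)) {
    // Read at most 4095 bytes, stopping early if a newline is reached.
    $line = fgets($handle, 4096);
    if ($line !== false) {
        // Count "\n" rather than PHP_EOL: a file's line endings don't depend on
        // the OS PHP runs on, and Windows "\r\n" endings also contain "\n".
        $linecount += substr_count($line, "\n");
    }
}
fclose($handle);
echo $linecount;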
Answer by Ja͢ck
Using a loop of fgets() calls is a fine solution and the most straightforward to write; however:
Even though the file is read internally using an 8192-byte buffer, your code still has to call that function once per line.
If you're reading a binary file, it's technically possible for a single "line" to be bigger than the available memory.
This code reads a file in chunks of 8kB each and then counts the number of newlines within that chunk.
function getLines($file)
{
    $f = fopen($file, 'rb');
    $lines = 0;
    while (!feof($f)) {
        // Read the next 8 KB block and count the newline characters in it.
        $lines += substr_count(fread($f, 8192), "\n");
    }
    fclose($f);
    return $lines;
}
If lines average at most 4 kB, this already makes fewer function calls than calling fgets() once per line, and the savings add up when you process big files.
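Like `wc -l`, the chunked getLines() above counts "\n" characters, so a final line that lacks a trailing newline is not counted. If that last line should count, one possible sketch (the name `countLinesChunked` and its `$chunkSize` parameter are hypothetical, not from the answer) remembers the last byte read and adds one line when it isn't "\n":

```php
<?php
// Hypothetical variant of the chunked counter: counts "\n" per chunk, then
// adds one if the file is non-empty and does not end with "\n".
function countLinesChunked(string $path, int $chunkSize = 8192): int
{
    $f = fopen($path, 'rb');
    if ($f === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $lines = 0;
    $last = ''; // last byte seen, used to detect a missing final newline
    while (!feof($f)) {
        $chunk = fread($f, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // nothing more to read
        }
        $lines += substr_count($chunk, "\n");
        $last = substr($chunk, -1);
    }
    fclose($f);
    // An unterminated final line still counts as a line.
    if ($last !== '' && $last !== "\n") {
        $lines++;
    }
    return $lines;
}
```

With the default 8192-byte chunk it makes the same number of fread() calls as getLines(); the only extra work is one substr() per chunk.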
Benchmark
I ran a test with a 1GB file; here are the results:
+-------------+-------------+------------------+---------+
|             | This answer | Dominic's answer | wc -l   |
+-------------+-------------+------------------+---------+
| Lines       | 3550388     | 3550389          | 3550388 |
+-------------+-------------+------------------+---------+
| Runtime (s) | 1.055       | 4.297            | 0.587   |
+-------------+-------------+------------------+---------+
Time is measured in seconds of real time, i.e. elapsed wall-clock time (as opposed to user or sys CPU time).
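The benchmark script itself isn't shown. A minimal way to take comparable wall-clock ("real") timings from inside PHP is a small wrapper like the hypothetical `timeSeconds()` below; it assumes PHP 7.3+ for hrtime(), and the commented usage line assumes PHP 7.4+ for arrow functions:

```php
<?php
// Hypothetical helper: run $fn once and return [its result, elapsed seconds].
// hrtime(true) is a monotonic clock in nanoseconds, so the difference is
// elapsed real time, not CPU time.
function timeSeconds(callable $fn): array
{
    $start = hrtime(true);
    $result = $fn();
    $elapsed = (hrtime(true) - $start) / 1e9;
    return [$result, $elapsed];
}

// Usage (the path is a placeholder):
// [$lines, $secs] = timeSeconds(fn() => getLines('/path/to/big.txt'));
// printf("%d lines in %.3f s\n", $lines, $secs);
```

Timings of file-reading code depend heavily on whether the file is already in the OS page cache, so running each variant more than once, in alternating order, gives a fairer comparison.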
Answer by Wallace Maxters
Simple object-oriented solution
$file = new \SplFileObject('file.extension');
while($file->valid()) $file->fgets();
var_dump($file->key());
Update
Another way to do this is to pass PHP_INT_MAX to the SplFileObject::seek method.
$file = new \SplFileObject('file.extension', 'r');
$file->seek(PHP_INT_MAX);
echo $file->key() + 1;
Answer by Dave Sherohman
If you're running this on a Linux/Unix host, the easiest solution would be to use exec() or similar to run the command wc -l $path. Just make sure you've sanitized $path first, so that it can't be something like "/path/to/file ; rm -rf /".
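One standard way to do that sanitizing is PHP's escapeshellarg(), which quotes the path so the shell sees it as a single literal argument. A minimal sketch, assuming a Unix-like host with `wc` on the PATH and exec() enabled; `countLinesWc` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: count lines with the external `wc -l` command.
// escapeshellarg() quotes the path, so "/path/to/file ; rm -rf /" would be
// treated as one (odd) file name instead of two shell commands.
function countLinesWc(string $path): int
{
    $output = [];
    $status = 0;
    // Feeding the file on stdin makes wc print only the number, no file name.
    exec('wc -l < ' . escapeshellarg($path), $output, $status);
    if ($status !== 0 || !isset($output[0])) {
        throw new RuntimeException("wc -l failed for $path");
    }
    return (int) trim($output[0]);
}
```

Note that `wc -l` counts newline characters, so a last line without a trailing newline is not included.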
Answer by Andy Braham
I found a faster way that does not require looping through the entire file in PHP.
It only works on *nix systems; there might be a similar way on Windows.
$file = '/path/to/your.file';
// Get the number of lines; escapeshellarg() quotes the path safely for the shell
$totalLines = intval(exec('wc -l ' . escapeshellarg($file)));
Answer by Ben Harold
If you're using PHP 5.5 or later, you can use a generator. This will NOT work in any version of PHP before 5.5, though. From php.net:
"Generators provide an easy way to implement simple iterators without the overhead or complexity of implementing a class that implements the Iterator interface."
// This function implements a generator to load individual lines of a large file
function getLines($file) {
    $f = fopen($file, 'r');
    // Read each line of the file without loading the whole file into memory.
    // Compare with false explicitly: a last line of "0" is falsy but valid.
    while (($line = fgets($f)) !== false) {
        yield $line;
    }
    fclose($f);
}
// Since generators implement simple iterators, I can quickly count the number
// of lines using the iterator_count() function.
$file = '/path/to/file.txt';
$lineCount = iterator_count(getLines($file)); // the number of lines in the file
Answer by elkolotfi
If you're on Linux, you can simply do:
$number_of_lines = intval(trim(shell_exec("wc -l " . escapeshellarg($file_name) . " | awk '{print \$1}'")));
You just have to find the right command if you're using another OS.
Regards
Answer by Jani
This is an addition to Wallace de Souza's solution.
It also skips empty lines while counting:
function getLines($file)
{
    $file = new \SplFileObject($file, 'r');
    $file->setFlags(SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY |
        SplFileObject::DROP_NEW_LINE);
    $file->seek(PHP_INT_MAX);
    return $file->key() + 1;
}
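If you'd rather not depend on how the SplFileObject flags (READ_AHEAD, SKIP_EMPTY, DROP_NEW_LINE) and seek() interact, a plain fgets() loop can skip empty lines explicitly. A minimal sketch with the hypothetical name `countNonEmptyLines()`; it treats a line as empty only when nothing but "\r"/"\n" remains, so whitespace-only lines still count, which may differ from SKIP_EMPTY in edge cases:

```php
<?php
// Hypothetical helper: count lines that are not empty after removing the
// line terminator ("\n" or "\r\n").
function countNonEmptyLines(string $path): int
{
    $f = fopen($path, 'rb');
    if ($f === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $count = 0;
    while (($line = fgets($f)) !== false) {
        // Strip only "\r" and "\n"; a line of spaces still counts.
        if (rtrim($line, "\r\n") !== '') {
            $count++;
        }
    }
    fclose($f);
    return $count;
}
```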
Answer by ufk
private static function lineCount($file) {
    $linecount = 0;
    $handle = fopen($file, "r");
    while (!feof($handle)) {
        if (fgets($handle) !== false) {
            $linecount++;
        }
    }
    fclose($handle);
    return $linecount;
}
I wanted to add a little fix to the function above...
In a specific case where I had a file containing the word 'testing', the function returned 2. So I needed to add a check for whether fgets returned false :)
have fun :)
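The extra count comes from how feof() works: it only reports end-of-file after a read has actually run into the end, so when the file ends with "\n" the unchecked loop makes one more pass in which fgets() returns false. A small demonstration, assuming the 'testing' file ended with a newline (the temporary file below is illustrative):

```php
<?php
// Step through the reads one at a time and record feof() after each.
$tmp = tempnam(sys_get_temp_dir(), 'lc');
file_put_contents($tmp, "testing\n");

$h = fopen($tmp, 'r');
$first = fgets($h);             // the whole line, including "\n"
$eofAfterFirst = feof($h);      // no read has hit the end yet
$second = fgets($h);            // this read reaches EOF and returns false
$eofAfterSecond = feof($h);     // only now does feof() report the end
fclose($h);
unlink($tmp);

var_dump($first, $eofAfterFirst, $second, $eofAfterSecond);
```

Because `$eofAfterFirst` is false, the unchecked loop runs a second time and increments on the failed read; the `!== false` check skips that increment.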
Answer by Santosh Kumar
Counting the number of lines can be done with the following code:
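<?php
$fp = fopen("myfile.txt", "r");
$count = 0;
// fgetss() (which also stripped HTML tags) was deprecated in PHP 7.3 and
// removed in PHP 8.0; stripping tags isn't needed to count lines, so use fgets().
while (($line = fgets($fp)) !== false) {
    $count++;
}
echo "Total number of lines: " . $count;
fclose($fp);
?>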

