Disclaimer: this page is a translation of a popular StackOverflow question, released under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license, include the original URL and author information, and attribute it to the original authors (not me): StackOverflow.
Original URL: http://stackoverflow.com/questions/15025875/
What is the best way in PHP to read last lines from a file?
Asked by lorenzo-s
In my PHP application I need to read multiple lines starting from the end of many files (mostly logs). Sometimes I need only the last one, sometimes I need tens or hundreds. Basically, I want something as flexible as the Unix tail command.
There are questions here about how to get the single last line from a file (but I need N lines), and different solutions were given. I'm not sure which one is the best and which performs better.
Answered by lorenzo-s
Methods overview
Searching on the internet, I came across different solutions. I can group them into three approaches:
- naive ones that use the file() PHP function;
- cheating ones that run the tail command on the system;
- mighty ones that happily jump around an opened file using fseek().
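For concreteness, the "cheating" approach can be sketched in a few lines. This is my own illustration, not one of the tested solutions verbatim: it assumes a Unix-like system where the tail binary exists and shell_exec() is not disabled, and the function name tailViaCommand is made up.

```php
<?php
// Sketch of the "cheating" approach: delegate the work to the system's
// tail binary. Assumes a Unix-like OS with tail available and that
// shell_exec() is not disabled. The function name is hypothetical.
function tailViaCommand(string $filepath, int $lines = 1): array
{
    $cmd = 'tail -n ' . (int)$lines . ' ' . escapeshellarg($filepath);
    $out = shell_exec($cmd);
    if ($out === null) {
        return array(); // command failed or shell_exec() is disabled
    }
    return explode("\n", rtrim($out, "\n")); // one array entry per line
}
```

escapeshellarg() keeps a hostile file name from breaking out of the command, which is the main thing to get right with this approach.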
I ended up choosing (or writing) five solutions: a naive one, a cheating one and three mighty ones.
1. The most concise naive solution, using built-in array functions.
2. The only possible solution based on the tail command, which has a little big problem: it does not run if tail is not available, i.e. on non-Unix systems (Windows) or on restricted environments that don't allow system functions.
3. The solution in which single bytes are read from the end of file, searching for (and counting) new-line characters, found here.
4. The multi-byte buffered solution optimized for large files, found here.
5. A slightly modified version of solution #4 in which the buffer length is dynamic, decided according to the number of lines to retrieve.
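Solution #1 is short enough to sketch inline. A minimal version, assuming the file fits into memory (the function name tailFile is mine):

```php
<?php
// Naive approach: read the whole file into an array of lines and keep
// the last $n. Concise, but the entire file is loaded into memory.
function tailFile(string $filepath, int $n = 1): array
{
    $all = file($filepath, FILE_IGNORE_NEW_LINES);
    if ($all === false) {
        return array(); // file not readable
    }
    return array_slice($all, -$n);
}
```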
All solutions work, in the sense that they return the expected result from any file and for any number of lines we ask for (except for solution #1, which can break PHP memory limits in case of large files, returning nothing). But which one is better?
Performance tests
To answer the question, I ran tests. That's how these things are done, isn't it?
I prepared a sample 100 KB file by joining together different files found in my /var/log directory. Then I wrote a PHP script that uses each one of the five solutions to retrieve 1, 2, ..., 10, 20, ..., 100, 200, ..., 1000 lines from the end of the file. Each single test is repeated ten times (that's something like 5 × 28 × 10 = 1400 tests), measuring average elapsed time in microseconds.
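The timing harness boils down to something like the following sketch, where $solution stands for whichever of the five implementations is under test (the helper name benchmark is mine):

```php
<?php
// Rough shape of the benchmark: run one solution several times and
// return the average elapsed time in microseconds, via microtime(true).
// $solution is a stand-in for whichever implementation is measured.
function benchmark(callable $solution, string $file, int $lines, int $repeat = 10): float
{
    $total = 0.0;
    for ($i = 0; $i < $repeat; $i++) {
        $start = microtime(true);
        $solution($file, $lines);
        $total += microtime(true) - $start;
    }
    return ($total / $repeat) * 1e6; // seconds -> microseconds
}
```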
I ran the script on my local development machine (Xubuntu 12.04, PHP 5.3.10, 2.70 GHz dual core CPU, 2 GB RAM) using the PHP command line interpreter. Here are the results:
[Graph: average elapsed time per solution, 100 KB file]
Solutions #1 and #2 seem to be the worst ones. Solution #3 is good only when we need to read a few lines. Solutions #4 and #5 seem to be the best ones. Note how the dynamic buffer size can optimize the algorithm: execution time is a little smaller for few lines, because of the reduced buffer.
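The dynamic buffer of solution #5 is just a size choice made up front. A sketch of the idea, with the same thresholds used by the adaptive code later on this page (64 B, 512 B, 4 KB):

```php
<?php
// Dynamic buffer sizing: pick a small read size when only a few lines
// are requested, a full 4 KB chunk otherwise. Thresholds match the
// adaptive variant shown later on this page.
function pickBufferSize(int $lines): int
{
    if ($lines < 2) {
        return 64;   // a single line: a tiny buffer is usually enough
    }
    if ($lines < 10) {
        return 512;  // a handful of lines
    }
    return 4096;     // many lines: read bigger chunks
}
```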
Let's try with a bigger file. What if we have to read a 10 MB log file?
[Graph: average elapsed time per solution, 10 MB file]
Now solution #1 is by far the worst one: in fact, loading the whole 10 MB file into memory is not a great idea. I ran the tests also on 1 MB and 100 MB files, and it's practically the same situation.
And for tiny log files? That's the graph for a 10 KB file:
[Graph: average elapsed time per solution, 10 KB file]
Solution #1 is the best one now! Loading 10 KB into memory isn't a big deal for PHP. Also #4 and #5 perform well. However, this is an edge case: a 10 KB log means something like 150-200 lines...
You can download all my test files, sources and results here.
Final thoughts
Solution #5 is heavily recommended for the general use case: it works great with every file size and performs particularly well when reading a few lines.
Avoid solution #1 if you need to read files bigger than 10 KB.
Solutions #2 and #3 aren't the best ones in any test I ran: #2 never runs in less than 2 ms, and #3 is heavily influenced by the number of lines you ask for (it works quite well only with 1 or 2 lines).
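For reference, the single-byte approach (#3) can be sketched as follows. This is a simplified reconstruction of the idea, not the exact code linked above: seek backwards one byte at a time, counting newlines, then read the rest of the file forward.

```php
<?php
// Sketch of the single-byte approach (#3): walk backwards from the end
// of the file one byte at a time, counting newlines, then read the
// remaining content forward. A reconstruction, not the linked code.
function tailByBytes(string $filepath, int $lines = 1): string
{
    $f = fopen($filepath, 'rb');
    if ($f === false) return '';
    $size = filesize($filepath);
    if ($size === 0) { fclose($f); return ''; }
    $pos = $size - 1;
    // A trailing newline terminates the last line; don't count it
    fseek($f, $pos);
    if (fread($f, 1) === "\n") $pos--;
    $count = 0;
    while ($pos >= 0 && $count < $lines) {
        fseek($f, $pos);
        if (fread($f, 1) === "\n") $count++;
        if ($count < $lines) $pos--; // stop on the newline we searched for
    }
    // Content starts just after that newline (or at the very beginning)
    fseek($f, max(0, $pos + 1));
    $out = stream_get_contents($f);
    fclose($f);
    return rtrim($out, "\n");
}
```

One fseek()/fread() pair per byte is exactly why this approach degrades as the requested line count grows.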
Answered by Kinga the Witch
This is a modified version which can also skip last lines:
/**
 * Modified version of http://www.geekality.net/2011/05/28/php-tail-tackling-large-files/ and of https://gist.github.com/lorenzos/1711e81a9162320fde20
 * @author Kinga the Witch (Trans-dating.com), Torleif Berger, Lorenzo Stanco
 * @link http://stackoverflow.com/a/15025877/995958
 * @license http://creativecommons.org/licenses/by/3.0/
 */
function tailWithSkip($filepath, $lines = 1, $skip = 0, $adaptive = true)
{
    // Open file (check the handle before trying to lock it)
    $f = @fopen($filepath, "rb");
    if ($f === false) return false;
    if (@flock($f, LOCK_SH) === false) return false;

    if (!$adaptive) $buffer = 4096;
    else {
        // Set buffer size according to the number of lines to retrieve.
        // This gives a performance boost when reading a few lines from the file.
        $max = max($lines, $skip);
        $buffer = ($max < 2 ? 64 : ($max < 10 ? 512 : 4096));
    }

    // Jump to last character
    fseek($f, -1, SEEK_END);

    // Read it and adjust line number if necessary
    // (Otherwise the result would be wrong if file doesn't end with a blank line)
    if (fread($f, 1) == "\n") {
        if ($skip > 0) { $skip++; $lines--; }
    } else {
        $lines--;
    }

    // Start reading
    $output = '';
    $chunk = '';

    // While we would like more
    while (ftell($f) > 0 && $lines >= 0) {
        // Figure out how far back we should jump
        $seek = min(ftell($f), $buffer);
        // Do the jump (backwards, relative to where we are)
        fseek($f, -$seek, SEEK_CUR);
        // Read a chunk
        $chunk = fread($f, $seek);
        // Calculate chunk parameters
        $count = substr_count($chunk, "\n");
        $strlen = mb_strlen($chunk, '8bit');
        // Move the file pointer back to the start of the chunk
        fseek($f, -$strlen, SEEK_CUR);

        if ($skip > 0) { // There are still some lines to skip
            if ($skip > $count) { $skip -= $count; $chunk = ''; } // Chunk contains fewer newline symbols than we still have to skip
            else {
                $pos = 0;
                while ($skip > 0) {
                    if ($pos > 0) $offset = $pos - $strlen - 1; // Calculate the offset - NEGATIVE position of the last newline symbol
                    else $offset = 0; // First search (without offset)
                    $pos = strrpos($chunk, "\n", $offset); // Search for the last newline symbol (up to the offset)
                    if ($pos !== false) $skip--; // Found a newline symbol - skip the line
                    else break; // Protection against an infinite loop (just in case)
                }
                $chunk = substr($chunk, 0, $pos); // Truncate the chunk
                $count = substr_count($chunk, "\n"); // Count newline symbols in the truncated chunk
            }
        }

        if (strlen($chunk) > 0) {
            // Add chunk to the output
            $output = $chunk . $output;
            // Decrease our line counter
            $lines -= $count;
        }
    }

    // While we have too many lines
    // (Because of buffer size we might have read too many)
    while ($lines++ < 0) {
        // Find first newline and remove all text before that
        $output = substr($output, strpos($output, "\n") + 1);
    }

    // Unlock, close file and return
    @flock($f, LOCK_UN);
    fclose($f);
    return trim($output);
}
Answered by Gordon
This would also work:
$file = new SplFileObject("/path/to/file");
$file->seek(PHP_INT_MAX); // cheap trick to seek to EoF
$total_lines = $file->key(); // last line number

// output the last twenty lines
$reader = new LimitIterator($file, $total_lines - 20);
foreach ($reader as $line) {
    echo $line; // includes newlines
}
Or without the LimitIterator:
$file = new SplFileObject($filepath);
$file->seek(PHP_INT_MAX);
$total_lines = $file->key();
$file->seek($total_lines - 20);
while (!$file->eof()) {
    echo $file->current();
    $file->next();
}
Unfortunately, your testcase segfaults on my machine, so I cannot tell how it performs.
Answered by user163193
My little copy-and-paste solution after reading all this here. tail() does not close $fp, because you must kill it with Ctrl-C anyway. usleep() is there to save your CPU time; only tested on Windows so far. You need to put this code into a class!
/**
 * @param $pathname
 */
private function tail($pathname)
{
    $realpath = realpath($pathname);
    $fp = fopen($realpath, 'r', FALSE);
    $lastline = '';
    fseek($fp, $this->tailonce($pathname, 1, false), SEEK_END);
    do {
        $line = fread($fp, 1000);
        if ($line == $lastline) {
            usleep(50);
        } else {
            $lastline = $line;
            echo $lastline;
        }
    } while ($fp);
}
/**
 * @param $pathname
 * @param $lines
 * @param bool $echo
 * @return int
 */
private function tailonce($pathname, $lines, $echo = true)
{
    $realpath = realpath($pathname);
    $fp = fopen($realpath, 'r', FALSE);
    $flines = 0;
    $a = -1;
    while ($flines <= $lines) {
        fseek($fp, $a--, SEEK_END);
        $char = fread($fp, 1);
        if ($char == "\n") $flines++;
    }
    $out = fread($fp, 1000000);
    fclose($fp);
    if ($echo) echo $out;
    return $a + 2;
}
Answered by biziclop
Yet another function; you can use regexes to separate items. Usage:
$last_rows_array = file_get_tail('logfile.log', 100, array(
    'regex'             => true,       // use regex
    'separator'         => '#\n{2,}#', // separator: at least two newlines
    'typical_item_size' => 200,        // line length
));
The function:
// public domain
function file_get_tail( $file, $requested_num = 100, $args = array() ){
  // default arg values
  $regex             = true;
  $separator         = null;
  $typical_item_size = 100;  // estimated size
  $more_size_mul     = 1.01; // +1%
  $max_more_size     = 4000;
  extract( $args );
  if( $separator === null ) $separator = $regex ? '#\n+#' : "\n";

  if( is_string( $file )) $f = fopen( $file, 'rb');
  else if( is_resource( $file ) && in_array( get_resource_type( $file ), array('file', 'stream'), true ))
    $f = $file;
  else throw new \Exception( __METHOD__.': file must be either filename or a file or stream resource');

  // get file size
  fseek( $f, 0, SEEK_END );
  $fsize = ftell( $f );
  $fpos = $fsize;
  $bytes_read = 0;

  $all_items = array(); // array of array
  $all_item_num = 0;
  $remaining_num = $requested_num;
  $last_junk = '';

  while( true ){
    // calc size and position of next chunk to read
    $size = $remaining_num * $typical_item_size - strlen( $last_junk );
    // reading a bit more can't hurt
    $size += (int)min( $size * $more_size_mul, $max_more_size );
    if( $size < 1 ) $size = 1;

    // set and fix read position
    $fpos = $fpos - $size;
    if( $fpos < 0 ){
      $size -= -$fpos;
      $fpos = 0;
    }

    // read chunk + add junk from prev iteration
    fseek( $f, $fpos, SEEK_SET );
    $chunk = fread( $f, $size );
    if( strlen( $chunk ) !== $size ) throw new \Exception( __METHOD__.": read error?");
    $bytes_read += strlen( $chunk );
    $chunk .= $last_junk;

    // chunk -> items, with at least one element
    $items = $regex ? preg_split( $separator, $chunk ) : explode( $separator, $chunk );

    // first item is probably cut in half, use it in next iteration ("junk") instead
    // also skip very first '' item
    if( $fpos > 0 || $items[0] === ''){
      $last_junk = $items[0];
      unset( $items[0] );
    } // … else noop, because this is the last iteration

    // ignore last empty item. end( empty [] ) === false
    if( end( $items ) === '') array_pop( $items );

    // if we got items, push them
    $num = count( $items );
    if( $num > 0 ){
      $remaining_num -= $num;
      // if we read too much, use only needed items
      if( $remaining_num < 0 ) $items = array_slice( $items, - $remaining_num );
      // don't fix $remaining_num, we will exit anyway
      $all_items[] = array_reverse( $items );
      $all_item_num += $num;
    }

    // are we ready?
    if( $fpos === 0 || $remaining_num <= 0 ) break;

    // calculate a better estimate
    if( $all_item_num > 0 ) $typical_item_size = (int)max( 1, round( $bytes_read / $all_item_num ));
  }

  fclose( $f );
  //tr( $all_items );
  return call_user_func_array('array_merge', $all_items );
}
Answered by sergiotarxz
I like the following method, but it won't work on files larger than 2 GB.
<?php
function lastLines($file, $lines) {
    $size = filesize($file);
    $fd = fopen($file, 'r+');
    $pos = $size;
    $n = 0;
    while ($n < $lines + 1 && $pos > 0) {
        fseek($fd, $pos);
        $a = fread($fd, 1);
        if ($a === "\n") {
            ++$n;
        }
        $pos--;
    }
    $ret = array();
    for ($i = 0; $i < $lines; $i++) {
        array_push($ret, fgets($fd));
    }
    return $ret;
}
print_r(lastLines('hola.php', 4));
?>