Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/10164597/

How would you implement tail efficiently?

Tags: c, linux, unix, tail

Asked by Tomas Pruzina

What is an efficient way to implement tail on *NIX? I came up with (and wrote) two simple solutions, both of which load lines into a circular structure (an array, or a doubly linked circular list, just for fun). I've seen part of an older implementation in busybox and, from what I understood, it used fseek to find EOF and then read the data "backwards". Is there anything cleaner and faster out there? I was asked this in an interview and the interviewer did not look satisfied. Thank you in advance.

Accepted answer by akappa

I don't think there are solutions other than "keep the latest N lines while reading the data forward" or "start from the end and go backwards until you have read the Nth line".

The point is that you'd use one or the other depending on the context.

The "go to the end and go backwards" approach is better when tail reads a random-access file, or when the data is small enough to fit in memory. In this case the runtime is minimized, since you scan only the data that has to be output (so it's "optimal").
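
As a rough illustration of how a program might tell the two contexts apart, here is a minimal sketch. The helper name is_seekable is hypothetical, and the check is deliberately conservative: it only accepts regular files on which lseek succeeds, so pipes, FIFOs, sockets and terminals are all treated as forward-only input.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: true if fd refers to a regular file that can be
   repositioned, i.e. a candidate for the "start from the end" strategy. */
bool is_seekable(int fd)
{
    struct stat st;

    /* Anything that is not a regular file is treated as a stream. */
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode))
        return false;

    /* lseek() fails (typically with ESPIPE) when seeking is impossible. */
    return lseek(fd, 0, SEEK_CUR) != (off_t)-1;
}
```

A tail built on this check could seek to the end when is_seekable(fd) holds and fall back to reading forward otherwise.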

Your solution (keep the N latest lines) is better when tail is fed from a pipeline or when the data is huge. In this case the other solution wastes too much memory, so it is not practical; and when the source is slower than tail (which is likely), scanning the whole file doesn't matter that much.
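
The "keep the N latest lines" approach can be sketched with a circular array of line buffers. This is only an illustration under some assumptions: the function name tail_forward is made up, lines are read with POSIX getline, and the input is assumed to be text without embedded NUL bytes (fputs would stop at one).

```c
#define _POSIX_C_SOURCE 200809L  /* for getline() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: copy the last n lines of `in` to `out`, reading
   forward only. Returns 0 on success, -1 if memory allocation fails. */
int tail_forward(FILE *in, FILE *out, size_t n)
{
    if (n == 0)
        return 0;

    /* ring[i] holds one saved line; calloc leaves every slot NULL. */
    char **ring = calloc(n, sizeof *ring);
    if (ring == NULL)
        return -1;

    char *line = NULL;   /* buffer allocated by getline() */
    size_t cap = 0;
    size_t seen = 0;     /* total number of lines read so far */

    while (getline(&line, &cap, in) != -1) {
        size_t slot = seen % n;
        free(ring[slot]);    /* drop the line that falls out of the window */
        ring[slot] = line;   /* hand the buffer over to the ring */
        line = NULL;         /* make getline() allocate a fresh buffer */
        cap = 0;
        seen++;
    }
    free(line);

    /* Print the surviving lines, oldest first. */
    size_t kept = seen < n ? seen : n;
    for (size_t i = seen - kept; i < seen; i++)
        fputs(ring[i % n], out);

    for (size_t i = 0; i < n; i++)
        free(ring[i]);
    free(ring);
    return 0;
}
```

Calling tail_forward(stdin, stdout, 10) would behave roughly like tail -n 10 on a pipe. Memory use is bounded by the N longest recent lines rather than by the size of the input, which is why this variant suits huge or unseekable sources.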

Answer by thumbmunkeys

Read backwards from the end of the file until N line breaks have been read or the beginning of the file is reached.

Then print what was just read.
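
One possible way to do the backward read is in fixed-size blocks rather than byte by byte. The sketch below is an assumption-laden illustration, not the code of any real tail: the name find_tail_start and the 4096-byte CHUNK are made up, the descriptor must be seekable, and a short pread is simply treated as an error. It returns the offset at which the last n lines begin; printing is then a plain forward copy from that offset.

```c
#define _POSIX_C_SOURCE 200809L  /* for pread() */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 4096  /* assumed block size */

/* Hypothetical helper: offset where the last n lines of fd begin,
   or -1 on error. fd must support seeking. */
off_t find_tail_start(int fd, long n)
{
    char buf[CHUNK];
    off_t end = lseek(fd, 0, SEEK_END);

    if (end == (off_t)-1)
        return -1;
    if (n <= 0)
        return end;              /* nothing to print */

    off_t pos = end;             /* bytes in [pos, end) are already scanned */
    long found = 0;

    while (pos > 0) {
        size_t want = pos < CHUNK ? (size_t)pos : CHUNK;
        pos -= (off_t)want;
        ssize_t got = pread(fd, buf, want, pos);
        if (got != (ssize_t)want)
            return -1;

        /* Scan this block from its last byte to its first. */
        for (ssize_t i = got - 1; i >= 0; i--) {
            /* A newline that is the final byte of the file only ends
               the last line, so it is not counted. */
            if (buf[i] == '\n' && pos + i != end - 1 && ++found == n)
                return pos + i + 1;
        }
    }
    return 0;  /* fewer than n lines: the tail is the whole file */
}
```

For a file containing "a\nb\nc\nd\n", find_tail_start(fd, 2) yields 4, the offset of "c".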

I don't think any fancy data structures are needed here.

Here is the source code of tail if you're interested.

Answer by Andres Julian Ramirez Gomez

/* This example implements the -n option of the tail command. */

#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
#include <getopt.h>

#define BUFF_SIZE 4096

FILE *openFile(const char *filePath)
{
  FILE *file;
  file= fopen(filePath, "r");
  if(file == NULL)
  {
    /* perror reports errno before another library call can overwrite it */
    perror(filePath);
    exit(EXIT_FAILURE);
  }
  return(file);
}

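/* Copy everything after offset startline (the newline that precedes the
   first wanted line) to standard output. */
void printLine(FILE *file, off_t startline)
{
  int fd;
  fd= fileno(file);
  ssize_t nread;
  char buffer[BUFF_SIZE];
  lseek(fd,(startline + 1),SEEK_SET);
  while((nread= read(fd,buffer,BUFF_SIZE)) > 0)
  {
    /* stop on a failed or partial write instead of silently continuing */
    if(write(STDOUT_FILENO, buffer, nread) != nread)
    {
      break;
    }
  }
}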

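/* Walk backwards from the end of the file, one byte at a time, until
   nlines line breaks have been seen, then print everything after them. */
void walkFile(FILE *file, long nlines)
{
  off_t fposition;
  off_t end;
  long countlines= 0;
  int cbyte;  /* int, not char, so EOF can be told apart from a data byte */

  /* fseeko/ftello work with off_t, matching _FILE_OFFSET_BITS 64 */
  fseeko(file,0,SEEK_END);
  end= ftello(file);

  /* fposition is the offset of the byte examined in each iteration */
  for(fposition= end - 1; fposition >= 0; fposition--)
  {
    fseeko(file,fposition,SEEK_SET);
    cbyte= fgetc(file);
    /* a newline that is the very last byte only ends the final line,
       so it is not counted */
    if (cbyte == '\n' && fposition != end - 1)
    {
      countlines ++;
      if(countlines == nlines)
      {
        break;
      }
    }
  }
  /* fposition is now the offset of the Nth newline from the end, or -1 if
     the file has fewer lines; printLine starts reading at fposition + 1 */
  printLine(file, fposition);
  fclose(file);
}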

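int main(int argc, char *argv[])
{
  FILE *file;
  /* expected usage: <program> <number of lines> <file> */
  if(argc != 3)
  {
    fprintf(stderr,"Usage: %s <nlines> <file>\n",argv[0]);
    return EXIT_FAILURE;
  }
  file= openFile(argv[2]);
  walkFile(file, atol(argv[1]));
  return 0;
}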

/* Note: keep in mind that I did not write code to parse input options and arguments, nor code to check that the line-count argument is really a number. */

Answer by Bernd Elkemann

First use fseek to find the end of the file, then subtract 512 and fseek to that offset, then read forward from there to the end. Count the number of line breaks, because if there are too few you will have to do the same with a subtracted offset of 1024 ... but in 99% of cases 512 will be enough.

This (1) avoids reading the whole file forward, and (2) is probably more efficient than reading backwards from the end, because reading forward is typically faster.
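
A minimal sketch of this growing-window idea follows, under a few assumptions: the function name tail_start_by_window is hypothetical, the window is re-read from scratch each time it doubles (512, 1024, 2048, ...), and a short pread is treated as an error. It returns the offset where the last n lines begin.

```c
#define _POSIX_C_SOURCE 200809L  /* for pread() */
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: offset where the last n lines of fd begin,
   or -1 on error. fd must support seeking. */
off_t tail_start_by_window(int fd, long n)
{
    off_t end = lseek(fd, 0, SEEK_END);

    if (end == (off_t)-1)
        return -1;
    if (n <= 0)
        return end;

    for (off_t window = 512; ; window *= 2) {
        off_t start = window < end ? end - window : 0;
        size_t len = (size_t)(end - start);
        char *buf = malloc(len ? len : 1);

        if (buf == NULL)
            return -1;
        if (pread(fd, buf, len, start) != (ssize_t)len) {
            free(buf);
            return -1;
        }

        /* Read forward and count line breaks; the final byte is left out
           because a trailing newline only ends the last line. */
        long breaks = 0;
        for (size_t i = 0; i + 1 < len; i++)
            if (buf[i] == '\n')
                breaks++;

        if (breaks >= n) {
            /* The tail begins right after line break number
               (breaks - n + 1), counted from the window start. */
            long skip = breaks - n + 1;
            size_t i = 0;
            while (skip > 0)
                if (buf[i++] == '\n')
                    skip--;
            free(buf);
            return start + (off_t)i;
        }

        free(buf);
        if (start == 0)
            return 0;  /* whole file scanned: fewer than n lines */
    }
}
```

Because the window doubles, the total number of bytes read stays within about twice the size of the final window, so the occasional retry costs little.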
