C#: Get the last 10 lines of a very large text file > 10GB

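Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question and is licensed under CC BY-SA 4.0. If you use it, you must also follow the CC BY-SA license, credit the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/398378/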

Time: 2020-08-04 01:54:47  Source: igfitidea

c# text large-files

Asked by Chris Conway

What is the most efficient way to display the last 10 lines of a very large text file (this particular file is over 10GB)? I was thinking of just writing a simple C# app, but I'm not sure how to do this efficiently.

Accepted answer by jason

Read to the end of the file, then seek backwards until you find ten newlines, and then read forward to the end, taking various encodings into consideration. Be sure to handle the case where the file has fewer than ten lines. Below is an implementation (in C#, as you tagged this), generalized to find the last numberOfTokens tokens in the file located at path, encoded in encoding, where the token separator is represented by tokenSeparator. The result is returned as a string (this could be improved by returning an IEnumerable<string> that enumerates the tokens).

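using System;
using System.IO;
using System.Text;

public static class TailReader {
    public static string ReadEndTokens(string path, Int64 numberOfTokens, Encoding encoding, string tokenSeparator) {

        // Width of one code unit in this encoding (1 byte for ASCII/UTF-8, 2 for UTF-16).
        int sizeOfChar = encoding.GetByteCount("\n");
        // Holds exactly as many bytes as the encoded separator.
        byte[] buffer = new byte[encoding.GetByteCount(tokenSeparator)];

        // Open read-only and let other processes keep writing, so a file in use can still be read.
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)) {
            Int64 tokenCount = 0;

            // position is a byte offset measured back from the end of the file.
            for (Int64 position = buffer.Length; position <= fs.Length; position += sizeOfChar) {
                fs.Seek(-position, SeekOrigin.End);
                ReadFully(fs, buffer, buffer.Length);

                if (encoding.GetString(buffer) == tokenSeparator) {
                    // Note: a separator at the very end of the file also counts as a token.
                    tokenCount++;
                    if (tokenCount == numberOfTokens) {
                        // The stream now sits just past the separator; return everything after it.
                        byte[] returnBuffer = new byte[fs.Length - fs.Position];
                        ReadFully(fs, returnBuffer, returnBuffer.Length);
                        return encoding.GetString(returnBuffer);
                    }
                }
            }

            // handle case where number of tokens in file is less than numberOfTokens
            // (this loads the whole file into memory, so it only suits small files)
            fs.Seek(0, SeekOrigin.Begin);
            buffer = new byte[fs.Length];
            ReadFully(fs, buffer, buffer.Length);
            return encoding.GetString(buffer);
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until count bytes arrive.
    private static void ReadFully(Stream stream, byte[] buffer, int count) {
        int offset = 0;
        while (offset < count) {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
    }
}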

Answer by ctacke

I'd likely just open it as a binary stream, seek to the end, then back up looking for line breaks. Back up past 10 line breaks (or 11, depending on whether the last line ends with one) to find your 10 lines, then just read to the end and use Encoding.GetString on what you read to get it into a string. Split as desired.

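ctacke's approach can be sketched in C# roughly as follows. This is a minimal sketch, not a tested utility: it reads the end of the file in fixed-size blocks (64 KB here, an arbitrary choice) instead of issuing one Seek per byte, and it ignores a trailing newline so the "10 or 11" question is handled automatically. TailSketch and ReadLastLines are hypothetical names, and the code assumes an encoding in which '\n' is the single byte 0x0A that never occurs inside another character (true for ASCII and UTF-8, not for UTF-16).

```csharp
using System;
using System.IO;
using System.Text;

public static class TailSketch
{
    // Hypothetical helper: returns the last `lineCount` lines of the file at `path`.
    // Assumes '\n' is the single byte 0x0A (ASCII or UTF-8).
    public static string[] ReadLastLines(string path, int lineCount, Encoding encoding, int blockSize = 64 * 1024)
    {
        if (lineCount <= 0) return new string[0];

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            long end = fs.Length;

            // A trailing newline ends the last line rather than starting a new one,
            // so leave it out; otherwise 10 lines would need 11 line breaks.
            if (end > 0)
            {
                fs.Seek(-1, SeekOrigin.End);
                if (fs.ReadByte() == '\n') end--;
            }

            long start = 0;        // stays 0 if the file has fewer lines than requested
            long blockEnd = end;
            int newlines = 0;
            bool found = false;
            var block = new byte[blockSize];

            // Walk backwards one block at a time, scanning each block from its last byte.
            while (blockEnd > 0 && !found)
            {
                int len = (int)Math.Min(blockSize, blockEnd);
                long blockStart = blockEnd - len;
                fs.Seek(blockStart, SeekOrigin.Begin);
                ReadFully(fs, block, len);

                for (int i = len - 1; i >= 0; i--)
                {
                    if (block[i] == (byte)'\n' && ++newlines == lineCount)
                    {
                        start = blockStart + i + 1;   // first byte after that line break
                        found = true;
                        break;
                    }
                }
                blockEnd = blockStart;
            }

            // Read forward from `start` to `end`, decode, then split into lines.
            var tail = new byte[end - start];
            fs.Seek(start, SeekOrigin.Begin);
            ReadFully(fs, tail, tail.Length);
            if (tail.Length == 0) return new string[0];

            string[] lines = encoding.GetString(tail).Split('\n');
            for (int i = 0; i < lines.Length; i++)
                lines[i] = lines[i].TrimEnd('\r');   // tolerate Windows "\r\n" endings
            return lines;
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until count bytes arrive.
    private static void ReadFully(Stream stream, byte[] buffer, int count)
    {
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
    }
}
```

With 10 short lines the loop usually reads a single block, so the cost does not depend on the 10GB total. If the last lines are enormous (as in biozinc's 12-lines-plus-10GB case below), it still has to walk back through most of the file.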

Answer by Lolindrath

You should be able to use FileStream.Seek() to move to the end of the file, then work your way backwards, looking for \n until you have enough lines.

Answer by w4g3n3r

Tail? Tail is a Unix command that displays the last few lines of a file. There is a Windows version in the Windows Server 2003 Resource Kit.

Answer by zendar

That is what the Unix tail command does. See http://en.wikipedia.org/wiki/Tail_(Unix)

There are lots of open-source implementations on the internet; here is one for Win32: Tail for Win32

Answer by Jared

You could use the Windows version of the tail command and just redirect its output to a text file with the > symbol, or view it on the screen, depending on your needs.

Answer by Jon Skeet

As the others have suggested, you can go to the end of the file and read backwards, effectively. However, it's slightly tricky - particularly because if you have a variable-length encoding (such as UTF-8) you need to be cunning about making sure you get "whole" characters.

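To illustrate the "whole characters" problem for UTF-8: scanning backwards for the byte 0x0A is safe, because bytes below 0x80 never appear inside a multi-byte character. The trouble starts when you jump back a fixed number of bytes and land in the middle of a character; decoding that slice with Encoding.UTF8.GetString yields a U+FFFD replacement character at the front. The minimal sketch below assumes UTF-8 input and skips leading continuation bytes (bit pattern 10xxxxxx) before decoding. Utf8Boundary, FirstCharStart and DecodeFromBoundary are hypothetical names.

```csharp
using System.Text;

public static class Utf8Boundary
{
    // Hypothetical helper: index of the first byte in `data` that starts a UTF-8
    // character. Continuation bytes look like 10xxxxxx, i.e. (b & 0xC0) == 0x80.
    public static int FirstCharStart(byte[] data)
    {
        int i = 0;
        while (i < data.Length && (data[i] & 0xC0) == 0x80)
            i++;
        return i;
    }

    // Decodes `data` starting at the first whole character, dropping any
    // partial character left over from the arbitrary starting offset.
    public static string DecodeFromBoundary(byte[] data)
    {
        int start = FirstCharStart(data);
        return Encoding.UTF8.GetString(data, start, data.Length - start);
    }
}
```

For example, "héllo" encodes to 68 C3 A9 6C 6C 6F; a slice starting at index 2 begins with the orphaned continuation byte A9, which FirstCharStart skips, so DecodeFromBoundary returns "llo". Variable-width encodings other than UTF-8 need their own boundary rules.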

Answer by Steven Behnke

If you open the file with FileMode.Append it will seek to the end of the file for you. Then you could seek back the number of bytes you want and read them. It might not be fast though regardless of what you do since that's a pretty massive file.

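One caveat: per the .NET documentation, FileMode.Append can only be combined with FileAccess.Write, and any attempt to read from such a stream throws NotSupportedException. The sketch below therefore swaps in FileMode.Open with FileAccess.Read and seeks relative to SeekOrigin.End to reach the final bytes. LastBytesSketch and ReadLastBytes are hypothetical names.

```csharp
using System;
using System.IO;

public static class LastBytesSketch
{
    // Hypothetical helper: returns the final `byteCount` bytes of the file
    // (or the whole file if it is shorter) without reading anything before them.
    public static byte[] ReadLastBytes(string path, int byteCount)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            int count = (int)Math.Min(byteCount, fs.Length);
            fs.Seek(-count, SeekOrigin.End);   // position the stream `count` bytes before the end

            var buffer = new byte[count];
            int offset = 0;
            while (offset < count)
            {
                // Read can return fewer bytes than asked for, so keep going until done.
                int read = fs.Read(buffer, offset, count - offset);
                if (read == 0) throw new EndOfStreamException();
                offset += read;
            }
            return buffer;
        }
    }
}
```

To get 10 lines this way you still have to guess a byte count, decode the bytes, and read further back if they contain fewer than 10 line breaks. In UTF-8 the first character of the slice may also be cut in half.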

Answer by biozinc

One useful property is FileInfo.Length. It gives the size of a file in bytes.

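A small example of FileInfo.Length, which reports the file size in bytes without opening a stream. The hypothetical TailStartOffset helper below uses it to compute where the final window bytes of a file begin, clamped at 0 for files shorter than the window.

```csharp
using System;
using System.IO;

public static class FileSizeSketch
{
    // Hypothetical helper: byte offset at which the last `window` bytes of the file begin.
    // FileInfo.Length throws FileNotFoundException if the file does not exist.
    public static long TailStartOffset(string path, long window)
    {
        long length = new FileInfo(path).Length;   // size in bytes, no stream needed
        return Math.Max(0, length - window);
    }
}
```

Length is measured in bytes, not characters or lines, so it tells you where the file ends but not where the tenth-from-last line starts.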

What structure is your file? Are you sure the last 10 lines will be near the end of the file? If you have a file with 12 lines of text and 10GB of 0s, then looking at the end won't really be that fast. Then again, you might have to look through the whole file.

If you are sure that the file contains numerous short strings, each on its own line, seek to the end, then scan back until you've counted 11 end-of-line characters. Then you can read forward for the next 10 lines.

Answer by Eric Ness

I'm not sure how efficient it will be, but in Windows PowerShell, getting the last ten lines of a file is as easy as:

Get-Content file.txt | Select-Object -last 10
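As for efficiency: Get-Content reads the file from the first line, and Select-Object -Last keeps only the final ten lines. Memory therefore stays small, but the running time grows with the size of the whole file. The same forward-streaming idea can be sketched in C# with File.ReadLines and a bounded Queue<string>. ForwardTailSketch and LastLinesByStreaming are hypothetical names. On a 10GB file this still reads all 10GB, which is what the seek-from-the-end answers avoid. PowerShell 3.0 and later also provide Get-Content -Tail, which returns a given number of lines from the end of a file.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ForwardTailSketch
{
    // Reads the file front to back but keeps only the most recent `lineCount`
    // lines, so memory stays small while running time grows with file size.
    public static string[] LastLinesByStreaming(string path, int lineCount)
    {
        if (lineCount <= 0) return new string[0];

        var window = new Queue<string>(lineCount);
        foreach (string line in File.ReadLines(path))   // lazily enumerates lines
        {
            if (window.Count == lineCount)
                window.Dequeue();                        // drop the oldest line
            window.Enqueue(line);
        }
        return window.ToArray();                         // oldest first
    }
}
```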