C#: Determine the number of lines in a text file

Question

Asked by TK.

Is there an easy way to programmatically determine the number of lines within a text file?

Answer 1

Accepted answer by Greg Beech

Seriously belated edit: If you're using .NET 4.0 or later

The File class has a new ReadLines method, which lazily enumerates lines rather than greedily reading them all into an array like ReadAllLines. So now you can have both efficiency and conciseness with:

var lineCount = File.ReadLines(@"C:\file.txt").Count();

Original Answer

If you're not too bothered about efficiency, you can simply write:

var lineCount = File.ReadAllLines(@"C:\file.txt").Length;

For a more efficient method you could do:

var lineCount = 0;
using (var reader = File.OpenText(@"C:\file.txt"))
{
    while (reader.ReadLine() != null)
    {
        lineCount++;
    }
}

Edit: In response to questions about efficiency

The reason I said the second was more efficient was regarding memory usage, not necessarily speed. The first one loads the entire contents of the file into an array which means it must allocate at least as much memory as the size of the file. The second merely loops one line at a time so it never has to allocate more than one line's worth of memory at a time. This isn't that important for small files, but for larger files it could be an issue (if you try and find the number of lines in a 4GB file on a 32-bit system, for example, where there simply isn't enough user-mode address space to allocate an array this large).

In terms of speed, I wouldn't expect much difference between them. It's possible that ReadAllLines has some internal optimisations, but on the other hand it may have to allocate a massive chunk of memory. I'd guess that ReadAllLines might be faster for small files, but significantly slower for large files; though the only way to tell would be to measure it with a Stopwatch or code profiler.
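
As a rough illustration of the Stopwatch measurement suggested above, here is a minimal sketch. The class and method names (LineCountTiming, Measure, and so on) are hypothetical and not part of the original answer; a single timed run is only indicative, since disk caching and JIT warm-up can skew the numbers.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

public static class LineCountTiming
{
    // Times one line-counting strategy and returns its result and elapsed time.
    public static (long Count, TimeSpan Elapsed) Measure(Func<string, long> counter, string path)
    {
        var stopwatch = Stopwatch.StartNew();
        long count = counter(path);
        stopwatch.Stop();
        return (count, stopwatch.Elapsed);
    }

    // Greedy: loads every line into an array first.
    public static long CountWithReadAllLines(string path) => File.ReadAllLines(path).Length;

    // Lazy: enumerates one line at a time (.NET 4.0 or later).
    public static long CountWithReadLines(string path) => File.ReadLines(path).LongCount();
}
```

Calling LineCountTiming.Measure(LineCountTiming.CountWithReadLines, path) and then the same with CountWithReadAllLines, ideally several times each on files of different sizes, gives a first impression of how the two compare on a particular machine.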

Answer 2

Answered by Mitchel Sellers

You could quickly read it in and increment a counter: just use a loop that increments the count for each line and does nothing with the text.

Answer 3

Answered by leppie

The easiest:

int lines = File.ReadAllLines("myfile").Length;

Answer 4

Answered by geocoin

Count the carriage returns/line feeds. I believe that in Unicode they are still 0x000D and 0x000A respectively. That way you can be as efficient or as inefficient as you want, and decide whether you have to deal with both characters or not.
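
A minimal sketch of this idea follows. It is an illustration rather than the answerer's code: the names NewlineCounter and CountLineTerminators are made up, and it assumes an ASCII-compatible encoding such as UTF-8, where the bytes 0x0D and 0x0A can only mean CR and LF. A UTF-16 file would need to be decoded to characters first.

```csharp
using System.IO;

public static class NewlineCounter
{
    // Counts line terminators in a byte stream, treating LF, CR, and CRLF
    // each as a single terminator.
    // Assumption: ASCII-compatible encoding (e.g. UTF-8), not UTF-16.
    public static long CountLineTerminators(Stream stream)
    {
        const byte CR = 0x0D;
        const byte LF = 0x0A;
        var buffer = new byte[64 * 1024];
        long count = 0;
        bool previousWasCR = false; // carried across buffer refills
        int bytesRead;

        while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < bytesRead; i++)
            {
                byte b = buffer[i];
                if (b == CR)
                {
                    count++;
                }
                else if (b == LF && !previousWasCR)
                {
                    // An LF directly after a CR belongs to the same CRLF pair.
                    count++;
                }
                previousWasCR = (b == CR);
            }
        }
        return count;
    }
}
```

Note that this counts terminators, not lines: a final line that lacks a trailing newline is not included, so the caller has to decide whether to add one for it.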

Answer 5

Answered by user8456

If by easy you mean lines of code that are easy to decipher but perchance inefficient:

string[] lines = System.IO.File.ReadAllLines(filename);
int cnt = lines.Length;

That's probably the quickest way to find out how many lines there are.

You could also do the following (depending on whether you are buffering it in):

// for large files
while (...reads into buffer) {
    string[] lines = Regex.Split(buffer, System.Environment.NewLine);
}

There are numerous other ways, but one of the above is probably what you'll go with.

Answer 6

Answered by benPearce

This would use less memory, but would probably take longer:

int count = 0;
// The using block disposes the reader even if ReadLine throws.
using (TextReader reader = new StreamReader("file.txt"))
{
    while (reader.ReadLine() != null)
    {
        count++;
    }
}

Answer 7

Answered by Sklivvz

You can launch the "wc.exe" executable (which comes with UnixUtils and does not need installation) as an external process. It supports different line-counting methods (such as Unix vs. Mac vs. Windows).
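
One way to launch it from C# might look like the sketch below. It assumes a wc binary that accepts the standard -l flag and prints the count as the first token of its output; the class name WcLineCounter, its methods, and the wcExecutable parameter are hypothetical, and the path to wc.exe will depend on where UnixUtils was unpacked.

```csharp
using System;
using System.Diagnostics;

public static class WcLineCounter
{
    // Runs "wc -l <path>" as an external process and parses the count.
    // Assumption: wcExecutable names a wc binary (for example "wc.exe" from
    // UnixUtils, or "wc" if it is on the PATH).
    public static long CountLines(string path, string wcExecutable = "wc")
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = wcExecutable,
            Arguments = "-l \"" + path + "\"",
            RedirectStandardOutput = true,
            UseShellExecute = false, // required for output redirection
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException("wc exited with code " + process.ExitCode);
            return ParseCount(output);
        }
    }

    // "wc -l" prints something like "  42 file.txt"; the first token is the count.
    public static long ParseCount(string wcOutput)
    {
        string[] tokens = wcOutput.Split(
            new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        return long.Parse(tokens[0]);
    }
}
```

Starting a process has a fixed cost, so this is more likely to pay off for a few large files than for many small ones.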

Answer 8

Answered by Muhammad Usman -kai hiwatari

try {
    string path = args[0];
    int count = 0;

    // Count newline bytes while reading, instead of building up a string
    // one character at a time and scanning it afterwards.
    using (FileStream fh = new FileStream(path, FileMode.Open, FileAccess.Read)) {
        int i;
        while ((i = fh.ReadByte()) != -1) {
            if (i == '\n')
                count++;
        }
    }

    Console.WriteLine("The total number of newlines was: " + count);

} catch (Exception ex) {
    Console.WriteLine(ex.Message);
}

Answer 9

Answered by Krythic

A viable option, and one that I have personally used, would be to add your own header to the first line of the file. I did this for a custom model format for my game. Basically, I have a tool that optimizes my .obj files, getting rid of the crap I don't need, converts them to a better layout, and then writes the total number of lines, faces, normals, vertices, and texture UVs on the very first line. That data is then used by various array buffers when the model is loaded.

This is also useful because you only need to loop through the file once to load it in, instead of once to count the lines and again to read the data into the buffers you created.
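
The header idea can be sketched roughly as follows. This is a simplified, hypothetical format that stores only the line count on the first line (the model format described above also stores face, normal, vertex, and UV counts); the CountedFile class and its methods are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CountedFile
{
    // Writes the number of lines as a header on the first line, then the lines.
    public static void Write(string path, IReadOnlyList<string> lines)
    {
        using (var writer = new StreamWriter(path))
        {
            writer.WriteLine(lines.Count);
            foreach (string line in lines)
                writer.WriteLine(line);
        }
    }

    // Reads the header, allocates the array once, and fills it in a single pass.
    public static string[] Read(string path)
    {
        using (var reader = new StreamReader(path))
        {
            int count = int.Parse(reader.ReadLine());
            var lines = new string[count];
            for (int i = 0; i < count; i++)
                lines[i] = reader.ReadLine();
            return lines;
        }
    }
}
```

This only works when you control the tool that writes the file, and the header has to be kept in sync whenever the contents change.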

Answer 10

Answered by Walter Vehoeven

Reading a file in and of itself takes some time, and garbage collecting the result is another problem, since you read the whole file just to count the newline character(s).

At some point, someone is going to have to read the characters in the file, regardless of whether that is the framework or your own code. This means you have to open the file and read it into memory; if the file is large, this is potentially a problem, as the memory needs to be garbage collected.

Nima Ara made a nice analysis that you might take into consideration.

Here is the proposed solution. It examines 4 bytes at a time, counts the line-ending characters, and reuses the same buffer memory for each subsequent comparison.

private const char CR = '\r';  
private const char LF = '\n';  
private const char NULL = (char)0;

public static long CountLinesMaybe(Stream stream)  
{
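    // Plain null check in place of the Ensure.NotNull helper, which is not part of the .NET framework.
    if (stream == null) throw new ArgumentNullException(nameof(stream));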

    var lineCount = 0L;

    var byteBuffer = new byte[1024 * 1024];
    const int BytesAtTheTime = 4;
    var detectedEOL = NULL;
    var currentChar = NULL;

    int bytesRead;
    while ((bytesRead = stream.Read(byteBuffer, 0, byteBuffer.Length)) > 0)
    {
        var i = 0;
        for (; i <= bytesRead - BytesAtTheTime; i += BytesAtTheTime)
        {
            currentChar = (char)byteBuffer[i];

            if (detectedEOL != NULL)
            {
                if (currentChar == detectedEOL) { lineCount++; }

                currentChar = (char)byteBuffer[i + 1];
                if (currentChar == detectedEOL) { lineCount++; }

                currentChar = (char)byteBuffer[i + 2];
                if (currentChar == detectedEOL) { lineCount++; }

                currentChar = (char)byteBuffer[i + 3];
                if (currentChar == detectedEOL) { lineCount++; }
            }
            else
            {
                if (currentChar == LF || currentChar == CR)
                {
                    detectedEOL = currentChar;
                    lineCount++;
                }
                i -= BytesAtTheTime - 1;
            }
        }

        for (; i < bytesRead; i++)
        {
            currentChar = (char)byteBuffer[i];

            if (detectedEOL != NULL)
            {
                if (currentChar == detectedEOL) { lineCount++; }
            }
            else
            {
                if (currentChar == LF || currentChar == CR)
                {
                    detectedEOL = currentChar;
                    lineCount++;
                }
            }
        }
    }

    if (currentChar != LF && currentChar != CR && currentChar != NULL)
    {
        lineCount++;
    }
    return lineCount;
}

As the code above suggests, the underlying framework also has to read a line one character at a time, because every character must be examined to find the line feed.

If you profile it as Nima did, you will see that this is a rather fast and efficient way of doing this.
