C#: Is StreamReader.ReadLine() really the fastest method to count lines in a file?
Original question: http://stackoverflow.com/questions/14243249/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by sergeidave
While looking around for a while I found quite a few discussions on how to figure out the number of lines in a file.
For example these three:
c# how do I count lines in a textfile
Determine the number of lines within a text file
How to count lines fast?
So, I went ahead and ended up using what seems to be the most efficient (at least memory-wise?) method that I could find:
private static int countFileLines(string filePath)
{
    using (StreamReader r = new StreamReader(filePath))
    {
        int i = 0;
        while (r.ReadLine() != null)
        {
            i++;
        }
        return i;
    }
}
But this takes forever when the lines themselves from the file are very long. Is there really not a faster solution to this?
I've been trying to use StreamReader.Read() or StreamReader.Peek(), but I can't (or don't know how to) make either of them move on to the next line as soon as there's 'stuff' (chars? text?).
Any ideas please?
CONCLUSION/RESULTS (after running some tests based on the answers provided):
I tested the 5 methods below on two different files and I got consistent results that seem to indicate that plain old StreamReader.ReadLine() is still one of the fastest ways... To be honest, I'm perplexed after all the comments and discussion in the answers.
File #1:
Size: 3,631 KB
Lines: 56,870
Results in seconds for File #1:
0.02 --> ReadLine method.
0.04 --> Read method.
0.29 --> ReadByte method.
0.25 --> Readlines.Count method.
0.04 --> ReadWithBufferSize method.
File #2:
Size: 14,499 KB
Lines: 213,424
Results in seconds for File #2:
0.08 --> ReadLine method.
0.19 --> Read method.
1.15 --> ReadByte method.
1.02 --> Readlines.Count method.
0.08 --> ReadWithBufferSize method.
Here are the 5 methods I tested based on all the feedback I received:
private static int countWithReadLine(string filePath)
{
    using (StreamReader r = new StreamReader(filePath))
    {
        int i = 0;
        while (r.ReadLine() != null)
        {
            i++;
        }
        return i;
    }
}

private static int countWithRead(string filePath)
{
    using (StreamReader _reader = new StreamReader(filePath))
    {
        int c = 0, count = 0;
        while ((c = _reader.Read()) != -1)
        {
            if (c == 10)
            {
                count++;
            }
        }
        return count;
    }
}

private static int countWithReadByte(string filePath)
{
    using (Stream s = new FileStream(filePath, FileMode.Open))
    {
        int i = 0;
        int b;
        b = s.ReadByte();
        while (b >= 0)
        {
            if (b == 10)
            {
                i++;
            }
            b = s.ReadByte();
        }
        return i;
    }
}

private static int countWithReadLinesCount(string filePath)
{
    return File.ReadLines(filePath).Count();
}

private static int countWithReadAndBufferSize(string filePath)
{
    int bufferSize = 512;
    using (Stream s = new FileStream(filePath, FileMode.Open))
    {
        int i = 0;
        byte[] b = new byte[bufferSize];
        int n = 0;
        n = s.Read(b, 0, bufferSize);
        while (n > 0)
        {
            i += countByteLines(b, n);
            n = s.Read(b, 0, bufferSize);
        }
        return i;
    }
}

private static int countByteLines(byte[] b, int n)
{
    int i = 0;
    for (int j = 0; j < n; j++)
    {
        if (b[j] == 10)
        {
            i++;
        }
    }
    return i;
}
Accepted answer by TomTom
No, it is not. The point is that it materializes the strings, which is not needed.
To count lines, you are much better off ignoring the "string" part and focusing on the "line" part.
A line is a series of bytes ending with \r\n (13, 10 - CR LF) or another marker.
Just run along the bytes, in a buffered stream, counting the number of appearances of your end-of-line marker.
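As a rough illustration of that idea, here is a minimal sketch that reads raw bytes in chunks and counts LF bytes without ever building a string. The class name LineFeedCounter, the method name, and the 64 KB default buffer are assumptions made for this example, not part of the answer. It also assumes an ASCII or UTF-8 file, where byte 10 can only be a line feed; it would miscount UTF-16 text.

```csharp
using System.IO;

public static class LineFeedCounter
{
    // Counts LF bytes (10) in the stream. A CR LF pair contains exactly
    // one LF, so Windows-style line endings are counted once each.
    public static long CountLineFeeds(Stream stream, int bufferSize = 64 * 1024)
    {
        var buffer = new byte[bufferSize];
        long count = 0;
        int read;
        // Read returns 0 only at the end of the stream.
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] == (byte)'\n')
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

It could be called as LineFeedCounter.CountLineFeeds(File.OpenRead(path)), ideally inside a using block so the stream is disposed.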
Answer by Brian
public static int CountLines(Stream stm)
{
    StreamReader _reader = new StreamReader(stm);
    int c = 0, count = 0;
    while ((c = _reader.Read()) != -1)
    {
        if (c == '\n')
        {
            count++;
        }
    }
    return count;
}
Answer by Hogan
The best way to know how to do this fast is to think about the fastest way to do it without using C/C++.
In assembly there is a CPU-level operation that scans memory for a character, so in assembly you would do the following:
- Read a big part (or all) of the file into memory
- Execute the SCASB instruction
- Repeat as needed
So, in C# you want the compiler to get as close to that as possible.
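One way to approximate "scan a block of memory for a byte" in C# is to let the runtime search each chunk with Span<T>.IndexOf instead of a hand-written loop. The sketch below is an illustration only: it assumes a runtime where System.Span<T> is available (.NET Core 2.1 or later), and the class name VectorizedLineCounter and the 64 KB buffer are choices made for this example. Whether the search actually compiles down to a fast scan depends on the runtime, so measure before relying on it.

```csharp
using System;
using System.IO;

public static class VectorizedLineCounter
{
    // Reads the file in large chunks and lets IndexOf locate each LF byte.
    public static long CountLineFeeds(string filePath, int bufferSize = 64 * 1024)
    {
        var buffer = new byte[bufferSize];
        long count = 0;
        using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
        {
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                ReadOnlySpan<byte> chunk = buffer.AsSpan(0, read);
                int pos;
                // Jump from one LF to the next until the chunk has none left.
                while ((pos = chunk.IndexOf((byte)'\n')) >= 0)
                {
                    count++;
                    chunk = chunk.Slice(pos + 1);
                }
            }
        }
        return count;
    }
}
```

This pays off mostly when lines are long, because each IndexOf call then skips over many bytes at once.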
Answer by Guffa
Yes, reading lines like that is the fastest and easiest way in any practical sense.
There are no shortcuts here. Files are not line based, so you have to read every single byte from the file to determine how many lines there are.
As TomTom pointed out, creating the strings is not strictly needed to count the lines, but a vast majority of the time spent will be waiting for the data to be read from the disk. Writing a much more complicated algorithm would perhaps shave off a percent of the execution time, and it would dramatically increase the time for writing and testing the code.
Answer by Nick Bray
I tried multiple methods and tested their performance:
The one that reads a single byte at a time is about 50% slower than the other methods. The other methods all take about the same amount of time. You could try creating threads and doing this asynchronously, so while you are waiting for a read you can start processing a previous read. That sounds like a headache to me.
I would go with the one-liner: File.ReadLines(filePath).Count(); it performs as well as the other methods I tested.
private static int countFileLines(string filePath)
{
    using (StreamReader r = new StreamReader(filePath))
    {
        int i = 0;
        while (r.ReadLine() != null)
        {
            i++;
        }
        return i;
    }
}

private static int countFileLines2(string filePath)
{
    using (Stream s = new FileStream(filePath, FileMode.Open))
    {
        int i = 0;
        int b;
        b = s.ReadByte();
        while (b >= 0)
        {
            if (b == 10)
            {
                i++;
            }
            b = s.ReadByte();
        }
        return i + 1;
    }
}

private static int countFileLines3(string filePath)
{
    // bufferSize is not defined in the original answer; 512 is an assumed
    // value, taken from the question's version of this method.
    int bufferSize = 512;
    using (Stream s = new FileStream(filePath, FileMode.Open))
    {
        int i = 0;
        byte[] b = new byte[bufferSize];
        int n = 0;
        n = s.Read(b, 0, bufferSize);
        while (n > 0)
        {
            i += countByteLines(b, n);
            n = s.Read(b, 0, bufferSize);
        }
        return i + 1;
    }
}

private static int countByteLines(byte[] b, int n)
{
    int i = 0;
    for (int j = 0; j < n; j++)
    {
        if (b[j] == 10)
        {
            i++;
        }
    }
    return i;
}

private static int countFileLines4(string filePath)
{
    return File.ReadLines(filePath).Count();
}
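Note that the byte-based methods above return the number of line feeds plus one, while the ReadLine-based methods return the number of lines actually read, so the two families can disagree by one depending on whether the file ends with a newline. The following is a minimal sketch of one way to reconcile them, assuming LF or CR LF line endings only; the class name TrailingLineCounter and the 4096-byte default buffer are assumptions for this example.

```csharp
using System.IO;

public static class TrailingLineCounter
{
    // Counts LF bytes, then adds one more line only if the file is
    // non-empty and does not already end with LF.
    public static int CountLines(string filePath, int bufferSize = 4096)
    {
        using (var s = new FileStream(filePath, FileMode.Open, FileAccess.Read))
        {
            var buffer = new byte[bufferSize];
            int count = 0;
            int last = -1; // last byte seen so far; -1 means the file is empty
            int n;
            while ((n = s.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int j = 0; j < n; j++)
                {
                    if (buffer[j] == 10)
                    {
                        count++;
                    }
                }
                last = buffer[n - 1];
            }
            if (last != -1 && last != 10)
            {
                count++; // final line has no terminating newline
            }
            return count;
        }
    }
}
```

Under those assumptions the result should match a StreamReader.ReadLine loop: an empty file gives 0, and "a\nb" and "a\nb\n" both give 2.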
Answer by Nick Bray
There are numerous ways to read a file. Usually, the fastest way is the simplest:
using (StreamReader sr = File.OpenText(fileName))
{
    string s = String.Empty;
    while ((s = sr.ReadLine()) != null)
    {
        // do what you gotta do here
    }
}
This page does a great performance comparison between several different techniques, including using BufferedReaders, reading into StringBuilder objects, and into an entire array.
Answer by Slai
StreamReader is not the fastest way to read files in general because of the small overhead of decoding the bytes into characters, so reading the file into a byte array is faster.
The results I get are a bit different each time due to caching and other processes, but here is one of the results I got (in milliseconds) with a 16 MB file:
75 ReadLines
82 ReadLine
22 ReadAllBytes
23 Read 32K
21 Read 64K
27 Read 128K
In general, File.ReadLines should be a little bit slower than a StreamReader.ReadLine loop.
File.ReadAllBytes is slower with bigger files and will throw an out-of-memory exception with huge files.
The default buffer size for FileStream is 4K, but on my machine 64K seemed the fastest.
private static int countWithReadLines(string filePath)
{
    int count = 0;
    var lines = File.ReadLines(filePath);
    foreach (var line in lines) count++;
    return count;
}

private static int countWithReadLine(string filePath)
{
    int count = 0;
    using (var sr = new StreamReader(filePath))
        while (sr.ReadLine() != null)
            count++;
    return count;
}

private static int countWithFileStream(string filePath, int bufferSize = 1024 * 4)
{
    using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    {
        int count = 0;
        byte[] array = new byte[bufferSize];
        while (true)
        {
            int length = fs.Read(array, 0, bufferSize);
            for (int i = 0; i < length; i++)
                if (array[i] == 10)
                    count++;
            // Read may return fewer bytes than requested before the end of
            // the file, so stop only when it returns 0.
            if (length == 0) return count;
        }
    } // end of using
}
and tested with:
var path = "1234567890.txt"; Stopwatch sw; string s = "";
File.WriteAllLines(path, Enumerable.Repeat("1234567890abcd", 1024 * 1024 )); // 16MB (16 bytes per line)
sw = Stopwatch.StartNew(); countWithReadLines(path) ; sw.Stop(); s += sw.ElapsedMilliseconds + " ReadLines \n";
sw = Stopwatch.StartNew(); countWithReadLine(path) ; sw.Stop(); s += sw.ElapsedMilliseconds + " ReadLine \n";
sw = Stopwatch.StartNew(); countWithReadAllBytes(path); sw.Stop(); s += sw.ElapsedMilliseconds + " ReadAllBytes \n";
sw = Stopwatch.StartNew(); countWithFileStream(path, 1024 * 32); sw.Stop(); s += sw.ElapsedMilliseconds + " Read 32K \n";
sw = Stopwatch.StartNew(); countWithFileStream(path, 1024 * 64); sw.Stop(); s += sw.ElapsedMilliseconds + " Read 64K \n";
sw = Stopwatch.StartNew(); countWithFileStream(path, 1024 *128); sw.Stop(); s += sw.ElapsedMilliseconds + " Read 128K \n";
MessageBox.Show(s);
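The test code calls countWithReadAllBytes, which does not appear in the listing above. A plausible reconstruction, assuming it simply loads the whole file with File.ReadAllBytes and counts LF bytes, might look like the sketch below. This is a guess at the missing method, not the author's actual code, and the wrapping class name ReadAllBytesCounter is hypothetical.

```csharp
using System.IO;

public static class ReadAllBytesCounter
{
    // Hypothetical reconstruction of the method missing from the answer:
    // loads the entire file into memory, then counts LF bytes (10).
    public static int countWithReadAllBytes(string filePath)
    {
        int count = 0;
        foreach (byte b in File.ReadAllBytes(filePath))
        {
            if (b == 10)
            {
                count++;
            }
        }
        return count;
    }
}
```

Because the whole file is held in one array, this approach carries the memory cost the answer warns about for very large files.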

