How can I get the number of lines in a file efficiently in Java?
Original question: http://stackoverflow.com/questions/1277880/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by firstthumb
I have a big file containing approximately 3,000-20,000 lines. How can I get the total number of lines in the file using Java?
Accepted answer by Mnementh
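import java.io.BufferedReader;
import java.io.FileReader;
int lines = 0;
// try-with-resources closes the reader even if readLine() throws
try (BufferedReader reader = new BufferedReader(new FileReader("file.txt"))) {
    while (reader.readLine() != null) lines++;
}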
Update: To answer the performance question raised here, I made a measurement. First, 20,000 lines are too few to keep the program running for a noticeable time, so I created a text file with 5 million lines. This solution (started with java without options such as -server or -XX flags) needed around 11 seconds on my machine. wc -l (the UNIX command-line tool for counting lines) took the same time, 11 seconds. The solution that reads every single character and looks for '\n' needed 104 seconds, 9-10 times as long.
Answer by Esko Luontola
Read through the file and count the number of newline characters. An easy way to read a file in Java one line at a time is the java.util.Scanner class.
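A minimal sketch of the Scanner approach described above. The class name ScannerLineCount and the method countLines are made up for illustration and are not from the original answer; the sketch counts lines as Scanner defines them, so a final line without a trailing newline is still counted.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Scanner;

public class ScannerLineCount {
    // Hypothetical helper: counts lines by asking Scanner for one line at a time.
    static int countLines(Path file) throws IOException {
        int lines = 0;
        // try-with-resources closes the Scanner and the underlying file
        try (Scanner scanner = new Scanner(file)) {
            while (scanner.hasNextLine()) {
                scanner.nextLine(); // discard the line content, only the count matters
                lines++;
            }
        }
        return lines;
    }
}
```

Scanner is convenient, but it does more work per line than a plain buffered read, so it is probably not the choice when raw speed is the goal.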
Answer by Ken Liu
Read the file line by line and increment a counter for each line until you have read the entire file.
Answer by Narayan
Use LineNumberReader, something like:
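import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
public static int countLines(File aFile) throws IOException {
    // try-with-resources closes the reader; I/O errors propagate as the signature declares
    // instead of being swallowed and reported as -1
    try (LineNumberReader reader = new LineNumberReader(new FileReader(aFile))) {
        while (reader.readLine() != null) {
            // nothing to do: the reader tracks the line number itself
        }
        return reader.getLineNumber();
    }
}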
Answer by NSherwin
The buffered reader is overkill
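import java.io.FileReader;
import java.io.Reader;
Reader r = new FileReader("f.txt");
int count = 0;
int nextchar;
// read() returns -1 at end of file
while ((nextchar = r.read()) != -1) {
    // compare with '\n' directly: Character.getNumericValue('\n') is -1,
    // so the original test never matched an actual newline
    if (nextchar == '\n') {
        count++;
    }
}
r.close();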
My search for a simple example produced one that is actually quite poor: calling read() repeatedly for a single character is less than optimal. See here for examples and measurements.
Answer by Malax
All previous answers suggest reading through the whole file and counting the newlines you find along the way. You commented on some as "not effective", but that's the only way to do it. A "line" is delimited by nothing more than a simple character inside the file, and to count that character you must look at every single character within the file.
I'm sorry, but you have no choice. :-)
Answer by blackNBUK
If the already posted answers aren't fast enough, you'll probably have to look for a solution specific to your particular problem.
For example, if these text files are logs that are only ever appended to and you regularly need to know the number of lines in them, you could create an index. This index would contain the number of lines in the file, when the file was last modified, and how large the file was at that time. This would allow you to recalculate the number of lines in the file by skipping over all the lines you had already seen and reading only the new lines.
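The index idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class LineIndex and its update method are hypothetical names, the index lives in memory rather than on disk, it keeps only the line count and the indexed size (not the modification time), and lines are assumed to end in a '\n' byte.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class LineIndex {
    private long indexedBytes = 0; // file size covered by the index so far
    private long lineCount = 0;    // number of '\n' bytes seen in those bytes

    // Returns the current line count, reading only bytes appended since the last call.
    public long update(Path file) throws IOException {
        if (Files.size(file) < indexedBytes) {
            // The file shrank, so it was not append-only after all: start over.
            indexedBytes = 0;
            lineCount = 0;
        }
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(indexedBytes); // skip everything that was already counted
            byte[] buffer = new byte[8192];
            int n;
            while ((n = raf.read(buffer)) > 0) {
                for (int i = 0; i < n; i++) {
                    if (buffer[i] == '\n') lineCount++;
                }
                indexedBytes += n;
            }
        }
        return lineCount;
    }
}
```

Calling update repeatedly on a growing log only costs as much as the newly appended data. A persistent version would also need to store the count and offset somewhere between runs.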
Answer by Stephen C
Probably the fastest solution in pure Java would be to read the file as bytes using a NIO Channel into a large ByteBuffer. Then, using your knowledge of the file encoding scheme(s), count the encoded CR and/or NL bytes, per the relevant line separator convention.
The keys to maximising throughput will be:
- make sure that you read the file in large chunks,
- avoid copying the bytes from one buffer to another,
- avoid copying / converting bytes into characters, and
- avoid allocating objects to represent the file lines.
The actual code is too complicated for me to write on the fly. Besides, the OP is not asking for the fastest solution.
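For illustration only, a simplified sketch of the channel-based approach might look like this. It is not the answer author's code: the class NioLineCount, the method countNewlines and the 1 MiB buffer size are assumptions, and it assumes an encoding in which a line feed is the single byte 0x0A (ASCII, ISO-8859-1, UTF-8) with LF or CRLF line endings.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class NioLineCount {
    // Hypothetical helper: counts '\n' bytes without decoding bytes into characters.
    static long countNewlines(Path file) throws IOException {
        long count = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // One large direct buffer, reused for every chunk (size chosen arbitrarily).
            ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);
            while (channel.read(buffer) != -1) {
                buffer.flip(); // switch from filling the buffer to draining it
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') count++;
                }
                buffer.clear(); // make the whole buffer writable again
            }
        }
        return count;
    }
}
```

The sketch reads in large chunks, never copies between buffers, never converts bytes to characters, and allocates no per-line objects, which matches the four points listed above. Whether it actually beats a BufferedReader on a given machine would need to be measured.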
Answer by Daniel
Try the unix "wc" command. I don't mean use it; I mean download the source and see how they do it. It's probably in C, but you can easily port the behavior to Java. The problem with writing your own is accounting for the ending CR/LF problem.
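One reading of the "ending CR/LF problem" is that wc -l counts newline characters, so a file whose last line has no terminating newline reports one line fewer than a human would count. A small sketch of one way to handle that, assuming '\n' (possibly preceded by '\r') ends each line; TrailingLineCount and countLines are hypothetical names:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class TrailingLineCount {
    // Counts '\n' bytes and adds one if the file ends with an unterminated line.
    static long countLines(Path file) throws IOException {
        long newlines = 0;
        boolean sawData = false; // true once at least one byte has been read
        byte last = 0;           // last byte of the file, valid only if sawData
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) > 0) {
                for (int i = 0; i < n; i++) {
                    if (buffer[i] == '\n') newlines++;
                }
                sawData = true;
                last = buffer[n - 1];
            }
        }
        // A non-empty file that does not end in '\n' has one more (unterminated) line.
        return (sawData && last != '\n') ? newlines + 1 : newlines;
    }
}
```

With this rule, "a\nb\nc" and "a\nb\nc\n" both count as three lines, while an empty file counts as zero.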
Answer by ZZ Coder
This is about as efficient as it can get: a buffered binary read with no string conversion.
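import java.io.FileInputStream;
int count = 0;
// try-with-resources closes the stream even if read() throws
try (FileInputStream stream = new FileInputStream("/tmp/test.txt")) {
    byte[] buffer = new byte[8192];
    int n;
    // read() returns -1 at end of file, which ends the loop
    while ((n = stream.read(buffer)) > 0) {
        // count raw '\n' bytes; a final line without a trailing newline is not counted
        for (int i = 0; i < n; i++) {
            if (buffer[i] == '\n') count++;
        }
    }
}
System.out.println("Number of lines: " + count);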