
Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, link to the original and attribute it to the original authors (not me): StackOverflow, original question: http://stackoverflow.com/questions/453018/

Date: 2020-08-11 14:48:32, Source: igfitidea

Number of lines in a file in Java

Tags: java, large-files, line-numbers

Question by Mark

I work with huge data files. Sometimes I only need to know the number of lines in these files; usually I open them and read them line by line until I reach the end of the file.

I was wondering if there is a smarter way to do that?

Accepted answer by martinus

This is the fastest version I have found so far, about 6 times faster than readLines. On a 150MB log file it takes 0.35 seconds, versus 2.40 seconds when using readLines(). Just for fun, the Linux wc -l command takes 0.15 seconds.

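import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public static int countLinesOld(String filename) throws IOException {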
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        int count = 0;
        int readChars = 0;
        boolean empty = true;
        while ((readChars = is.read(c)) != -1) {
            empty = false;
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
        }
        return (count == 0 && !empty) ? 1 : count;
    } finally {
        is.close();
    }
}

EDIT, 9 1/2 years later: I have practically no Java experience, but I have benchmarked this code against the LineNumberReader solution below anyway, since it bothered me that nobody had done so. Especially for large files, my solution seems to be faster, although it takes a few runs before the optimizer does a decent job. I've played a bit with the code and produced a new version that is consistently the fastest:

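import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public static int countLinesNew(String filename) throws IOException {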
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];

        int readChars = is.read(c);
        if (readChars == -1) {
            // bail out if nothing to read
            return 0;
        }

        // make it easy for the optimizer to tune this loop
        int count = 0;
        while (readChars == 1024) {
            for (int i=0; i<1024;) {
                if (c[i++] == '\n') {
                    ++count;
                }
            }
            readChars = is.read(c);
        }

        // count remaining characters
        while (readChars != -1) {
            for (int i=0; i<readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
            readChars = is.read(c);
        }

        return count == 0 ? 1 : count;
    } finally {
        is.close();
    }
}

Benchmark results for a 1.3GB text file, y axis in seconds. I performed 100 runs with the same file and measured each run with System.nanoTime(). You can see that countLinesOld has a few outliers and countLinesNew has none; while it's only a bit faster, the difference is statistically significant. LineNumberReader is clearly slower.
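
The measurement described above (repeated runs on one file, each timed with System.nanoTime()) could be sketched roughly as follows. This is not the harness behind the plot: the LineCountBenchmark class, the LineCounter interface and the warm-up parameter are assumptions added for illustration, and a dedicated tool such as JMH gives more trustworthy numbers.

```java
import java.io.IOException;

public class LineCountBenchmark {

    // Hypothetical interface: anything that counts the lines of a file.
    @FunctionalInterface
    public interface LineCounter {
        long count(String filename) throws IOException;
    }

    // Accumulates every result so the JIT cannot treat the counting call as dead code.
    private static long sink;

    // Runs `warmups` untimed calls first (giving the JIT a chance to optimize),
    // then `runs` timed calls; returns each timed run's duration in nanoseconds.
    public static long[] time(LineCounter counter, String filename, int warmups, int runs)
            throws IOException {
        for (int w = 0; w < warmups; w++) {
            sink += counter.count(filename);
        }
        long[] durations = new long[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            sink += counter.count(filename);
            durations[r] = System.nanoTime() - start;
        }
        return durations;
    }

    public static long sink() {
        return sink;
    }
}
```

For example, LineCountBenchmark.time(f -> countLinesNew(f), "big.txt", 10, 100) would time 100 runs of countLinesNew after 10 warm-up runs ("big.txt" is a placeholder file name).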

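[Benchmark plot: run time in seconds of countLinesOld, countLinesNew and LineNumberReader over 100 runs on a 1.3GB text file]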

Answer by Peter Hilton

On Unix-based systems, use the wc command on the command line.

Answer by Esko

The only way to know how many lines a file has is to count them. You can of course derive a metric from your data, the average length of one line, and then divide the file size by that average length, but the result won't be accurate.
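
A rough sketch of that estimate, assuming the first few kilobytes of the file are representative of the whole file. The LineEstimate class, the estimateLines method and the sampleBytes parameter are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LineEstimate {

    // Estimates the line count by reading only the first `sampleBytes` bytes,
    // computing the average line length there, and scaling by the total file size.
    // The result is an approximation and can be far off if line lengths vary.
    public static long estimateLines(String filename, int sampleBytes) throws IOException {
        Path path = Paths.get(filename);
        long fileSize = Files.size(path);
        if (fileSize == 0) {
            return 0;
        }
        byte[] sample = new byte[sampleBytes];
        int read = 0;
        try (InputStream in = Files.newInputStream(path)) {
            // Fill the sample buffer; read() may return fewer bytes than requested.
            int n;
            while (read < sampleBytes && (n = in.read(sample, read, sampleBytes - read)) != -1) {
                read += n;
            }
        }
        long newlines = 0;
        for (int i = 0; i < read; i++) {
            if (sample[i] == '\n') {
                newlines++;
            }
        }
        if (newlines == 0) {
            // No complete line in the sample: all we know is that there is at least one line.
            return 1;
        }
        double avgLineLength = (double) read / newlines;
        return Math.round(fileSize / avgLineLength);
    }
}
```

This only reads a fixed-size prefix, so it is fast regardless of file size, which is exactly why it cannot be exact.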

Answer by David Schmitt

If you don't have any index structures, you won't get around reading the complete file. But you can optimize it by not reading it line by line and instead using a regex to match all line terminators.
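
A minimal sketch of the regex idea, assuming the file fits in memory and is UTF-8 encoded. The RegexLineCount class and its countLines method are hypothetical, and this is not necessarily faster than scanning the bytes directly as the accepted answer does.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLineCount {

    // \r\n is listed first so a Windows line ending counts as one terminator, not two.
    private static final Pattern TERMINATOR = Pattern.compile("\r\n|\r|\n");

    // Reads the whole file into memory and counts line terminators with a regex.
    public static long countLines(String filename) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get(filename)), StandardCharsets.UTF_8);
        Matcher m = TERMINATOR.matcher(text);
        long count = 0;
        int lastEnd = 0;
        while (m.find()) {
            count++;
            lastEnd = m.end();
        }
        // Text after the last terminator (or a file with no terminator at all) is one more line.
        if (lastEnd < text.length()) {
            count++;
        }
        return count;
    }
}
```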

Answer by Dave Bergert

The accepted answer's method above gave me line miscounts if a file didn't have a newline at the end: it failed to count the last line in the file.

This method works better for me:

import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;

public int countLines(String filename) throws IOException {
    try (LineNumberReader reader = new LineNumberReader(new FileReader(filename))) {
        // Read every line; LineNumberReader increments its line number for each one.
        while (reader.readLine() != null) {
            // nothing to do per line
        }
        return reader.getLineNumber();
    }
}

Answer by Faisal

If you use this:

public int countLines(String filename) throws IOException {
    LineNumberReader reader  = new LineNumberReader(new FileReader(filename));
    int cnt = 0;
    String lineRead = "";
    while ((lineRead = reader.readLine()) != null) {}

    cnt = reader.getLineNumber(); 
    reader.close();
    return cnt;
}

you can't count more rows than an int can hold (Integer.MAX_VALUE, i.e. 2,147,483,647), because reader.getLineNumber() returns an int. You need a long to handle the maximum number of rows.
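
If more than Integer.MAX_VALUE lines are possible, one option (a sketch, not part of the original answer) is to let the JDK count lines into a long with Files.lines(...).count(). Files.lines decodes the file as UTF-8; the LongLineCount class and its countLines method are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class LongLineCount {

    // Stream.count() returns a long, so the result is not capped at Integer.MAX_VALUE.
    // Malformed UTF-8 input surfaces as an UncheckedIOException while streaming.
    public static long countLines(String filename) throws IOException {
        // try-with-resources closes the stream, which closes the underlying file.
        try (Stream<String> lines = Files.lines(Paths.get(filename))) {
            return lines.count();
        }
    }
}
```

Like BufferedReader.readLine(), this counts a final line even when it has no trailing newline.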

Answer by er.vikas

I have implemented another solution to the problem, which I found more efficient at counting rows:

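import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;

// Wrapped in a (hypothetically named) method so the snippet compiles on its own.
public static int countLinesBySkipping(String filename) throws IOException {
    try
    (
       FileReader       input = new FileReader(filename);
       LineNumberReader count = new LineNumberReader(input);
    )
    {
       while (count.skip(Long.MAX_VALUE) > 0)
       {
          // Loop just in case the file is > Long.MAX_VALUE or skip() decides to not read the entire file
       }

       // getLineNumber() is based on the line terminators read, so +1 adds the last line;
       // this can be off by one, e.g. when the file ends with a line terminator.
       return count.getLineNumber() + 1;
    }
}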

Answer by Nathan Ryan

I know this is an old question, but the accepted solution didn't quite do what I needed. So I refined it to accept various line terminators (rather than just line feed) and to use a specified character encoding (rather than ISO-8859-n). Everything is in one method (refactor as appropriate):

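import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public static long getLinesCount(String fileName, String encodingName) throws IOException {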
    long linesCount = 0;
    File file = new File(fileName);
    FileInputStream fileIn = new FileInputStream(file);
    try {
        Charset encoding = Charset.forName(encodingName);
        Reader fileReader = new InputStreamReader(fileIn, encoding);
        int bufferSize = 4096;
        Reader reader = new BufferedReader(fileReader, bufferSize);
        char[] buffer = new char[bufferSize];
        int prevChar = -1;
        int readCount = reader.read(buffer);
        while (readCount != -1) {
            for (int i = 0; i < readCount; i++) {
                int nextChar = buffer[i];
                switch (nextChar) {
                    case '\r': {
                        // The current line is terminated by a carriage return or by a carriage return immediately followed by a line feed.
                        linesCount++;
                        break;
                    }
                    case '\n': {
                        if (prevChar == '\r') {
                            // The current line is terminated by a carriage return immediately followed by a line feed.
                            // The line has already been counted.
                        } else {
                            // The current line is terminated by a line feed.
                            linesCount++;
                        }
                        break;
                    }
                }
                prevChar = nextChar;
            }
            readCount = reader.read(buffer);
        }
        if (prevChar != -1) {
            switch (prevChar) {
                case '\r':
                case '\n': {
                    // The last line is terminated by a line terminator.
                    // The last line has already been counted.
                    break;
                }
                default: {
                    // The last line is terminated by end-of-file.
                    linesCount++;
                }
            }
        }
    } finally {
        fileIn.close();
    }
    return linesCount;
}

This solution is comparable in speed to the accepted solution, about 4% slower in my tests (though timing tests in Java are notoriously unreliable).

Answer by DMulligan

The accepted answer has an off-by-one error for multi-line files that don't end in a newline. A one-line file without a trailing newline returns 1, but a two-line file without a trailing newline also returns 1. Here's an implementation of the accepted solution that fixes this. The endsWithoutNewLine checks are wasted on every read except the final one, but their cost should be negligible compared to the rest of the function.

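import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public int count(String filename) throws IOException {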
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        int count = 0;
        int readChars = 0;
        boolean endsWithoutNewLine = false;
        while ((readChars = is.read(c)) != -1) {
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n')
                    ++count;
            }
            endsWithoutNewLine = (c[readChars - 1] != '\n');
        }
        if(endsWithoutNewLine) {
            ++count;
        } 
        return count;
    } finally {
        is.close();
    }
}
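
To illustrate the off-by-one, here is a small in-memory comparison of the two counting rules on the two-line input "first\nsecond" (no trailing newline). The OffByOneDemo class and its method names are hypothetical; the methods apply the same rules as the accepted answer's countLinesOld and the count() method above, but to a byte array instead of a file.

```java
import java.nio.charset.StandardCharsets;

public class OffByOneDemo {

    // Rule used by countLinesOld: count '\n' bytes,
    // but report 1 for non-empty input that contains no '\n' at all.
    static int acceptedRule(byte[] data) {
        int count = 0;
        for (byte b : data) {
            if (b == '\n') {
                count++;
            }
        }
        return (count == 0 && data.length > 0) ? 1 : count;
    }

    // Rule used by count(): count '\n' bytes, plus one if the data does not end with '\n'.
    static int fixedRule(byte[] data) {
        int count = 0;
        for (byte b : data) {
            if (b == '\n') {
                count++;
            }
        }
        if (data.length > 0 && data[data.length - 1] != '\n') {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] twoLines = "first\nsecond".getBytes(StandardCharsets.US_ASCII);
        System.out.println(acceptedRule(twoLines)); // prints 1
        System.out.println(fixedRule(twoLines));    // prints 2
    }
}
```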

Answer by Sunil Shevante

How about using the Process class from within Java code and then reading the output of the command?

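import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Wrapped in a (hypothetically named) method so the snippet compiles on its own.
public static int countLinesWithWc(String filename) throws IOException, InterruptedException {
    // Pass the file name as its own argument so names with spaces are not split.
    Process p = new ProcessBuilder("wc", "-l", filename).start();
    int lineCount = 0;
    try (BufferedReader b = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
        String line;
        while ((line = b.readLine()) != null) {
            // wc -l prints "<count> <filename>", so parse only the first field.
            lineCount = Integer.parseInt(line.trim().split("\\s+")[0]);
        }
    }
    // Wait only after the output has been read, so a full pipe cannot block wc.
    p.waitFor();
    return lineCount;
}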

Need to try it though. Will post the results.
