Java: Read last n lines of a HUGE file
Original question: http://stackoverflow.com/questions/4121678/
Note: this content is taken from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) and link to the original question.
Asked by Gaurav Verma
I want to read the last n lines of a very big file without reading the whole file into any buffer/memory area using Java.
I looked around the JDK APIs and Apache Commons I/O and was not able to locate one which is suitable for this purpose.
I was thinking of the way tail or less does it in UNIX. I don't think they load the entire file and then show the last few lines of the file. There should be a similar way to do the same in Java too.
Accepted answer by paxdiablo
If you use a RandomAccessFile, you can use length and seek to get to a specific point near the end of the file and then read forward from there.
If you find there weren't enough lines, back up from that point and try again. Once you've figured out where the Nth last line begins, you can seek to there and just read-and-print.
An initial best-guess assumption can be made based on your data properties. For example, if it's a text file, it's possible the line lengths won't exceed an average of 132 so, to get the last five lines, start 660 characters before the end. Then, if you were wrong, try again at 1320 (you can even use what you learned from the last 660 characters to adjust that. Example: if those 660 characters were just three lines, the next try could be 660 / 3 * 5, plus maybe a bit extra just in case).
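A minimal sketch of this guess-and-retry idea follows. The class name GuessingTail, the method lastLines and the avgLineLen parameter are made up for illustration. It assumes a single-byte encoding (ISO-8859-1 is used for decoding) and simply doubles the window on each retry instead of using the finer adjustment described above.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GuessingTail {
    /** Returns up to n last lines; avgLineLen is the caller's initial guess in bytes. */
    static List<String> lastLines(String path, int n, int avgLineLen) throws IOException {
        if (n <= 0) return new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long length = raf.length();
            long window = Math.max(1L, (long) n * avgLineLen);
            while (true) {
                // Seek to a point near the end and read forward from there.
                long start = Math.max(0L, length - window);
                raf.seek(start);
                byte[] buf = new byte[(int) (length - start)]; // assumes the window fits in an int
                raf.readFully(buf);
                String text = new String(buf, StandardCharsets.ISO_8859_1);
                List<String> lines = new ArrayList<>(Arrays.asList(text.split("\r?\n", -1)));
                // A trailing newline produces one empty element at the end; drop it.
                if (lines.get(lines.size() - 1).isEmpty()) lines.remove(lines.size() - 1);
                // Unless we started at byte 0, the first piece may be a partial line; drop it.
                if (start > 0 && !lines.isEmpty()) lines.remove(0);
                if (lines.size() >= n) {
                    return new ArrayList<>(lines.subList(lines.size() - n, lines.size()));
                }
                if (start == 0) return lines; // the file has fewer than n lines
                window *= 2;                  // not enough lines: back up further and retry
            }
        }
    }
}
```

For example, GuessingTail.lastLines("app.log", 5, 132) would start 660 bytes before the end of a hypothetical app.log and widen the window only if fewer than five complete lines were found.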
Answer by Yann Ramin
A RandomAccessFile allows for seeking (http://download.oracle.com/javase/1.4.2/docs/api/java/io/RandomAccessFile.html). The File.length method will return the size of the file. The problem is determining the number of lines. For this, you can seek to the end of the file and read backwards until you have hit the right number of lines.
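Reading backwards can be sketched roughly as below. BackwardScanTail, offsetOfLastLines and the 4096-byte chunk size are illustrative choices, not part of the answer. The sketch counts '\n' bytes from the end of the file in chunks, then seeks to the offset it found and reads forward. It decodes the result as UTF-8, which is an assumption about the file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

class BackwardScanTail {
    /** Finds the byte offset at which the last n lines begin by scanning backwards for '\n'. */
    static long offsetOfLastLines(RandomAccessFile raf, int n) throws IOException {
        long length = raf.length();
        if (n <= 0 || length == 0) return length;
        byte[] chunk = new byte[4096];
        long pos = length;
        int newlines = 0;
        while (pos > 0) {
            int size = (int) Math.min(chunk.length, pos);
            pos -= size;
            raf.seek(pos);
            raf.readFully(chunk, 0, size);
            for (int i = size - 1; i >= 0; i--) {
                if (chunk[i] != '\n') continue;
                if (pos + i == length - 1) continue; // newline that terminates the final line
                if (++newlines == n) return pos + i + 1;
            }
        }
        return 0; // the file has fewer than n lines
    }

    /** Returns the last n lines as one string (assumes the tail fits in memory). */
    static String tail(String path, int n) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long start = offsetOfLastLines(raf, n);
            raf.seek(start);
            byte[] bytes = new byte[(int) (raf.length() - start)];
            raf.readFully(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }
}
```

Reading a chunk at a time keeps the number of seek and read calls low compared with stepping back one byte per call.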
Answer by Stephen C
RandomAccessFile is a good place to start, as described by the other answers. There is one important caveat though.
If your file is not encoded with a one-byte-per-character encoding, the readLine() method is not going to work for you. And readUTF() won't work in any circumstances. (It reads a string preceded by a length prefix ...)
Instead, you will need to make sure that you look for end-of-line markers in a way that respects the encoding's character boundaries. For fixed length encodings (e.g. flavors of UTF-16 or UTF-32) you need to extract characters starting from byte positions that are divisible by the character size in bytes. For variable length encodings (e.g. UTF-8), you need to search for a byte that must be the first byte of a character.
In the case of UTF-8, the first byte of a character will be 0xxxxxxx or 110xxxxx or 1110xxxx or 11110xxx. Anything else is either a continuation byte (10xxxxxx) or an illegal UTF-8 sequence. See The Unicode Standard, Version 5.2, Chapter 3.9, Table 3-7. This means, as the comment discussion points out, that any 0x0A and 0x0D bytes in a properly encoded UTF-8 stream will represent a LF or CR character. Thus, simply counting the 0x0A and 0x0D bytes is a valid implementation strategy (for UTF-8) if we can assume that the other kinds of Unicode line separator (U+2028, U+2029 and U+0085) are not used. If you can't assume that, then the code would be more complicated.
Having identified a proper character boundary, you can then just call new String(...) passing the byte array, offset, count and encoding, and then repeatedly call String.lastIndexOf(...) to count end-of-lines.
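The steps above might look something like the following for UTF-8. Utf8Tail, isContinuation and the windowBytes parameter are hypothetical names. The sketch assumes the chosen window is large enough to hold the last n lines and that only LF (optionally preceded by CR) is used as a line separator. A fuller version would widen the window and retry when too few lines are found.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

class Utf8Tail {
    /** True if b is a UTF-8 continuation byte (10xxxxxx), i.e. not the first byte of a character. */
    static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    /** Returns up to n last lines found in the final windowBytes bytes of the file. */
    static String lastLines(String path, int n, int windowBytes) throws IOException {
        if (n <= 0) return "";
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long length = raf.length();
            long start = Math.max(0L, length - windowBytes);
            byte[] bytes = new byte[(int) (length - start)];
            raf.seek(start);
            raf.readFully(bytes);
            // Skip forward to the first byte that can begin a character.
            int offset = 0;
            while (offset < bytes.length && isContinuation(bytes[offset])) offset++;
            String text = new String(bytes, offset, bytes.length - offset, StandardCharsets.UTF_8);
            // Ignore the newline that terminates the final line, then walk back n line breaks.
            int idx = text.endsWith("\n") ? text.length() - 1 : text.length();
            for (int i = 0; i < n; i++) {
                idx = text.lastIndexOf('\n', idx - 1);
                if (idx < 0) break; // fewer than n line breaks in the window
            }
            return text.substring(idx + 1);
        }
    }
}
```

If the window starts in the middle of a multi-byte character, the loop over continuation bytes drops the incomplete character instead of letting it decode as a replacement character.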
Answer by ra9r
Here is the best way I've found to do it. It is simple, pretty fast and memory efficient: only the last maxLines lines are kept in memory, although the file is still read from the beginning.
import java.io.*;

public static void tail(File src, OutputStream out, int maxLines) throws IOException {
    if (maxLines <= 0) {
        return;
    }
    String[] lines = new String[maxLines]; // ring buffer holding the most recent lines
    long total = 0;                        // number of lines read so far
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader reader = new BufferedReader(new FileReader(src))) {
        for (String line = reader.readLine(); line != null; line = reader.readLine()) {
            lines[(int) (total++ % maxLines)] = line;
        }
    }
    OutputStreamWriter writer = new OutputStreamWriter(out);
    // Start at the oldest retained line and walk forward to the newest.
    for (long i = Math.max(0, total - maxLines); i < total; i++) {
        writer.write(lines[(int) (i % maxLines)]);
        writer.write("\n");
    }
    writer.flush();
}
Answer by ruth542
Use CircularFifoBuffer from Apache Commons. This comes from an answer to a similar question, "How to read last 5 lines of a .txt file into java".
Note that in Apache Commons Collections 4 this class seems to have been renamed to CircularFifoQueue.
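For readers without Commons Collections on the classpath, the same bounded-buffer idea can be approximated with java.util.ArrayDeque. This is a stand-in and not the CircularFifoBuffer API itself. DequeTail and lastLines are illustrative names, and the file is assumed to be UTF-8. Like any approach based on a reader, it still reads the whole file sequentially.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class DequeTail {
    /** Keeps only the most recent n lines while streaming through the file. */
    static List<String> lastLines(String path, int n) throws IOException {
        if (n <= 0) return new ArrayList<>();
        Deque<String> window = new ArrayDeque<>();
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (window.size() == n) window.removeFirst(); // evict the oldest line
                window.addLast(line);
            }
        }
        return new ArrayList<>(window); // oldest first, i.e. file order
    }
}
```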
Answer by Luca
I found RandomAccessFile and other Buffer Reader classes too slow for me. Nothing can be faster than a tail -<#lines>. So this was the best solution for me.
public String getLastNLogLines(File file, int nLines) {
    StringBuilder s = new StringBuilder();
    // Requires a tail executable on the PATH (Unix-like systems).
    // Passing the arguments as an array keeps paths containing spaces intact.
    String[] command = {"tail", "-n", String.valueOf(nLines), file.getPath()};
    try {
        Process p = Runtime.getRuntime().exec(command);
        try (java.io.BufferedReader input = new java.io.BufferedReader(
                new java.io.InputStreamReader(p.getInputStream()))) {
            String line;
            // Read the next line into the variable line and then check for
            // the EOF condition, which is a return value of null.
            while ((line = input.readLine()) != null) {
                s.append(line).append('\n');
            }
        }
    } catch (java.io.IOException e) {
        e.printStackTrace();
    }
    return s.toString();
}
Answer by akki_java
I found the simplest way to do it is to use ReversedLinesFileReader from the Apache Commons IO API.
This method gives you the lines of a file from bottom to top, and you can set the n_lines value to specify the number of lines.
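import java.io.File;
import org.apache.commons.io.input.ReversedLinesFileReader;

File file = new File("D:\\file_name.xml"); // the backslash must be escaped in a Java string literal
int n_lines = 10;
int counter = 0;
// try-with-resources closes the reader when done
try (ReversedLinesFileReader object = new ReversedLinesFileReader(file)) {
    String line;
    // readLine() returns null once the start of the file is reached
    while (counter < n_lines && (line = object.readLine()) != null) {
        System.out.println(line);
        counter++;
    }
}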
Answer by Torsten Simon
The ReversedLinesFileReader can be found in the Apache Commons IO Java library.
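int n_lines = 1000;
// path is assumed to be defined by the enclosing method
StringBuilder result = new StringBuilder();
try (ReversedLinesFileReader object = new ReversedLinesFileReader(new File(path))) {
    for (int i = 0; i < n_lines; i++) {
        String line = object.readLine();
        if (line == null)
            break;
        // Lines arrive last-first, so prepend each one to restore file order
        result.insert(0, line + "\n");
    }
}
return result.toString();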
Answer by pocket
I had a similar problem, but I didn't understand the other solutions.
I used this. I hope it is simple code.
// String filePathName = (directory and file name).
File f = new File(filePathName);
long fileLength = f.length(); // Size of the file in bytes.
long fileLength_toRead = 0;   // Byte offset at which reading starts.
if (fileLength > 2000) {
    // My file content is a table, and I know one row is about 100 bytes/characters.
    // I start reading 1000 bytes before the end of the file.
    // If you don't know the line length, use @paxdiablo's advice.
    fileLength_toRead = fileLength - 1000;
}
try (RandomAccessFile raf = new RandomAccessFile(filePathName, "r")) { // try-with-resources opens and closes the file.
    raf.seek(fileLength_toRead); // Reading begins at this byte.
    if (fileLength_toRead > 0) {
        raf.readLine(); // After a seek the first line is usually partial, so skip it.
    }
    String rowInFile = raf.readLine();
    while (rowInFile != null) {
        // Here the lines read (rowInFile) can be added to a String[] array or an ArrayList<String>.
        // Later I can work with the rows from the array; the last row is sometimes empty, etc.
        rowInFile = raf.readLine();
    }
}
catch (IOException e) {
    // Handle or log the error.
}
Answer by user11016
Here is working code for this.
private static void printLastNLines(String filePath, int n) {
    StringBuilder builder = new StringBuilder();
    // try-with-resources closes the file when done
    try (RandomAccessFile randomAccessFile = new RandomAccessFile(filePath, "r")) {
        long pos = randomAccessFile.length() - 1;
        // Walk backwards one byte at a time (assumes a single-byte encoding).
        for (long i = pos; i >= 0 && n > 0; i--) {
            randomAccessFile.seek(i);
            char c = (char) randomAccessFile.read();
            // A newline at the very end only terminates the last line, so don't count it.
            if (c == '\n' && i != pos) {
                n--;
                if (n == 0) {
                    break;
                }
            }
            builder.append(c);
        }
        // Characters were collected last-first, so reverse them before printing.
        builder.reverse();
        System.out.print(builder);
    } catch (IOException e) {
        e.printStackTrace();
    }
}