Java: Fastest way to read through a text file with 2 million lines
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/19486077/
Asked by BeyondProgrammer
Currently I am using a Scanner wrapped around a FileReader and looping with while (sc.hasNextLine()). I don't think this approach is very efficient. Is there another way to read the file that offers similar functionality?
public void Read(String file) {
    Scanner sc = null;
    try {
        sc = new Scanner(new FileReader(file));
        while (sc.hasNextLine()) {
            String text = sc.nextLine();
            String[] file_Array = text.split(" ", 3);
            if (file_Array[0].equalsIgnoreCase("case")) {
                //do something
            } else if (file_Array[0].equalsIgnoreCase("object")) {
                //do something
            } else if (file_Array[0].equalsIgnoreCase("classes")) {
                //do something
            } else if (file_Array[0].equalsIgnoreCase("function")) {
                //do something
            } else if (file_Array[0].equalsIgnoreCase("ignore")) {
                //do something
            } else if (file_Array[0].equalsIgnoreCase("display")) {
                //do something
            }
        }
    } catch (FileNotFoundException e) {
        System.out.println("Input file " + file + " not found");
        System.exit(1);
    } finally {
        sc.close();
    }
}
Accepted answer by user207421
You will find that BufferedReader.readLine() is as fast as you need: you can read millions of lines a second with it. It is more likely that your string splitting and handling are causing whatever performance problems you are encountering.
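As a rough illustration of this answer, here is a minimal sketch of the question's loop rewritten around BufferedReader.readLine(). The class name KeywordCounter, the method countKeywords, and the idea of counting keywords in place of "do something" are assumptions made for the example, not part of the original answer. It also takes the first token with indexOf instead of split(), which is one possible way to reduce per-line work; whether that matters for a given file would need to be measured.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example class; not from the original answer.
class KeywordCounter {

    // Reads every line and counts how often each known leading keyword occurs.
    static Map<String, Integer> countKeywords(BufferedReader reader) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            // Take the first token with indexOf so no array is allocated per line.
            int space = line.indexOf(' ');
            String keyword = (space < 0 ? line : line.substring(0, space)).toLowerCase(Locale.ROOT);
            switch (keyword) {
                case "case":
                case "object":
                case "classes":
                case "function":
                case "ignore":
                case "display":
                    counts.merge(keyword, 1, Integer::sum); // stand-in for "do something"
                    break;
                default:
                    break; // unknown keywords are skipped, as in the question's code
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java KeywordCounter <file>");
            return;
        }
        // UTF-8 is an assumption about the input file's encoding.
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            System.out.println(countKeywords(reader));
        }
    }
}
```

The try-with-resources block closes the reader even if an exception is thrown, which replaces the explicit finally block in the question.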
Answer by Trying
You can use FileChannel and ByteBuffer from Java NIO. In my experience, the ByteBuffer size is the most critical factor in reading data faster. The code below reads the content of the file.
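import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ReadWithChannel {
    public static void main(String[] args) throws Exception {
        try (FileInputStream fileInputStream = new FileInputStream("sample4.txt");
             FileChannel fileChannel = fileInputStream.getChannel()) {
            ByteBuffer byteBuffer = ByteBuffer.allocate(1024);
            // read() returns -1 at end of file; keep refilling the buffer until then
            while (fileChannel.read(byteBuffer) != -1) {
                byteBuffer.flip(); // switch the buffer from filling to draining
                while (byteBuffer.hasRemaining()) {
                    // the byte-to-char cast is only correct for single-byte text such as ASCII
                    System.out.print((char) byteBuffer.get());
                }
                byteBuffer.clear(); // make the buffer ready for the next read
            }
        }
    }
}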
You can check for '\n' here to detect the end of a line. Thanks.
You can even use a scatter/gather read to read files faster, i.e.
fileChannel.read(buffers);
where
ByteBuffer b1 = ByteBuffer.allocate(B1);
ByteBuffer b2 = ByteBuffer.allocate(B2);
ByteBuffer b3 = ByteBuffer.allocate(B3);
ByteBuffer[] buffers = {b1, b2, b3};
This saves the user process from making several system calls (which can be expensive) and allows the kernel to optimize handling of the data because it has information about the total transfer. If multiple CPUs are available, it may even be possible to fill and drain several buffers simultaneously.
From this book.
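To make the buffer-size point concrete, the following is a minimal sketch that counts '\n' bytes through a FileChannel with a caller-chosen buffer size. The class name ChannelLineCounter, the method countNewlines, and the 1 MiB size used in main are assumptions for illustration; the answer does not state which size works best, so that would have to be measured on the target machine.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical example class; not from the original answer.
class ChannelLineCounter {

    // Counts '\n' bytes in the file; bufferSize is the tunable the answer refers to.
    static long countNewlines(Path path, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        long newlines = 0;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
            while (channel.read(buffer) != -1) { // -1 signals end of file
                buffer.flip();                   // switch from filling to draining
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') {
                        newlines++;
                    }
                }
                buffer.clear();                  // reuse the same buffer for the next read
            }
        }
        return newlines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ChannelLineCounter <file>");
            return;
        }
        // 1 MiB is an arbitrary starting point for experiments, not a recommendation.
        System.out.println(countNewlines(Paths.get(args[0]), 1 << 20));
    }
}
```

Working on raw bytes avoids character decoding, but it also means any per-line text handling has to deal with the file's encoding itself.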
Answer by nullptr
You must find out which part of the program is taking the time.
As per EJP's answer, you should use BufferedReader.
If string processing really is the bottleneck, consider using threads: one thread reads from the file and queues the lines, while other string-processing threads dequeue the lines and process them. You will need to investigate how many threads to use; the number should be related to the number of CPU cores, so that the application makes full use of the CPU.
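The reader-plus-workers arrangement described above could look roughly like the sketch below. Everything named here (ParallelLineProcessor, countMatching, the queue capacity of 10,000, the one-minute wait, and counting lines with a given prefix as the "processing") is an assumption for the example. The reading thread is the calling thread; worker threads stop when they take a sentinel object from the queue.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical example class; not from the original answer.
class ParallelLineProcessor {

    // Sentinel compared by identity; new String(...) ensures no input line is the same object.
    private static final String POISON = new String("<<end of input>>");

    static long countMatching(BufferedReader reader, String prefix, int workers)
            throws IOException, InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000); // bounded, so the reader cannot run far ahead
        LongAdder matches = new LongAdder();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    String line;
                    while ((line = queue.take()) != POISON) {
                        if (line.startsWith(prefix)) { // stand-in for the real string processing
                            matches.increment();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                queue.put(line); // blocks while the queue is full
            }
        } finally {
            for (int i = 0; i < workers; i++) {
                queue.put(POISON); // one sentinel per worker
            }
            pool.shutdown();
        }
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return matches.sum();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java ParallelLineProcessor <file>");
            return;
        }
        int workers = Runtime.getRuntime().availableProcessors(); // tie the thread count to the core count
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            System.out.println(countMatching(reader, "case", workers));
        }
    }
}
```

Handing lines over one at a time adds queue overhead per line, so this only pays off when the processing per line is noticeably more expensive than the read itself.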
Answer by Pratik Shelar
If you wish to read all lines at once, have a look at the Files API of Java 7. It's really simple to use.
But a better approach would be to process this file in batches. Have a reader that reads chunks of lines from the file and a writer that does the required processing or persists the data. Batching ensures that the program will still work even if the number of lines grows to billions in the future. You can also use multithreading within a batch to increase its overall performance. I would recommend having a look at Spring Batch.
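The chunked reader and writer idea can be sketched in plain Java as follows. This is not Spring Batch and does not use its API; the class LineBatcher, the method forEachBatch, and the batch size of 10,000 in main are all hypothetical names and values chosen for the example. The handler plays the role of the "writer" that processes or persists each chunk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical example class; a plain-Java sketch, not Spring Batch.
class LineBatcher {

    // Reads lines in chunks of batchSize and hands each chunk to the handler.
    // Returns the number of chunks delivered.
    static int forEachBatch(BufferedReader reader, int batchSize, Consumer<List<String>> handler)
            throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<String> batch = new ArrayList<>(batchSize);
        String line;
        while ((line = reader.readLine()) != null) {
            batch.add(line);
            if (batch.size() == batchSize) {
                handler.accept(batch);
                batches++;
                batch = new ArrayList<>(batchSize); // fresh list, so the handler may keep the old one
            }
        }
        if (!batch.isEmpty()) { // deliver the final, partially filled chunk
            handler.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java LineBatcher <file>");
            return;
        }
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            int n = forEachBatch(reader, 10_000, chunk -> System.out.println("chunk of " + chunk.size()));
            System.out.println("chunks: " + n);
        }
    }
}
```

Only one chunk is held in memory at a time, which is why this shape keeps working as the file grows.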
Answer by shamsAAzad
Scanner can't be as fast as BufferedReader, because it uses regular expressions to read text files, which makes it slower than BufferedReader. With BufferedReader you can read a block from a text file.
BufferedReader bf = new BufferedReader(new FileReader("FileName"));
You can then use readLine() to read from bf.
Hope it serves your purpose.
Answer by mac7
Use BufferedReader for high-performance file access. But the default buffer size of 8192 characters is often too small. For huge files you can increase the buffer size by orders of magnitude to boost your file reading performance. For example:
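// BufferedReader wraps another Reader; the second argument is the buffer size in characters
BufferedReader br = new BufferedReader(new FileReader("file.dat"), 1000 * 8192);
String thisLine;
while ((thisLine = br.readLine()) != null) {
    System.out.println(thisLine);
}
br.close();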
Answer by YAMM
I made a gist comparing different methods:
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;
import java.util.function.Function;

public class Main {
    public static void main(String[] args) {
        String path = "resources/testfile.txt";
        measureTime("BufferedReader.readLine() into ArrayList", Main::bufferReaderToArrayList, path);
        measureTime("BufferedReader.readLine() into LinkedList", Main::bufferReaderToLinkedList, path);
        measureTime("Files.readAllLines()", Main::readAllLines, path);
        measureTime("Scanner.nextLine() into ArrayList", Main::scannerArrayList, path);
        measureTime("Scanner.nextLine() into LinkedList", Main::scannerLinkedList, path);
        measureTime("RandomAccessFile.readLine() into ArrayList", Main::randomAccessFileArrayList, path);
        measureTime("RandomAccessFile.readLine() into LinkedList", Main::randomAccessFileLinkedList, path);
        System.out.println("-----------------------------------------------------------");
    }

    // Runs one reading strategy and prints the line count and elapsed time in seconds.
    private static void measureTime(String name, Function<String, List<String>> fn, String path) {
        System.out.println("-----------------------------------------------------------");
        System.out.println("run: " + name);
        long startTime = System.nanoTime();
        List<String> l = fn.apply(path);
        long estimatedTime = System.nanoTime() - startTime;
        System.out.println("lines: " + l.size());
        System.out.println("estimatedTime: " + estimatedTime / 1_000_000_000.);
    }

    private static List<String> bufferReaderToLinkedList(String path) {
        return bufferReaderToList(path, new LinkedList<>());
    }

    private static List<String> bufferReaderToArrayList(String path) {
        return bufferReaderToList(path, new ArrayList<>());
    }

    private static List<String> bufferReaderToList(String path, List<String> list) {
        try {
            final BufferedReader in = new BufferedReader(
                    new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8));
            String line;
            while ((line = in.readLine()) != null) {
                list.add(line);
            }
            in.close();
        } catch (final IOException e) {
            e.printStackTrace();
        }
        return list;
    }

    private static List<String> readAllLines(String path) {
        try {
            return Files.readAllLines(Paths.get(path));
        } catch (IOException e) {
            e.printStackTrace();
        }
        // return an empty list instead of null so measureTime can still call size()
        return new ArrayList<>();
    }

    private static List<String> randomAccessFileLinkedList(String path) {
        return randomAccessFile(path, new LinkedList<>());
    }

    private static List<String> randomAccessFileArrayList(String path) {
        return randomAccessFile(path, new ArrayList<>());
    }

    private static List<String> randomAccessFile(String path, List<String> list) {
        try {
            RandomAccessFile file = new RandomAccessFile(path, "r");
            String str;
            while ((str = file.readLine()) != null) {
                list.add(str);
            }
            file.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return list;
    }

    private static List<String> scannerLinkedList(String path) {
        return scanner(path, new LinkedList<>());
    }

    private static List<String> scannerArrayList(String path) {
        return scanner(path, new ArrayList<>());
    }

    private static List<String> scanner(String path, List<String> list) {
        try {
            Scanner scanner = new Scanner(new File(path));
            while (scanner.hasNextLine()) {
                list.add(scanner.nextLine());
            }
            scanner.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        return list;
    }
}
run: BufferedReader.readLine() into ArrayList, lines: 1000000, estimatedTime: 0.105118655
run: BufferedReader.readLine() into LinkedList, lines: 1000000, estimatedTime: 0.072696934
run: Files.readAllLines(), lines: 1000000, estimatedTime: 0.087753316
run: Scanner.nextLine() into ArrayList, lines: 1000000, estimatedTime: 0.743121734
run: Scanner.nextLine() into LinkedList, lines: 1000000, estimatedTime: 0.867049885
run: RandomAccessFile.readLine() into ArrayList, lines: 1000000, estimatedTime: 11.413323046
run: RandomAccessFile.readLine() into LinkedList, lines: 1000000, estimatedTime: 11.423862897
BufferedReader is the fastest, Files.readAllLines() is also acceptable, Scanner is slow due to regex, and RandomAccessFile is unacceptable.
Answer by Digao
Just updating this thread: now we have Java 8 to do this job:
List<String> lines = Files.readAllLines(Paths.get(file_path));
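Files.readAllLines() loads the whole file into a list, which may be a lot of memory for 2 million lines. Java 8 also added Files.lines(), which returns a lazily populated Stream<String>. Here is a minimal sketch of that alternative; the class name StreamingLineFilter, the method countLinesStartingWith, and the "case" prefix are assumptions for the example and are not part of the original answer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical example class; not from the original answer.
class StreamingLineFilter {

    // Streams the file line by line (decoded as UTF-8) without collecting it into a list.
    static long countLinesStartingWith(Path path, String prefix) throws IOException {
        // The stream holds an open file, so it is closed with try-with-resources.
        try (Stream<String> lines = Files.lines(path)) {
            return lines.filter(line -> line.startsWith(prefix)).count();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java StreamingLineFilter <file>");
            return;
        }
        System.out.println(countLinesStartingWith(Paths.get(args[0]), "case"));
    }
}
```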