Java: reading a file vs loading a file into main memory from disk for processing
Original source: http://stackoverflow.com/questions/13096543/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Reading a file vs loading a file into main memory from disk for processing
Asked by Mahalakshmi Lakshminarayanan
How do I load a file into main memory?
I read the files using:
BufferedReader buf = new BufferedReader(new FileReader(fileName)); // fileName: placeholder for the path of the file to read
I presume that this is reading the file line by line from the disk. What is the advantage of this?
What is the advantage of loading the file directly into memory? How do we do that in Java?
I found some examples using Scanner or RandomAccessFile methods. Do they load the files into memory? Should I use them? Which of the two should I use?
Thanks in advance!!!
Answered by Stephen C
BufferedReader buf = new BufferedReader(new FileReader(fileName)); // fileName: placeholder for the path of the file to read
I presume that this is reading the file line by line from the disk. What is the advantage of this?
Not exactly. It is reading the file in chunks whose size is the default buffer size (8k bytes I think).
The advantage is that you don't need a huge heap to read a huge file. This is a significant issue since the maximum heap size can only be specified at JVM startup (with Hotspot Java).
You also don't consume the system's physical / virtual memory resources to represent the huge heap.
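To make the streaming case concrete, here is a minimal sketch (not part of the original answer) of processing a file through a BufferedReader so that only the current line and the reader's internal buffer are held in memory. The class name LineCounter, the method countNonEmptyLines, and the choice of UTF-8 are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class LineCounter {
    // Counts non-empty lines. The file is consumed chunk by chunk through the
    // reader's buffer, so heap usage does not grow with the file size.
    static long countNonEmptyLines(Path file) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LineCounter.java <path-to-file>
        System.out.println(countNonEmptyLines(Paths.get(args[0])));
    }
}
```

The same loop should work unchanged on a file far larger than the heap, as long as no single line is itself huge.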
What is the advantage of loading the file directly into memory?
It reduces the number of system calls, and may read the file faster. How much faster depends on a number of factors. But you then have the problem of dealing with really large files.
How do we do that in Java?
- Find out how large the file is.
- Allocate a byte (or character) array big enough.
- Use the relevant read(byte[], int, int) or read(char[], int, int) method to read the entire file.
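The three steps above can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's own code: the class name WholeFileReader and the method readFully are made up, and the sketch assumes the file fits in a single Java array and does not change size while being read.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class WholeFileReader {
    static byte[] readFully(File file) throws IOException {
        // Step 1: find out how large the file is.
        long size = file.length();
        if (size > Integer.MAX_VALUE - 8) {
            throw new IOException("File too large for a single array: " + size);
        }
        // Step 2: allocate a byte array big enough.
        byte[] data = new byte[(int) size];
        // Step 3: read(byte[], int, int) may return fewer bytes than requested,
        // so keep calling it until the array is full.
        try (InputStream in = new FileInputStream(file)) {
            int offset = 0;
            while (offset < data.length) {
                int n = in.read(data, offset, data.length - offset);
                if (n < 0) {
                    throw new IOException("Unexpected end of file at offset " + offset);
                }
                offset += n;
            }
        }
        return data;
    }
}
```

On Java 7 and later, java.nio.file.Files.readAllBytes(Path) does much the same job in one call, so the manual loop is mostly useful for seeing what is going on.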
You can also use a memory-mapped file ... but that requires using the Buffer APIs, which can be a bit tricky to use.
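For the memory-mapped route, a minimal sketch using FileChannel.map and MappedByteBuffer might look like this. The class name MappedFileSum and the byte-summing task are assumptions chosen only to have something to compute; a single mapping is limited to Integer.MAX_VALUE bytes, so larger files would need several mappings.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedFileSum {
    // Sums every byte of the file (treated as unsigned) through a read-only mapping.
    static long sumBytes(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The OS pages the file contents in on demand; nothing is copied
            // into a Java array up front.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (buffer.hasRemaining()) {
                sum += buffer.get() & 0xFF;
            }
            return sum;
        }
    }
}
```

The mapping stays valid after the channel is closed and is only released when the buffer is garbage collected, which is part of what makes this API tricky.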
I found some examples on Scanner or RandomAccessFile methods. Do they load the files into memory?
No, and no.
Should I use them? Which of the two should I use?
Do they provide the functionality that you require? Do you need to read / parse text-based data? Do you need to do random access on binary data?
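To illustrate how different the two APIs are, here is a small sketch of what each is typically used for. Everything named here (ApiChoice, sumInts, readIntAt, and the file layouts) is hypothetical: the text file is assumed to hold whitespace-separated integers, and the binary file is assumed to hold consecutive 4-byte big-endian ints.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Scanner;

class ApiChoice {
    // Text parsing: Scanner tokenizes the input as it reads it.
    static int sumInts(File textFile) throws IOException {
        int sum = 0;
        try (Scanner scanner = new Scanner(textFile, "UTF-8")) {
            while (scanner.hasNextInt()) {
                sum += scanner.nextInt();
            }
        }
        return sum;
    }

    // Random access: seek straight to the index-th int without reading
    // anything that comes before it.
    static int readIntAt(File binaryFile, long index) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(binaryFile, "r")) {
            raf.seek(index * Integer.BYTES);
            return raf.readInt();
        }
    }
}
```

Neither method holds the whole file in memory; they differ in what kind of access they make convenient.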
Under normal circumstances, you should choose your I/O APIs based primarily on the functionality that you require, and secondarily on performance considerations. Using a BufferedInputStream or BufferedReader is usually enough to get acceptable* performance if you intend to parse the file as you read it. (But if you actually need to hold the entire file in memory in its original form, then a BufferedXxx wrapper class actually makes reading a bit slower.)
* - Note that acceptable performance is not the same as optimal performance, but your client / project manager probably would not want you to waste time writing code to perform optimally ... if this is not a stated requirement.
Answered by Hot Licks
If you're reading in the file and then parsing it, walking from beginning to end once to extract your data, then not referencing the file again, a buffered reader is about as "optimal" as you'll get. You can "tune" the performance somewhat by adjusting the buffer size -- a larger buffer will read larger chunks from the file. (Make the buffer size a power of 2 -- e.g. 262144.) Reading in an entire large file (larger than, say, 1 MB) will generally cost you performance in paging and heap management.
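As a rough sketch of the buffer-size tuning described above, the buffered wrapper classes accept the buffer size as a second constructor argument. This example uses BufferedInputStream rather than a Reader, which is an assumption made to keep the illustration simple; the class name BigBufferRead and the newline-counting task are also hypothetical. Whether 262144 actually beats the default on a given system is something to measure, not assume.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class BigBufferRead {
    // 256 KiB, the power-of-2 size suggested in the answer.
    static final int BUFFER_SIZE = 262144;

    // Walks the file once from start to end, counting '\n' bytes.
    static long countNewlines(String path) throws IOException {
        long count = 0;
        try (InputStream in =
                new BufferedInputStream(new FileInputStream(path), BUFFER_SIZE)) {
            int b;
            // Each read() is served from the in-memory buffer; the underlying
            // file is only touched when the buffer needs refilling.
            while ((b = in.read()) != -1) {
                if (b == '\n') {
                    count++;
                }
            }
        }
        return count;
    }
}
```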