
Disclaimer: the question and answers below are a translation of a popular StackOverflow thread, provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must do so under the same CC BY-SA license and attribute them to the original authors (not me). Original question: http://stackoverflow.com/questions/25054644/

Date: 2020-11-02 07:14:44 · Source: igfitidea

Java IO : Writing into a text file line by line

Tags: java, file, file-io, java-io

Question by JavaUser

I have a requirement where I need to write a text file line by line. The number of lines may be up to 80K. I open a file output stream and, inside a for-loop, iterate over a list, build each line, and write it to the file.


This means 80K write operations are made on the file.


Opening and writing the file very frequently hinders performance. Can anyone suggest the best way to address this requirement in Java IO?


Thanks.


Answer by user207421

You haven't posted any code, but as long as your writes are buffered you should hardly notice the performance cost. Use BufferedWriter.write() followed by BufferedWriter.newLine(), and avoid flushing as much as you can. Don't 'form a line'; just write whatever you have to write as soon as you have it. Much, if not all, of the overhead you are observing may actually be string concatenation rather than I/O.

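A minimal sketch of this advice (the file name and line content are made up for illustration): the writer is buffered, each value is written as soon as it is available, and there is no manual flushing or string concatenation of whole lines.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class LineWriter {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the writer once; close() flushes the buffer,
        // so no explicit flush() calls are needed inside the loop
        try (BufferedWriter out = new BufferedWriter(new FileWriter("lines.txt"))) {
            for (int i = 0; i < 80_000; i++) {
                out.write("item " + i); // write the value as soon as you have it
                out.newLine();          // platform line separator, no "\n" concatenation
            }
        }
    }
}
```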

The alternatives mentioned in other answers either amount to the same thing implemented in more baroque ways, or involve NIO, which isn't going to be any faster.


Answer by vanje

Use a BufferedOutputStream. With it, all writes go into a buffer first rather than directly to disk. Data is written to disk only when the buffer is full, or when the stream is flushed or closed. The default buffer size is 8192 bytes, but you can specify your own.


Here is an example using the default buffer size:


import java.io.BufferedOutputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;

// try-with-resources closes the PrintWriter automatically,
// which flushes the buffer, so no finally block is needed
try (PrintWriter out = new PrintWriter(new OutputStreamWriter(
    new BufferedOutputStream(new FileOutputStream("out.txt")), "UTF-8"))) {
  for (int i = 0; i < 80000; i++) {
    out.println(String.format("Line %d", i));
  }
} catch (UnsupportedEncodingException | FileNotFoundException e) {
  e.printStackTrace();
}
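Since the answer mentions that you can specify your own buffer size, here is a sketch of the same stream chain with a custom buffer (the 64 KB size and file name are illustrative choices, not recommendations from the answer):

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class CustomBuffer {
    public static void main(String[] args) throws IOException {
        // the second BufferedOutputStream argument overrides the 8192-byte default
        try (PrintWriter out = new PrintWriter(new OutputStreamWriter(
                new BufferedOutputStream(new FileOutputStream("out64k.txt"), 64 * 1024),
                StandardCharsets.UTF_8))) {
            for (int i = 0; i < 80_000; i++) {
                out.println("Line " + i);
            }
        }
    }
}
```

A larger buffer means fewer system calls for the same data, at the cost of more memory held per open stream; whether it helps in practice is worth benchmarking on the target machine.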

Answer by Chris K

Below are the heuristics that I use to aid my decisions when designing for fast file IO, along with a set of benchmarks that I use to test different alternatives.


Heuristics:


  1. Preallocate the file; asking the OS to resize the file is expensive.
  2. Stream the data as much as possible; avoid seeking, because seeks perform poorly on spinning disks.
  3. Batch the writes (while taking care not to create excessive GC pressure).
  4. When designing for SSDs, avoid updating data in place; that is the slowest operation on an SSD. A complete guide to their quirks can be read here.
  5. Where possible, avoid copying data between buffers (this is where Java NIO can help).
  6. If possible, use memory-mapped files. Memory-mapped files are underused in Java, but handing disk writes over to the OS to perform asynchronously is typically an order of magnitude faster than the alternatives, i.e. BufferedWriter and RandomAccessFile.
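Heuristics 1 and 6 can be sketched together with a memory-mapped write: the file is sized up front by the mapping itself, and the OS writes the dirty pages back asynchronously. The file name, line content, and count below are made up for illustration, and the example assumes the total size is known in advance (which mapping requires).

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class MappedWrite {
    public static void main(String[] args) throws Exception {
        byte[] line = "hello mapped world\n".getBytes(StandardCharsets.US_ASCII);
        int count = 10_000;
        // heuristic 1: mapping the full size up front preallocates the file once
        long size = (long) line.length * count;
        try (RandomAccessFile raf = new RandomAccessFile("mapped.txt", "rw");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            for (int i = 0; i < count; i++) {
                buf.put(line); // sequential puts; the OS flushes pages asynchronously
            }
            buf.force(); // optional: block until the dirty pages reach the disk
        }
    }
}
```

Note the trade-off: the mapped region's size must be fixed when it is created, so this approach fits fixed-size or precomputable output better than open-ended appends.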

I wrote the following file benchmarks a while ago. Give them a run: https://gist.github.com/kirkch/3402882


When I ran the benchmarks against a standard spinning disk, I got these results:


Stream Write: 438
Mapped Write: 28
Stream Read: 421
Mapped Read: 12
Stream Read/Write: 1866
Mapped Read/Write: 19

All numbers are in ms, so smaller is better. Notice that memory-mapped files consistently outperform every other approach.


The other surprise I have found when writing these types of systems is that in later versions of Java, using BufferedWriter can be slower than using FileWriter directly or RandomAccessFile. It turns out that buffering is now done lower down; I think it happened when Sun rewrote java.io to use channels and byte buffers under the covers. Yet the advice of adding one's own buffering remains common practice. As always, measure first on your target environment, and feel free to adjust the benchmark code above to experiment further.

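In the "measure first" spirit, a tiny harness like the following can compare the two writer setups on your own machine (file names and line count are arbitrary; the timings it prints are only meaningful on your hardware, and a single run is a rough indication, not a proper benchmark):

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

public class MeasureFirst {
    // write the same lines through the given writer and return elapsed milliseconds
    static long time(Writer w, int lines) throws IOException {
        long t0 = System.nanoTime();
        try (Writer out = w) {
            for (int i = 0; i < lines; i++) {
                out.write("line " + i + "\n");
            }
        }
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        int lines = 80_000;
        long plain    = time(new FileWriter("plain.txt"), lines);
        long buffered = time(new BufferedWriter(new FileWriter("buffered.txt")), lines);
        System.out.println("FileWriter: " + plain + " ms, BufferedWriter: " + buffered + " ms");
    }
}
```

Both runs produce byte-identical files, so any difference in the reported times comes from the writer stack rather than the output.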

While looking for links to back up some of the facts above, I came across Martin Thompson's post on this topic. It is well worth a read.
