Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/18718676/

Date: 2020-08-12 10:27:29  Source: igfitidea

How to append/write huge data file text in Java

Tags: java, file

Asked by hudi

I have a database with 150k records. I want to write them to a file as fast as possible. I've tried many approaches, but all of them seem slow. How can I make this faster?

I read these records in blocks of 40k: first I read 40k, then another 40k, and so on.

After reading the records, this process returns a StringBuilder which contains 40k lines. Then we write this StringBuilder to a file.

private static void write(StringBuilder sb, Boolean append) throws Exception {
    File file = File.createTempFile("foo", ".txt");

    FileWriter writer = new FileWriter(file.getAbsoluteFile(), append);
    PrintWriter out = new PrintWriter(writer);
    try {
        out.print(sb);           
        out.flush();
        writer.flush();
    } finally {
        writer.close();
        out.close();
    }
}

I read this other example but it is equally slow: Fastest way to write huge data in text file Java

I also tried the NIO API:

private static void write(StringBuilder sb, Boolean append) throws Exception {
    // Open the file in append or overwrite mode, as requested by the caller
    FileChannel rwChannel = new FileOutputStream("textfile.txt", append).getChannel();
    ByteBuffer bb = ByteBuffer.wrap(sb.toString().getBytes("UTF-8"));
    rwChannel.write(bb);
    rwChannel.close();
}

Which is the best method to write/append huge data into file?

Accepted answer by Holger

You don't need a PrintWriter here. If you have any kind of Writer (e.g. a FileWriter), you can simply invoke append(sb) on it. And you don't need to flush: close implies flushing.

private static void write(StringBuilder sb, Boolean append) throws Exception {
  File file = File.createTempFile("foo", ".txt");

  try(FileWriter writer = new FileWriter(file.getAbsoluteFile(), append)) {
      writer.append(sb);
  }
}

On my system I saw a small performance improvement when using a Channel rather than an OutputStream:

private static void write0a(StringBuilder sb, Boolean append) throws Exception {
  File file = File.createTempFile("foo", ".txt");

  try(Writer writer = Channels.newWriter(new FileOutputStream(
      file.getAbsoluteFile(), append).getChannel(), "UTF-8")) {
    writer.append(sb);
  }
}

However, these are only slight improvements. I don't see much room for improvement here, as all the code ends up calling the same routines. What could really improve your performance is keeping the Writer alive across invocations and not flushing after every record.
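A minimal sketch of that advice, assuming Java 7+ and that each 40k-record block arrives as a StringBuilder, as in the question. `ChunkedExporter` and `writeChunk` are hypothetical names, not part of any library. The file is opened once, every chunk goes through the same BufferedWriter, and buffered data is only forced out when the buffer fills or the exporter is closed.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: opens the target file once and reuses one BufferedWriter
// for every chunk instead of reopening (and flushing) the file on every call.
final class ChunkedExporter implements AutoCloseable {
    private final BufferedWriter writer;

    ChunkedExporter(Path target, boolean append) throws IOException {
        this.writer = append
                ? Files.newBufferedWriter(target, StandardCharsets.UTF_8,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND)
                : Files.newBufferedWriter(target, StandardCharsets.UTF_8); // create or truncate
    }

    // Appends one chunk (e.g. a 40k-line StringBuilder). There is no explicit flush:
    // the BufferedWriter hands data on to the file whenever its buffer fills up.
    void writeChunk(CharSequence chunk) throws IOException {
        writer.append(chunk);
    }

    // Closing flushes whatever is still buffered and releases the file handle once.
    @Override
    public void close() throws IOException {
        writer.close();
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("export", ".txt");
        try (ChunkedExporter out = new ChunkedExporter(target, false)) {
            for (int chunk = 0; chunk < 3; chunk++) {
                // Simulated 40k-record block standing in for the database read.
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < 40_000; i++) {
                    sb.append("record ").append(chunk * 40_000 + i).append('\n');
                }
                out.writeChunk(sb);
            }
        }
        System.out.println(Files.size(target) + " bytes written to " + target);
    }
}
```

The try-with-resources block spans the whole export, so the file is opened and closed exactly once no matter how many chunks are written.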

Answer by Seelenvirtuose

You are using a FileWriter (or a FileOutputStream in the second example). These are not buffered! So they write single chars (or bytes, respectively) to the disk.

That means you should wrap the FileWriter in a BufferedWriter (or the FileOutputStream in a BufferedOutputStream).

private static void write(StringBuilder sb, Boolean append) throws Exception {
    File file = File.createTempFile("foo", ".txt");
    Writer writer = new BufferedWriter(new FileWriter(file.getAbsoluteFile(), append));
    PrintWriter out = new PrintWriter(writer);
    try {
        out.print(sb);           
        out.flush();
        writer.flush();
    } finally {
        writer.close();
        out.close();
    }
}
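Here is a small sketch that illustrates why the wrapping matters. The class names are hypothetical and do not come from the answer. It counts how many write calls actually reach the underlying stream when many single bytes are written with and without a BufferedOutputStream. The counting sink stands in for a FileOutputStream, where each call would typically become a system call. The effect is most visible when the data is written in many small pieces.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class BufferingDemo {
    // Hypothetical stand-in for a FileOutputStream: discards the data but counts
    // how many write calls reach it.
    static final class WriteCallCounter extends OutputStream {
        int calls;

        @Override
        public void write(int b) {
            calls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
        }
    }

    // Writes n single bytes and returns how many write calls reached the sink.
    static int countWrites(boolean buffered, int n) throws IOException {
        WriteCallCounter sink = new WriteCallCounter();
        try (OutputStream out = buffered ? new BufferedOutputStream(sink) : sink) {
            for (int i = 0; i < n; i++) {
                out.write('x');
            }
        } // closing the BufferedOutputStream flushes its remaining bytes to the sink
        return sink.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered: " + countWrites(false, 100_000));
        System.out.println("buffered:   " + countWrites(true, 100_000));
    }
}
```

Without the wrapper, every single-byte write reaches the sink. With it, bytes are collected in an internal buffer and passed on in large blocks, so the sink sees only a handful of calls.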

Answer by user1079877

If you have a huge amount of data, it's better not to store it all in a StringBuilder and then write it to the file at once.

This is the best approach:

1) Before you start processing the data, create a FileOutputStream:

FileOutputStream fos = new FileOutputStream("/path/of/your/file");

2) Create an OutputStreamWriter from this stream:

OutputStreamWriter w = new OutputStreamWriter(fos, "UTF-8");

3) Create a BufferedWriter (this improves file writing performance):

BufferedWriter bw = new BufferedWriter(w);

4) Pass bw to your processing function, then flush and close it:

bw.flush();
bw.close();

StringBuilder and BufferedWriter work in almost the same way, so you do not need to change your code much. The only drawback of this approach is that your processing now includes all the time spent writing the data to the file. Unless you process the data in a separate thread, this is not an issue.

This way, it doesn't matter how large the data is.
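The four steps can be combined into one sketch. This assumes Java 7+ for try-with-resources. `process` and `export` are hypothetical names: `process` stands in for the asker's record-processing code. It accepts an `Appendable`, an interface that both `StringBuilder` and `BufferedWriter` implement. This is one way to switch from building a string in memory to streaming to disk with few code changes.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class StreamingExport {
    // Hypothetical processing step: writes one line per record to any Appendable,
    // so the same code can target a StringBuilder or a BufferedWriter.
    static void process(List<String> records, Appendable out) throws IOException {
        for (String record : records) {
            out.append(record).append('\n');
        }
    }

    static void export(List<String> records, String path) throws IOException {
        // Steps 1-3: FileOutputStream -> OutputStreamWriter (UTF-8) -> BufferedWriter
        try (Writer bw = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(path), StandardCharsets.UTF_8))) {
            // Step 4: hand the writer to the processing code
            process(records, bw);
        } // try-with-resources flushes and closes bw and the streams it wraps
    }

    public static void main(String[] args) throws IOException {
        List<String> records = Arrays.asList("id=1", "id=2", "id=3");

        // Old approach: build everything in memory first.
        StringBuilder sb = new StringBuilder();
        process(records, sb);
        System.out.print(sb);

        // New approach: stream the same records straight to a temporary file.
        File target = File.createTempFile("export", ".txt");
        export(records, target.getPath());
        System.out.println("written to " + target);
    }
}
```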

Answer by user207421

You are opening the file, writing one line, and then closing it. It's the opening and closing that take the time here. Find a way to keep the output file open.

Answer by Amr Gawish

Have you tried Apache Commons IO? Is the performance still the same?
