
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/29978264/

Date: 2020-10-22 07:06:30  Source: igfitidea

How to write the contents of a Scala stream to a file?

scala io

Asked by mrog

I have a Scala stream of bytes that I'd like to write to a file. The stream has too much data to buffer all of it in memory.


As a first attempt, I created an InputStream similar to this:


import java.io.InputStream

class MyInputStream(data: Stream[Byte]) extends InputStream {
  private val iterator = data.iterator
  // Mask with 0xff so a negative byte is returned as 0-255 and not mistaken for EOF (-1).
  override def read(): Int = if (iterator.hasNext) iterator.next() & 0xff else -1
}

Then I use Apache Commons to write the file:


import java.io.FileOutputStream
import org.apache.commons.io.IOUtils

val source = new MyInputStream(dataStream)
val target = new FileOutputStream(file)
try {
  IOUtils.copy(source, target)
} finally {
  target.close()
}

This works, but I'm not too happy with the performance. I'm guessing that calling MyInputStream.read for every byte introduces a lot of overhead. Is there a better way?


Accepted answer by Steve Waldman

You might (or might not!) be mistaken that the read side is the source of your performance troubles. It could be the fact that you are using an unbuffered FileOutputStream(...), forcing a separate system call for every byte written.


Here's my take, quick 'n simple:


import java.io.{BufferedOutputStream, File, FileOutputStream}

def writeBytes(data: Stream[Byte], file: File): Unit = {
  val target = new BufferedOutputStream(new FileOutputStream(file))
  try data.foreach(target.write(_)) finally target.close()
}

Answer by Aphex

I'd recommend the java.nio.file package. With Files.write you can write Arrays of Bytes to a Path constructed from a filename.


It's up to you how to provide the Bytes. You can turn the Stream into an Array with .toArray, or you can take bytes off the stream one (or a handful) at a time and turn them into arrays.


Here's a simple code block demonstrating the .toArray method.


import java.nio.file.{Files, Paths}

val filename: String = "output.bin"
val bytes: Stream[Byte] = ...
Files.write(Paths.get(filename), bytes.toArray)
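When the stream is too large to materialize with .toArray, the "handful at a time" variant the answer mentions could be sketched like this (the 4096-byte chunk size, function name, and file name are assumptions, not from the answer):

```scala
import java.nio.file.{Files, Paths, StandardOpenOption}

// Sketch: write the stream in ~4 KB chunks so it is never fully
// materialized in memory. Chunk size is an arbitrary assumption.
def writeChunked(filename: String, bytes: Stream[Byte]): Unit = {
  val path = Paths.get(filename)
  Files.write(path, Array.empty[Byte]) // create/truncate the file first
  bytes.grouped(4096).foreach { chunk =>
    Files.write(path, chunk.toArray, StandardOpenOption.APPEND)
  }
}
```

Note that each Files.write call with APPEND opens and closes the file, so for many small chunks a single buffered OutputStream (as in the accepted answer) may still be cheaper.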

Answer by Arne Claassen

Given that the Stream's Iterator reading one byte at a time might be the bottleneck, I've devised a way to write a stream to an OutputStream that does not rely on it and is hopefully more efficient:


import java.io.OutputStream
import scala.annotation.tailrec

object StreamCopier {
  def copy(data: Stream[Byte], output: OutputStream) = {
    @tailrec
    def write(d: Stream[Byte]): Unit = if (d.nonEmpty) {
      // Take the next ~4K chunk, write it in one bulk call, then recurse on the rest.
      val (head, tail) = d.splitAt(4 * 1024)
      val bytes = head.toArray
      output.write(bytes, 0, bytes.length)
      write(tail)
    }
    write(data)
  }
}

EDIT: Fixed a bug by replacing data with d inside the tail-recursive write function.


This approach uses splitAt recursively to split the stream into the first ~4K and the remainder, write that head to the OutputStream, and recurse on the tail of the stream until splitAt returns an empty stream.

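As a small illustration of how splitAt drives that loop (the stream contents and chunk size of 4 here are just for demonstration):

```scala
// Demonstration only: splitAt on a Stream yields a (head, tail) pair
// lazily, so each loop iteration only forces roughly one chunk.
val s: Stream[Int] = Stream.from(1).take(10)
val (head, tail) = s.splitAt(4)
// head holds the first chunk; tail is the remaining, still-lazy stream.
```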

Since you have performance benchmarks in place, I'll leave it to you to judge whether this turns out to be more efficient.


Answer by Arne Claassen

You should implement the bulk read override in your InputStream implementation:


override def read(b: Array[Byte], off: Int, len: Int)

IOUtils.copy uses that signature to read/write in 4K chunks.

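A sketch of what that override might look like, assuming the Stream-backed wrapper from the question (the class name BulkInputStream is hypothetical):

```scala
import java.io.InputStream

// Sketch: same idea as MyInputStream in the question, but with the
// bulk read(b, off, len) override that IOUtils.copy actually calls.
class BulkInputStream(data: Stream[Byte]) extends InputStream {
  private val iterator = data.iterator

  // Mask with 0xff so a negative byte is not mistaken for EOF (-1).
  override def read(): Int =
    if (iterator.hasNext) iterator.next() & 0xff else -1

  override def read(b: Array[Byte], off: Int, len: Int): Int =
    if (len == 0) 0
    else if (!iterator.hasNext) -1 // end of stream
    else {
      // Copy up to len bytes into the caller's buffer in one call.
      var i = 0
      while (i < len && iterator.hasNext) {
        b(off + i) = iterator.next()
        i += 1
      }
      i // number of bytes actually copied into b
    }
}
```

With this in place, each IOUtils.copy iteration fills a whole 4K buffer instead of making one method call per byte.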