Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/1284423/

Read entire file in Scala?

scala

Asked by Brendan OConnor

What's a simple and canonical way to read an entire file into memory in Scala? (Ideally, with control over character encoding.)

The best I can come up with is:

scala.io.Source.fromPath("file.txt").getLines.reduceLeft(_+_)

or am I supposed to use one of Java's god-awful idioms, the best of which (without using an external library) seems to be:

import java.util.Scanner
import java.io.File
new Scanner(new File("file.txt")).useDelimiter("\\Z").next()

From reading mailing list discussions, it's not clear to me that scala.io.Source is even supposed to be the canonical I/O library. I don't understand what its intended purpose is, exactly.

... I'd like something dead-simple and easy to remember. For example, in these languages it's very hard to forget the idiom ...

Ruby    open("file.txt").read
Ruby    File.read("file.txt")
Python  open("file.txt").read()

Answered by Daniel C. Sobral

val lines = scala.io.Source.fromFile("file.txt").mkString

By the way, "scala." isn't really necessary, as it's always in scope anyway, and you can, of course, import io's contents, fully or partially, and avoid having to prepend "io." too.

The above leaves the file open, however. To avoid problems, you should close it like this:

val source = scala.io.Source.fromFile("file.txt")
val lines = try source.mkString finally source.close()

Another problem with the code above is that it is horribly slow due to the nature of its implementation. For larger files, one should use:

source.getLines mkString "\n"
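
On Scala 2.13 or later, the close-in-finally pattern from this answer can also be written with scala.util.Using. The following is only a sketch under that version assumption; the object name ReadWholeFile and the helper slurp are made up for illustration and are not part of the answer or of the standard library.

```scala
import scala.io.Source
import scala.util.Using

object ReadWholeFile {
  // Hypothetical helper: reads the whole file into one String with an
  // explicit encoding, and closes the Source even if reading throws.
  def slurp(path: String, encoding: String = "UTF-8"): String =
    Using.resource(Source.fromFile(path, encoding))(_.mkString)
}
```

Using.resource rethrows any exception from the body after closing the source, which matches the behaviour of the try/finally version.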

Answered by Daniel Spiewak

Just to expand on Daniel's solution, you can shorten things up tremendously by inserting the following import into any file which requires file manipulation:

import scala.io.Source._

With this, you can now do:

val lines = fromFile("file.txt").getLines

I would be wary of reading an entire file into a single String. It's a very bad habit, one which will bite you sooner and harder than you think. The getLines method returns a value of type Iterator[String]. It's effectively a lazy cursor into the file, allowing you to examine just the data you need without risking memory glut.
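
To make the "lazy cursor" idea concrete, here is a minimal sketch that walks the iterator once and never holds the whole file in memory. It assumes Scala 2.13+ for scala.util.Using; LazyLines and countMatching are hypothetical names chosen for this example.

```scala
import scala.io.Source
import scala.util.Using

object LazyLines {
  // Hypothetical helper: counts the lines satisfying `p`. Lines are pulled
  // from the iterator one at a time, so only the current line is retained.
  def countMatching(path: String, encoding: String = "UTF-8")(p: String => Boolean): Int =
    Using.resource(Source.fromFile(path, encoding)) { source =>
      source.getLines().count(p)
    }
}
```

The iterator has to be consumed inside the block: once the source is closed, a leftover Iterator[String] can no longer be read reliably.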

Oh, and to answer your implied question about Source: yes, it is the canonical I/O library. Most code ends up using java.io due to its lower-level interface and better compatibility with existing frameworks, but any code which has a choice should be using Source, particularly for simple file manipulation.

Answered by Walter Chang

// for file with utf-8 encoding
val lines = scala.io.Source.fromFile("file.txt", "utf-8").getLines.mkString

Answered by psp

(EDIT: This does not work in scala 2.9 and maybe not 2.8 either)

Use trunk:

scala> io.File("/etc/passwd").slurp
res0: String = 
##
# User Database
# 
... etc

Answered by Paul Draper

import java.nio.charset.StandardCharsets._
import java.nio.file.{Files, Paths}

new String(Files.readAllBytes(Paths.get("file.txt")), UTF_8)

Control over character encoding, and no resources to clean up. Also, possibly optimized (e.g. Files.readAllBytes allocating a byte array appropriate to the size of the file).
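
If the code runs on Java 11 or newer (an assumption, the answer itself only relies on Files.readAllBytes), java.nio.file.Files.readString does the decoding in one call. A small sketch, with NioRead and readUtf8 as illustrative names:

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Paths}

object NioRead {
  // Java 11+ only: reads and decodes the file in a single call,
  // with nothing left open afterwards.
  def readUtf8(path: String): String =
    Files.readString(Paths.get(path), UTF_8)
}
```

One difference worth knowing: as far as I can tell from its documentation, readString reports malformed input as an error, whereas new String(bytes, UTF_8) silently substitutes replacement characters.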

Answered by Ikai Lan

I've been told that Source.fromFile is problematic. Personally, I have had problems opening large files with Source.fromFile and have had to resort to Java InputStreams.

Another interesting solution is using scalax. Here's an example of some well-commented code that opens a log file using ManagedResource and the scalax helpers: http://pastie.org/pastes/420714

Answered by Muyyatin

Using getLines() on scala.io.Source discards what characters were used for line terminators (\n, \r, \r\n, etc.)

The following should preserve it character-for-character, and doesn't do excessive string concatenation (performance problems):

import java.io.{ByteArrayOutputStream, File, FileInputStream}

def fileToString(file: File, encoding: String): String = {
  val inStream = new FileInputStream(file)
  val outStream = new ByteArrayOutputStream
  try {
    var reading = true
    while ( reading ) {
      inStream.read() match {
        case -1 => reading = false
        case c => outStream.write(c)
      }
    }
    outStream.flush()
  }
  finally {
    inStream.close()
  }
  new String(outStream.toByteArray(), encoding)
}
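
The terminator loss described in this answer is easy to reproduce without touching the file system. The sketch below uses Source.fromString on a made-up sample string; the object and value names are only for illustration.

```scala
import scala.io.Source

object LineTerminators {
  // Sample input mixing a Windows (\r\n) and a Unix (\n) line ending.
  val original: String = "a\r\nb\n"

  // getLines() strips the terminators, so rejoining with "\n"
  // cannot restore which ones were there, or the trailing newline.
  val rejoined: String = Source.fromString(original).getLines().mkString("\n")

  // mkString on the Source itself concatenates every character unchanged.
  val verbatim: String = Source.fromString(original).mkString
}
```

Here rejoined differs from original while verbatim is identical to it, which is the reason to avoid getLines when a character-for-character copy matters.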

Answered by pathikrit

One more: https://github.com/pathikrit/better-files#streams-and-codecs

Various ways to slurp a file without loading the contents into memory:

val bytes  : Iterator[Byte]            = file.bytes
val chars  : Iterator[Char]            = file.chars
val lines  : Iterator[String]          = file.lines
val source : scala.io.BufferedSource   = file.content 

You can supply your own codec too for anything that does a read/write (it assumes scala.io.Codec.default if you don't provide one):

val content: String = file.contentAsString  // default codec
// custom codec:
import scala.io.Codec
file.contentAsString(Codec.ISO8859)
//or
import scala.io.Codec.string2codec
file.write("hello world")(codec = "US-ASCII")

Answered by Dzmitry Lazerka

Just like in Java, using the Apache Commons IO library:

import java.nio.charset.StandardCharsets
import org.apache.commons.io.FileUtils

FileUtils.readFileToString(file, StandardCharsets.UTF_8)

Also, many answers here forget the Charset. It's better to always provide it explicitly, or it will bite you one day.
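
A small illustration of why the charset matters: the same bytes decode to different text depending on the charset used. The sample word and the names below are made up for this sketch; unicode escapes are used so the example does not depend on the encoding of the source file itself.

```scala
import java.nio.charset.StandardCharsets.{ISO_8859_1, UTF_8}

object CharsetMismatch {
  // "café", with the accented letter written as an escape.
  val text: String = "caf\u00e9"

  // In UTF-8 the accented letter takes two bytes, so this is 5 bytes long.
  val bytes: Array[Byte] = text.getBytes(UTF_8)

  // Decoding with the charset used for encoding round-trips cleanly.
  val right: String = new String(bytes, UTF_8)

  // Decoding with a different charset turns the two bytes into two characters.
  val wrong: String = new String(bytes, ISO_8859_1)
}
```

When no charset is passed, the platform default is used, so the same program can produce right on one machine and something like wrong on another.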

Answered by elm

To emulate the Ruby syntax (and convey the semantics) of opening and reading a file, consider this implicit class (Scala 2.10 and later):

import java.io.File

def open(filename: String) = new File(filename)

implicit class RichFile(val file: File) extends AnyVal {
  def read = io.Source.fromFile(file).getLines.mkString("\n")
}

In this way,

open("file.txt").read
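
As written, read never closes the underlying Source, the same issue raised earlier in this thread. A possible variant that closes it is sketched below; the names RubyStyle and ClosingRichFile, and the hard-coded UTF-8 encoding, are assumptions made for this example rather than part of the answer.

```scala
import java.io.File
import scala.io.Source

object RubyStyle {
  def open(filename: String): File = new File(filename)

  implicit class ClosingRichFile(val file: File) extends AnyVal {
    // Reads all lines joined with "\n" and closes the source afterwards,
    // even if reading throws. The encoding is fixed to UTF-8 here.
    def read: String = {
      val source = Source.fromFile(file, "UTF-8")
      try source.getLines().mkString("\n") finally source.close()
    }
  }
}
```

After import RubyStyle._, the call open("file.txt").read works as before. Because it goes through getLines, the original line terminators and any trailing newline are not preserved.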