How do I create a Java string from the contents of a file?

Question

Asked by OscarRyz

I've been using the idiom below for some time now, and it seems to be the most widespread, at least on the sites I've visited.

Is there a better/different way to read a file into a string in Java?

private String readFile(String file) throws IOException {
    BufferedReader reader = new BufferedReader(new FileReader (file));
    String         line = null;
    StringBuilder  stringBuilder = new StringBuilder();
    String         ls = System.getProperty("line.separator");

    try {
        while((line = reader.readLine()) != null) {
            stringBuilder.append(line);
            stringBuilder.append(ls);
        }

        return stringBuilder.toString();
    } finally {
        reader.close();
    }
}

Answer 1

Answered by Willi aus Rohr

If you're willing to use an external library, check out Apache Commons IO (200KB JAR). It contains an org.apache.commons.io.FileUtils.readFileToString() method that allows you to read an entire File into a String with one line of code.

Example:

import java.io.*;
import java.nio.charset.*;
import org.apache.commons.io.*;

public String readFile() throws IOException {
    File file = new File("data.txt");
    return FileUtils.readFileToString(file, StandardCharsets.UTF_8);
}

Answer 2

Answered by Claudiu

Java attempts to be extremely general and flexible in all it does. As a result, something which is relatively simple in a scripting language (your code would be replaced with "open(file).read()" in Python) is a lot more complicated. There doesn't seem to be any shorter way of doing it, except using an external library (like Willi aus Rohr mentioned). Your options:

1. Use an external library.
2. Copy this code into all your projects.
3. Create your own mini-library which contains functions you use often.

Your best bet is probably the second one, as it has the fewest dependencies.

Answer 3

Answered by erickson

Read all text from a file

Java 11 added the readString() method to read small files as a String, preserving line terminators:

String content = Files.readString(path, StandardCharsets.US_ASCII);
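A minimal round-trip sketch of Files.readString (Java 11+). The temporary file, the class name ReadStringDemo and the helper roundTrip are illustrative assumptions, and UTF-8 is used here instead of US-ASCII so that non-ASCII text would also survive. It shows that both "\r\n" and "\n" terminators come back exactly as written:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadStringDemo {

    // Writes text to a temporary file (illustrative only), then reads it back
    // with Files.readString. In real code, `path` would be your own file.
    static String roundTrip(String text) throws IOException {
        Path path = Files.createTempFile("readString-demo", ".txt");
        try {
            Files.writeString(path, text, StandardCharsets.UTF_8);
            // Reads the whole file; line terminators are kept exactly as written.
            return Files.readString(path, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(path);
        }
    }

    public static void main(String[] args) throws IOException {
        String original = "first line\r\nsecond line\n";
        // true: the CRLF and the trailing LF are both preserved
        System.out.println(roundTrip(original).equals(original));
    }
}
```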

For versions between Java 7 and 11, here's a compact, robust idiom, wrapped up in a utility method:

static String readFile(String path, Charset encoding) 
  throws IOException 
{
  byte[] encoded = Files.readAllBytes(Paths.get(path));
  return new String(encoded, encoding);
}
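A usage sketch for the utility above, assuming a hypothetical temporary file and class name (ReadAllBytesDemo). It also illustrates why the encoding argument matters: for non-ASCII text, the number of bytes on disk and the number of decoded characters differ.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ReadAllBytesDemo {

    // Same utility method as above: read every byte, then decode once.
    static String readFile(String path, Charset encoding) throws IOException {
        byte[] encoded = Files.readAllBytes(Paths.get(path));
        return new String(encoded, encoding);
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for a real input file (illustrative only).
        Path tmp = Files.createTempFile("readAllBytes-demo", ".txt");
        try {
            Files.write(tmp, "h\u00e9llo\n".getBytes(StandardCharsets.UTF_8));
            String content = readFile(tmp.toString(), StandardCharsets.UTF_8);
            System.out.println(Files.size(tmp));  // 7: the accented e takes two bytes in UTF-8
            System.out.println(content.length()); // 6 characters after decoding
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```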

Read lines of text from a file

Java 7 added a convenience method to read a file as lines of text, represented as a List<String>. This approach is "lossy" because the line separators are stripped from the end of each line.

List<String> lines = Files.readAllLines(Paths.get(path), encoding);
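A small sketch of what "lossy" means in practice (Java 8+; the temporary file and the names ReadAllLinesDemo and linesOf are illustrative assumptions). After readAllLines, there is no way to tell which terminator each line had, or whether the last line ended with one:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class ReadAllLinesDemo {

    // Writes `text` to a temporary file (illustrative only) and reads it back as lines.
    static List<String> linesOf(String text) throws IOException {
        Path path = Files.createTempFile("readAllLines-demo", ".txt");
        try {
            Files.write(path, text.getBytes(StandardCharsets.UTF_8));
            return Files.readAllLines(path, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(path);
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = linesOf("alpha\r\nbeta\n");
        System.out.println(lines); // [alpha, beta]
        // Rejoining gives "alpha" + LF + "beta": the CR and the trailing newline are gone.
        System.out.println(String.join("\n", lines));
    }
}
```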

Java 8 added the Files.lines() method to produce a Stream<String>. Again, this method is lossy because line separators are stripped. If an IOException is encountered while reading the file, it is wrapped in an UncheckedIOException, since Stream doesn't accept lambdas that throw checked exceptions.

try (Stream<String> lines = Files.lines(path, encoding)) {
  lines.forEach(System.out::println);
}

This Stream does need a close() call; this is poorly documented in the API, and I suspect many people don't even notice Stream has a close() method. Be sure to use an ARM block (try-with-resources) as shown.
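A minimal sketch of a typical Files.lines pipeline inside the try-with-resources block (Java 8+). The task (counting non-blank lines), the UTF-8 charset, the temporary file and the class name FilesLinesDemo are illustrative assumptions:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class FilesLinesDemo {

    // Counts lines that contain something other than whitespace.
    // Lines are pulled lazily, so the file is never held as one big String.
    static long countNonBlankLines(Path path) throws IOException {
        // try-with-resources closes the Stream, and with it the open file handle.
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.trim().isEmpty()).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for real input (illustrative only).
        Path tmp = Files.createTempFile("lines-demo", ".txt");
        try {
            Files.write(tmp, "alpha\n\n   \nbeta\n".getBytes(StandardCharsets.UTF_8));
            System.out.println(countNonBlankLines(tmp)); // 2
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```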

If you are working with a source other than a file, you can use the lines() method in BufferedReader instead.
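A short sketch of BufferedReader.lines() on a non-file source (Java 8+). A StringReader stands in for something like a network stream or classpath resource; the class and method names are hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class ReaderLinesDemo {

    // BufferedReader.lines() works on any Reader, not just files.
    static List<String> upperCaseLines(String text) throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.lines()
                         .map(line -> line.toUpperCase(Locale.ROOT))
                         .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(upperCaseLines("one\ntwo\r\nthree")); // [ONE, TWO, THREE]
    }
}
```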

Memory utilization

The first method, which preserves line breaks, can temporarily require memory several times the size of the file, because for a short time the raw file contents (a byte array) and the decoded characters (each of which is 16 bits even if encoded as 8 bits in the file) reside in memory at once. It is safest to apply it only to files that you know to be small relative to the available memory.

The second method, reading lines, is usually more memory efficient, because the input byte buffer for decoding doesn't need to contain the entire file. However, it's still not suitable for files that are very large relative to available memory.

For reading large files, you need a different design for your program, one that reads a chunk of text from a stream, processes it, and then moves on to the next, reusing the same fixed-size memory block. Here, "large" depends on the computer specs. Nowadays, this threshold might be many gigabytes of RAM. The third method, using a Stream<String>, is one way to do this, if your input "records" happen to be individual lines. (Using the readLine() method of BufferedReader is the procedural equivalent to this approach.)
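A minimal sketch of that fixed-buffer design (Java 8+), assuming for illustration that the "record" of interest is a single character and that the file is UTF-8. The class name, method name and temporary file are hypothetical. A single 8192-char buffer is reused for every chunk:

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ChunkedReadDemo {

    // Counts occurrences of `target`, reading at most 8192 chars per call.
    // The same buffer is reused for every chunk, so memory use stays bounded
    // no matter how large the file is.
    static long countChar(Path path, char target) throws IOException {
        char[] buffer = new char[8192];
        long count = 0;
        try (Reader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buffer[i] == target) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for a large input (illustrative only).
        Path tmp = Files.createTempFile("chunk-demo", ".txt");
        try {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 20000; i++) {
                sb.append(i % 10 == 0 ? '\n' : 'x');
            }
            Files.write(tmp, sb.toString().getBytes(StandardCharsets.UTF_8));
            System.out.println(countChar(tmp, '\n')); // 2000, spread across several chunks
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```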

Character encoding

One thing that is missing from the sample in the original post is the character encoding. There are some special cases where the platform default is what you want, but they are rare, and you should be able to justify your choice.

The StandardCharsets class defines some constants for the encodings required of all Java runtimes:

String content = readFile("test.txt", StandardCharsets.UTF_8);

The platform default is available from the Charset class itself:

String content = readFile("test.txt", Charset.defaultCharset());
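A small illustration of what goes wrong when the wrong charset is chosen (Java 7+). The class and helper names are hypothetical; the helper encodes text as UTF-8, as if it had been written to a file, and then decodes those bytes with whatever charset the reader picks:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {

    // Encodes `text` as UTF-8, then decodes the bytes with `readAs`,
    // the way readFile(path, readAs) would after reading the file.
    static String decodeUtf8BytesAs(String text, Charset readAs) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9";
        // true: matching charset, the text survives intact
        System.out.println(decodeUtf8BytesAs(cafe, StandardCharsets.UTF_8).equals(cafe));
        // 5, not 4: ISO-8859-1 turns each of the two UTF-8 bytes of the accent into its own char
        System.out.println(decodeUtf8BytesAs(cafe, StandardCharsets.ISO_8859_1).length());
    }
}
```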

Note: This answer largely replaces my Java 6 version. The utility of Java 7 safely simplifies the code, and the old answer, which used a mapped byte buffer, prevented the file that was read from being deleted until the mapped buffer was garbage collected. You can view the old version via the "edited" link on this answer.

Answer 4

Answered by Dónal

If you're looking for an alternative that doesn't involve a third-party library (e.g. Commons I/O), you can use the Scanner class:

private String readFile(String pathname) throws IOException {

    File file = new File(pathname);
    StringBuilder fileContents = new StringBuilder((int)file.length());

    try (Scanner scanner = new Scanner(file)) {
        while(scanner.hasNextLine()) {
            fileContents.append(scanner.nextLine()).append(System.lineSeparator());
        }
        return fileContents.toString();
    }
}

Answer 5

Answered by Jon Skeet

That code will normalize line breaks, which may or may not be what you really want to do.

Here's an alternative which doesn't do that, and which is (IMO) simpler to understand than the NIO code (although it still uses java.nio.charset.Charset):

public static String readFile(String file, String csName)
            throws IOException {
    Charset cs = Charset.forName(csName);
    return readFile(file, cs);
}

public static String readFile(String file, Charset cs)
            throws IOException {
    // No real need to close the BufferedReader/InputStreamReader
    // as they're only wrapping the stream
    FileInputStream stream = new FileInputStream(file);
    try {
        Reader reader = new BufferedReader(new InputStreamReader(stream, cs));
        StringBuilder builder = new StringBuilder();
        char[] buffer = new char[8192];
        int read;
        while ((read = reader.read(buffer, 0, buffer.length)) > 0) {
            builder.append(buffer, 0, read);
        }
        return builder.toString();
    } finally {
        // Potential issue here: if this throws an IOException,
        // it will mask any others. Normally I'd use a utility
        // method which would log exceptions and swallow them
        stream.close();
    }        
}
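A small in-memory comparison of the two approaches (Java 7+). A StringReader replaces the file to keep it self-contained, and "\n" is passed explicitly as the separator so the result doesn't depend on the platform; both are assumptions for illustration, as are the class and method names. The readLine() idiom from the question rewrites the CRLF and appends a trailing separator, while the char-buffer loop copies the text verbatim:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class LineBreakDemo {

    // The question's readLine() idiom, applied to any Reader.
    static String viaReadLine(Reader in, String ls) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(in)) {
            for (String line = reader.readLine(); line != null; line = reader.readLine()) {
                sb.append(line).append(ls);
            }
        }
        return sb.toString();
    }

    // Chunked read() as in the answer above: characters are copied unchanged.
    static String viaCharBuffer(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[8192];
        try (Reader reader = in) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "one\r\ntwo"; // CRLF, and no newline at the end
        System.out.println(viaCharBuffer(new StringReader(text)).equals(text));     // true
        System.out.println(viaReadLine(new StringReader(text), "\n").equals(text)); // false
    }
}
```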

Answer 6

Answered by Dan Dyer

There is a variation on the same theme that uses a for loop, instead of a while loop, to limit the scope of the line variable. Whether it's "better" is a matter of personal taste.

for(String line = reader.readLine(); line != null; line = reader.readLine()) {
    stringBuilder.append(line);
    stringBuilder.append(ls);
}

Answer 7

Answered by Scott S. McCoy

public static String slurp (final File file)
throws IOException {
    StringBuilder result = new StringBuilder();

    BufferedReader reader = new BufferedReader(new FileReader(file));

    try {
        char[] buf = new char[1024];

        int r = 0;

        while ((r = reader.read(buf)) != -1) {
            result.append(buf, 0, r);
        }
    }
    finally {
        reader.close();
    }

    return result.toString();
}

Answer 8

Answered by finnw

Guava has a method similar to the one from Commons IO's FileUtils that Willi aus Rohr mentioned:

import com.google.common.base.Charsets;
import com.google.common.io.Files;

// ...

String text = Files.toString(new File(path), Charsets.UTF_8);

EDIT by PiggyPiglet
Files#toString is deprecated, and due for removal in October 2019. Instead use Files.asCharSource(new File(path), StandardCharsets.UTF_8).read();

EDIT by Oscar Reyes

This is the (simplified) underlying code in the cited library:

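InputStream in = new FileInputStream(file); // closing the stream is omitted in this simplified excerpt
byte[] b = new byte[(int) file.length()];   // File.length() returns a long, so a cast is required
int len = b.length;
int total = 0;

// read() may return fewer bytes than requested, so loop until the buffer is full or EOF
while (total < len) {
  int result = in.read(b, total, len - total);
  if (result == -1) {
    break;
  }
  total += result;
}

return new String(b, 0, total, Charsets.UTF_8); // decode only the bytes actually read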

Edit (by Jonik): The above doesn't match the source code of recent Guava versions. For the current source, see the classes Files, CharStreams, ByteSource and CharSource in the com.google.common.io package.

Answer 9

Answered by Peter Lawrey

To read a file as binary and convert it at the end:

public static String readFileAsString(String filePath) throws IOException {
    DataInputStream dis = new DataInputStream(new FileInputStream(filePath));
    try {
        long len = new File(filePath).length();
        if (len > Integer.MAX_VALUE) throw new IOException("File "+filePath+" too large, was "+len+" bytes.");
        byte[] bytes = new byte[(int) len];
        dis.readFully(bytes);
        return new String(bytes, "UTF-8");
    } finally {
        dis.close();
    }
}
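On Java 9 or later, a variant of the same idea (a sketch, not part of the original answer; the class name is hypothetical) can let InputStream.readAllBytes() do the sizing and looping. Like the answer above, it assumes UTF-8 and reads the whole file into memory; an oversized file surfaces as an OutOfMemoryError rather than the explicit IOException above:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ReadAllBytesStreamDemo {

    // Java 9+: InputStream.readAllBytes() reads until end of stream, so no
    // length check or readFully() call is needed. The whole file is still
    // held in memory as one byte array.
    static String readFileAsString(String filePath) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(filePath))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for a real input file (illustrative only).
        Path tmp = Files.createTempFile("readAllBytes9-demo", ".txt");
        try {
            Files.write(tmp, "binary in, text out\n".getBytes(StandardCharsets.UTF_8));
            System.out.print(readFileAsString(tmp.toString()));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```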

Answer 10

Answered by Pablo Grisafi

A very lean solution based on Scanner:

Scanner scanner = new Scanner( new File("poem.txt") );
String text = scanner.useDelimiter("\\A").next(); // \A matches only at the start of input
scanner.close(); // Put this call in a finally block

Or, if you want to set the charset:

Scanner scanner = new Scanner( new File("poem.txt"), "UTF-8" );
String text = scanner.useDelimiter("\\A").next();
scanner.close(); // Put this call in a finally block

Or, with a try-with-resources block, which will call scanner.close() for you:

try (Scanner scanner = new Scanner( new File("poem.txt"), "UTF-8" )) {
    String text = scanner.useDelimiter("\\A").next();
}

Remember that the Scanner constructor can throw a FileNotFoundException (a subclass of IOException). And don't forget to import java.io and java.util.

Source: Pat Niemeyer's blog
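One caveat with the \A trick: on empty input there is no token at all, so next() throws NoSuchElementException. A small guarded sketch (Java 7+; the class name and the helper slurp are hypothetical), using StringReader in place of a file to stay self-contained:

```java
import java.io.StringReader;
import java.util.Scanner;

class ScannerSlurpDemo {

    // "\\A" matches only at the very start of input, so the first token is
    // everything that follows. On empty input there is no token and next()
    // would throw NoSuchElementException, so hasNext() is checked first.
    static String slurp(Scanner scanner) {
        scanner.useDelimiter("\\A");
        return scanner.hasNext() ? scanner.next() : "";
    }

    public static void main(String[] args) {
        try (Scanner s = new Scanner(new StringReader("line 1\nline 2\n"))) {
            System.out.println(slurp(s).equals("line 1\nline 2\n")); // true
        }
        try (Scanner s = new Scanner(new StringReader(""))) {
            System.out.println(slurp(s).isEmpty()); // true, no exception
        }
    }
}
```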
