Java: Files.readAllBytes vs. Files.lines throwing MalformedInputException

Question

Asked by Angelo.Hannes

I would have thought that the following two approaches to read a file should behave equally. But they don't. The second approach is throwing a MalformedInputException.

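import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class Test {
    public static void main(String[] args) {
        // Approach 1: read all bytes and build a String from them.
        try {
            String content = new String(Files.readAllBytes(Paths.get("_template.txt")));
            System.out.println(content);
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Approach 2: stream the file line by line.
        // This is the one that throws MalformedInputException.
        try (Stream<String> lines = Files.lines(Paths.get("_template.txt"))) {
            lines.forEach(System.out::println);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}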

This is the stack trace:

Exception in thread "main" java.io.UncheckedIOException: java.nio.charset.MalformedInputException: Input length = 1
    at java.io.BufferedReader.hasNext(BufferedReader.java:574)
    at java.util.Iterator.forEachRemaining(Iterator.java:115)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
    at Test.main(Test.java:19)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:161)
    at java.io.BufferedReader.readLine(BufferedReader.java:324)
    at java.io.BufferedReader.readLine(BufferedReader.java:389)
    at java.io.BufferedReader.hasNext(BufferedReader.java:571)
    ... 4 more

What is the difference here, and how do I fix it?

Answer 1

Accepted answer by Jesper

This has to do with character encoding. Computers only deal with numbers. To store text, the characters in the text have to be converted to and from numbers, using some scheme. That scheme is called the character encoding. There are many different character encodings; some of the well-known standard character encodings are ASCII, ISO-8859-1 and UTF-8.

In the first example, you read all the bytes (numbers) in the file and then convert them to characters by passing them to the constructor of class String. This will use the default character encoding of your system (whatever it is on your operating system) to convert the bytes to characters.

In the second example, where you use Files.lines(...), the UTF-8 character encoding will be used, according to the documentation. When a sequence of bytes is found in the file that is not a valid UTF-8 sequence, you'll get a MalformedInputException.
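
The failure described above can be reproduced without the original file. The following is a minimal sketch, assuming a temporary file that contains "café" encoded as ISO-8859-1 (so the é is the single byte 0xE9, which is not valid UTF-8 on its own). The class name `Utf8LinesDemo` and the helper `linesRejects` are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.MalformedInputException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class Utf8LinesDemo {
    // Hypothetical helper: writes the bytes to a temporary file, reads it back
    // with Files.lines(Path), and reports whether decoding failed with a
    // MalformedInputException.
    static boolean linesRejects(byte[] data) {
        try {
            Path file = Files.createTempFile("utf8-demo", ".txt");
            try {
                Files.write(file, data);
                try (Stream<String> lines = Files.lines(file)) {
                    lines.forEach(line -> { }); // force the whole file to be decoded
                    return false;
                } catch (UncheckedIOException e) {
                    // The stream is lazy, so the decoding error surfaces here,
                    // wrapped in an UncheckedIOException.
                    return e.getCause() instanceof MalformedInputException;
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9, '\n'}; // "café" in ISO-8859-1
        byte[] ascii = {'c', 'a', 'f', 'e', '\n'};          // plain ASCII, valid UTF-8
        System.out.println(linesRejects(latin1)); // true
        System.out.println(linesRejects(ascii));  // false
    }
}
```

Note that the exception is thrown while the stream is consumed, not when `Files.lines` is called, which is why the stack trace in the question shows an `UncheckedIOException` rather than a plain `IOException`.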

The default character encoding of your system may or may not be UTF-8, so that can explain a difference in behaviour.

You'll have to find out what character encoding is used for the file, and then explicitly use that. For example:

String content = new String(Files.readAllBytes(Paths.get("_template.txt")),
        StandardCharsets.ISO_8859_1);

Second example:

Stream<String> lines = Files.lines(Paths.get("_template.txt"),
        StandardCharsets.ISO_8859_1);
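
If you do not know the file's encoding, one rough way to narrow it down is to try strict decoding with a few candidates. The sketch below is an illustration under stated assumptions, not a real encoding detector: the class `CharsetCheck` and the method `decodesCleanly` are hypothetical names, and the sample bytes stand in for what `Files.readAllBytes` would return.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {
    // Hypothetical helper: returns true if every byte sequence in `data`
    // is valid in the given charset.
    static boolean decodesCleanly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for Files.readAllBytes(...): "café" encoded as ISO-8859-1.
        byte[] data = {'c', 'a', 'f', (byte) 0xE9};
        System.out.println(decodesCleanly(data, StandardCharsets.UTF_8));      // false
        System.out.println(decodesCleanly(data, StandardCharsets.US_ASCII));   // false
        System.out.println(decodesCleanly(data, StandardCharsets.ISO_8859_1)); // true
    }
}
```

A clean decode is only weak evidence. ISO-8859-1 maps every possible byte to a character, so it never fails, even when the file was actually written in some other single-byte encoding. A failed UTF-8 check, on the other hand, is a reliable sign that the file is not UTF-8.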

Answer 2

Answer by Evan Knowles

Files.lines by default uses the UTF-8 encoding, whereas instantiating a new String from bytes will use the default system encoding. It appears that your file is not in UTF-8, which is why it is failing.

Check what encoding your file is using, and pass it as the second parameter.

Answer 3

Answer by fge

To complement Jesper's answer, what happens here (and is undocumented!) is that Files.lines() creates a CharsetDecoder whose policy is to reject invalid byte sequences; that is, its CodingErrorAction is set to REPORT.

This is unlike nearly all other Reader implementations provided by the JDK, whose standard policy is to REPLACE. Under that policy, every malformed or unmappable byte sequence is replaced with the replacement character (U+FFFD).
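
If the lenient REPLACE behaviour is what you want while still reading line by line, you can configure the decoder yourself. The following is a sketch under that assumption: the class `LenientLines` and the method `readLinesReplacing` are illustrative names, and a byte array stands in for the file so the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class LenientLines {
    // Hypothetical helper: decodes the bytes as UTF-8 line by line,
    // substituting U+FFFD for invalid input instead of throwing.
    static List<String> readLinesReplacing(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), decoder))) {
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "café" in ISO-8859-1 followed by a plain ASCII line.
        byte[] data = {'c', 'a', 'f', (byte) 0xE9, '\n', 'o', 'k', '\n'};
        List<String> lines = readLinesReplacing(data);
        System.out.println(lines.get(0).equals("caf\uFFFD")); // true
        System.out.println(lines.get(1));                     // ok

        // new String(byte[], Charset) replaces invalid input in the same way.
        System.out.println(new String(data, StandardCharsets.UTF_8).contains("\uFFFD")); // true
    }
}
```

For a real file, the same decoder could be passed to an `InputStreamReader` wrapped around `Files.newInputStream(path)`. Replacement hides the problem rather than fixing it, so decoding with the file's actual charset is usually the better choice when that charset is known.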

Answer 4

Answer by delive

As of 2017, use:

 Charset.forName("ISO_8859_1") instead of Charsets.ISO_8859_1
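
For context, `Charsets` is not a JDK class; the name presumably refers to a third-party constants class such as Guava's, which is an assumption about what this answer meant. With only the JDK, a charset can be looked up by name or taken from `StandardCharsets`. The sketch below uses the canonical name "ISO-8859-1" rather than the alias above, because that name is guaranteed to be supported on every Java platform; the class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetLookup {
    // Hypothetical helper: looks up ISO-8859-1 by its canonical name.
    static Charset latin1ByName() {
        return Charset.forName("ISO-8859-1");
    }

    public static void main(String[] args) {
        Charset byName = latin1ByName();
        // Both routes refer to the same charset.
        System.out.println(byName.equals(StandardCharsets.ISO_8859_1)); // true
        System.out.println(byName.name());                              // ISO-8859-1
    }
}
```

`StandardCharsets.ISO_8859_1` avoids the string lookup and cannot fail with `UnsupportedCharsetException`, so it is usually the simpler option on Java 7 and later.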
