Java: Files.readAllBytes vs Files.lines getting MalformedInputException

Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/29937600/

Time: 2020-08-11 08:50:45  Source: igfitidea

Files.readAllBytes vs Files.lines getting MalformedInputException

Tags: java, file-io, stream

Asked by Angelo.Hannes

I would have thought that the following two approaches to reading a file would behave the same, but they don't: the second one throws a MalformedInputException.

public static void main(String[] args) {    
    try {
        String content = new String(Files.readAllBytes(Paths.get("_template.txt")));
        System.out.println(content);
    } catch (IOException e) {
        e.printStackTrace();
    }

    try(Stream<String> lines = Files.lines(Paths.get("_template.txt"))) {
        lines.forEach(System.out::println);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

This is the stack trace:

Exception in thread "main" java.io.UncheckedIOException: java.nio.charset.MalformedInputException: Input length = 1
    at java.io.BufferedReader.hasNext(BufferedReader.java:574)
    at java.util.Iterator.forEachRemaining(Iterator.java:115)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
    at Test.main(Test.java:19)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:161)
    at java.io.BufferedReader.readLine(BufferedReader.java:324)
    at java.io.BufferedReader.readLine(BufferedReader.java:389)
    at java.io.BufferedReader.hasNext(BufferedReader.java:571)
    ... 4 more

What is the difference here, and how do I fix it?

Accepted answer by Jesper

This has to do with character encoding. Computers only deal with numbers. To store text, the characters in the text have to be converted to and from numbers, using some scheme. That scheme is called the character encoding. There are many different character encodings; some of the well-known standard character encodings are ASCII, ISO-8859-1 and UTF-8.

In the first example, you read all the bytes (numbers) in the file and then convert them to characters by passing them to the constructor of class String. This will use the default character encoding of your system (whatever it is on your operating system) to convert the bytes to characters.

In the second example, where you use Files.lines(...), the UTF-8 character encoding will be used, according to the documentation. When a sequence of bytes is found in the file that is not a valid UTF-8 sequence, you'll get a MalformedInputException.

The default character encoding of your system may or may not be UTF-8, so that can explain a difference in behaviour.
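
The effect of choosing the wrong encoding can be reproduced in memory, without any file. The following is a minimal sketch, not the asker's code: the class name DecodeDemo and the sample bytes are made up for illustration. It decodes the same four bytes once as ISO-8859-1 and once as UTF-8. The String(byte[], Charset) constructor never throws on bad input; it substitutes the replacement character instead, which is consistent with the first approach in the question not failing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decode the given bytes with an explicitly chosen charset.
    // This constructor replaces malformed input instead of throwing.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Illustrative sample: "café" encoded as ISO-8859-1 (é is the single byte 0xE9).
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9};

        System.out.println("Default charset: " + Charset.defaultCharset());

        // Correct charset: the last byte decodes to é (U+00E9).
        System.out.println(decode(latin1, StandardCharsets.ISO_8859_1));

        // Wrong charset: a lone 0xE9 is not valid UTF-8, so it becomes U+FFFD.
        System.out.println(decode(latin1, StandardCharsets.UTF_8));
    }
}
```

If the default charset printed on your machine is not UTF-8, the no-argument new String(bytes) call and Files.lines(path) are decoding the same file with two different encodings.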

You'll have to find out what character encoding is used for the file, and then explicitly use that. For example:

String content = new String(Files.readAllBytes(Paths.get("_template.txt")),
        StandardCharsets.ISO_8859_1);

Second example:

Stream<String> lines = Files.lines(Paths.get("_template.txt"),
        StandardCharsets.ISO_8859_1);
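
Put together as a runnable program, the fix might look like the sketch below. It assumes the file really is ISO-8859-1; the class ReadWithCharset, the helper writeTemp and the sample text are hypothetical and exist only so the example does not depend on _template.txt. It writes a small ISO-8859-1 file, reads it back with the matching charset, and then shows that reading the same file as UTF-8 fails.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReadWithCharset {
    // Read all lines with an explicit charset; try-with-resources closes the stream.
    // Decoding errors surface from the stream as UncheckedIOException.
    static List<String> readLines(Path path, Charset charset) {
        try (Stream<String> lines = Files.lines(path, charset)) {
            return lines.collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: write text to a temporary file in the given charset.
    static Path writeTemp(String text, Charset charset) {
        try {
            Path file = Files.createTempFile("template", ".txt");
            file.toFile().deleteOnExit();
            return Files.write(file, text.getBytes(charset));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample text with non-ASCII characters, stored as ISO-8859-1.
        Path file = writeTemp("caf\u00e9\nna\u00efve\n", StandardCharsets.ISO_8859_1);

        // Matching charset: both lines are decoded correctly.
        System.out.println(readLines(file, StandardCharsets.ISO_8859_1));

        // Mismatched charset: the bytes are not valid UTF-8, so decoding is rejected.
        try {
            readLines(file, StandardCharsets.UTF_8);
        } catch (UncheckedIOException e) {
            System.out.println("UTF-8 read failed: " + e.getCause());
        }
    }
}
```

Passing the charset explicitly in both approaches also removes the dependency on the platform default, so the program behaves the same on every machine.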

Answer by Evan Knowles

Files.lines by default uses the UTF-8 encoding, whereas instantiating a new String from bytes will use the default system encoding. It appears that your file is not in UTF-8, which is why it is failing.

Check what encoding your file is using, and pass it as the second parameter.

Answer by fge

To complement Jesper's answer, what happens here (and is undocumented!) is that Files.lines() creates a CharsetDecoder whose policy is to reject invalid byte sequences; that is, its CodingErrorAction is set to REPORT.

This is unlike what happens for nearly all other Reader implementations provided by the JDK, whose standard policy is to REPLACE. Under that policy, every malformed or unmappable byte sequence is replaced with the replacement character (U+FFFD) instead of causing an exception.
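
The two policies can be compared directly with a CharsetDecoder. This is only an illustration of REPORT versus REPLACE, not the code Files.lines() runs internally; the class name DecoderPolicyDemo and the sample bytes are made up.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

class DecoderPolicyDemo {
    // Decode bytes as UTF-8, applying the same action to malformed and unmappable input.
    static String decodeUtf8(byte[] bytes, CodingErrorAction action)
            throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // 0xE9 followed by '!' is not a valid UTF-8 sequence.
        byte[] bad = {'c', 'a', 'f', (byte) 0xE9, '!'};

        // REPLACE: the invalid byte is swapped for U+FFFD and decoding continues.
        System.out.println(decodeUtf8(bad, CodingErrorAction.REPLACE));

        // REPORT: the decoder stops and throws.
        try {
            decodeUtf8(bad, CodingErrorAction.REPORT);
        } catch (MalformedInputException e) {
            System.out.println("REPORT: " + e);
        }
    }
}
```

REPORT is the stricter choice: it surfaces an encoding mismatch immediately, whereas REPLACE silently loses the original bytes.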

Answer by delive

As of 2017, use:

Charset.forName("ISO_8859_1") instead of Charsets.ISO_8859_1