Java InputStream encoding/charset

Notice: this page is a parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute the content to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/3043710/

Date: 2020-08-13 15:56:49  Source: igfitidea

Tags: java, encoding, iso-8859-1

Asked by Tobbe

Running the following (example) code

import java.io.*;

public class test {
    public static void main(String[] args) throws Exception {
        byte[] buf = {-27};
        InputStream is = new ByteArrayInputStream(buf);
        BufferedReader r = new BufferedReader(
                new InputStreamReader(is, "ISO-8859-1"));
        String s = r.readLine();
        System.out.println("test.java:9 [byte] (char)" + (char)s.getBytes()[0] + 
                " (int)" + (int)s.getBytes()[0]);
        System.out.println("test.java:10 [char] (char)" + (char)s.charAt(0) + 
                " (int)" + (int)s.charAt(0));
        System.out.println("test.java:11 string below");
        System.out.println(s);
        System.out.println("test.java:13 string above");
    }
}

gives me this output

test.java:9 [byte] (char)? (int)63
test.java:10 [char] (char)? (int)229
test.java:11 string below
?
test.java:13 string above

How do I retain the correct byte value (-27) in the line-9 printout, and consequently get the expected output (å) from the System.out.println(s) command?

Accepted answer by Jon Skeet

If you want to retain byte values, ideally don't use a Reader at all. To represent arbitrary binary data as text and convert it back to binary data later, you should use base16 or base64 encoding.
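
As a rough illustration of the base64 suggestion, here is a minimal sketch using java.util.Base64 (available from Java 8). The class name Base64RoundTrip and its helper methods are made up for this example and are not part of the original answer.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper class, for illustration only.
final class Base64RoundTrip {
    // Encode arbitrary bytes as ASCII-safe text.
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Decode the text back into the original bytes.
    static byte[] toBytes(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] buf = {-27};
        String text = toText(buf);
        byte[] back = toBytes(text);
        // The byte value survives the trip through text unchanged.
        System.out.println(text + " -> " + Arrays.toString(back));
    }
}
```

Because the intermediate text contains only ASCII characters, it does not depend on the platform default encoding or on what the terminal can display.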

However, to explain what's going on: when you call s.getBytes(), that uses the default character encoding, which apparently doesn't include the Unicode character U+00E5.
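
The effect can be reproduced without relying on any particular machine's default by encoding explicitly to a charset that lacks U+00E5. The sketch below uses US-ASCII purely as a stand-in; the asker's actual default encoding is not known, and the class name UnmappableDemo is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; US-ASCII stands in for an unknown default encoding.
final class UnmappableDemo {
    // Encode with a charset that cannot represent U+00E5.
    static byte[] encodeAscii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String s = "\u00E5"; // the character read from byte -27 as ISO-8859-1
        // The char itself is intact: prints 229.
        System.out.println((int) s.charAt(0));
        // The unmappable char is replaced by '?': prints 63.
        System.out.println(encodeAscii(s)[0]);
    }
}
```

This matches the 63 versus 229 split in the question's output: the String holds the right character, and the information is lost only when it is encoded to bytes.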

If you call s.getBytes("ISO-8859-1") everywhere instead of s.getBytes(), I suspect you'll get back the right byte value... but relying on ISO-8859-1 for this is kinda dirty IMO.
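
A small sketch of that round trip, using StandardCharsets.ISO_8859_1 rather than the string name so that no checked exception is involved. The class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class Latin1RoundTrip {
    // Decode bytes as ISO-8859-1: each byte maps to exactly one char.
    static String decode(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // Encode back with the same charset.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] buf = {-27};
        String s = decode(buf);
        System.out.println((int) s.charAt(0)); // 229, i.e. U+00E5
        System.out.println(encode(s)[0]);      // -27, the original byte
    }
}
```

It works because ISO-8859-1 maps every byte value 0..255 to the Unicode code point with the same number, which is also why treating it as a binary-safe text format feels like a hack.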

Answer by Matthew Flaschen

As noted, getBytes() (no arguments) uses the Java platform default encoding, which may not be ISO-8859-1. Simply printing the string should work, provided your terminal and the default encoding match and support the character. For instance, on my system, the terminal and default Java encoding are both UTF-8. The fact that you're seeing a '?' indicates that yours don't match or that å is not supported.
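
To see what a print call actually sends to the terminal, one option is to print into an in-memory buffer through a PrintStream with an explicit encoding. This is only a sketch: the class PrintEncodingDemo and the method printedBytes are hypothetical, and the two encodings are chosen just for contrast.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
final class PrintEncodingDemo {
    // Return the bytes a PrintStream emits for s under the given encoding.
    static byte[] printedBytes(String s, String encoding) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, encoding);
            out.print(s);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String s = "\u00E5";
        // UTF-8 can represent the character: [-61, -91]
        System.out.println(Arrays.toString(printedBytes(s, "UTF-8")));
        // US-ASCII cannot, so '?' is written instead: [63]
        System.out.println(Arrays.toString(printedBytes(s, "US-ASCII")));
    }
}
```

If the terminal is known to expect UTF-8, wrapping System.out the same way, with new PrintStream(System.out, true, "UTF-8"), is one way to make the output independent of the default encoding.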

If you want to manually encode to UTF-8 on your system, do:

String s = r.readLine();
byte[] utf8Bytes = s.getBytes("UTF-8");

This should yield the byte array {-61, -91}, which is the two-byte UTF-8 encoding of U+00E5.
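
A self-contained way to check that value is sketched below. It uses StandardCharsets.UTF_8 instead of the "UTF-8" string, which avoids the checked UnsupportedEncodingException; the class name Utf8BytesDemo is made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
final class Utf8BytesDemo {
    // Encode a string to UTF-8 bytes.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same character the question reads from byte -27 via ISO-8859-1.
        String s = new String(new byte[] {-27}, StandardCharsets.ISO_8859_1);
        byte[] utf8Bytes = toUtf8(s);
        // 0xC3 0xA5 shown as signed Java bytes: [-61, -91]
        System.out.println(Arrays.toString(utf8Bytes));
        // Decoding with the same charset restores the character: prints 229.
        System.out.println((int) new String(utf8Bytes, StandardCharsets.UTF_8).charAt(0));
    }
}
```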
