Java InputStream encoding/charset

Question

Asked by Tobbe

Running the following (example) code

import java.io.*;

public class test {
    public static void main(String[] args) throws Exception {
        byte[] buf = {-27};
        InputStream is = new ByteArrayInputStream(buf);
        BufferedReader r = new BufferedReader(
                new InputStreamReader(is, "ISO-8859-1"));
        String s = r.readLine();
        System.out.println("test.java:9 [byte] (char)" + (char)s.getBytes()[0] + 
                " (int)" + (int)s.getBytes()[0]);
        System.out.println("test.java:10 [char] (char)" + (char)s.charAt(0) + 
                " (int)" + (int)s.charAt(0));
        System.out.println("test.java:11 string below");
        System.out.println(s);
        System.out.println("test.java:13 string above");
    }
}

gives me this output

test.java:9 [byte] (char)? (int)63
test.java:10 [char] (char)? (int)229
test.java:11 string below
?
test.java:13 string above

How do I retain the correct byte value (-27) in the line-9 printout, and consequently get the expected output (å) from the System.out.println(s) call?

Answer 1

Accepted answer by Jon Skeet

If you want to retain byte values, ideally don't use a Reader at all. To represent arbitrary binary data in text and convert it back to binary data later, you should use base16 or base64 encoding.
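As a rough illustration of the base64 suggestion, here is a minimal sketch that carries the byte from the question through text and back. It assumes Java 8 or later for java.util.Base64; the class and method names (Base64RoundTrip, toText, fromText) are made up for this example.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64RoundTrip {
    // Encode arbitrary bytes as ASCII-only text.
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Decode the text back into the original bytes.
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] original = {-27};
        String text = toText(original);
        byte[] restored = fromText(text);
        System.out.println(text);                      // prints 5Q==
        System.out.println(Arrays.toString(restored)); // prints [-27]
    }
}
```

Because the intermediate text contains only ASCII characters, it survives any charset the platform or terminal is likely to use.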

However, to explain what's going on: when you call s.getBytes(), it uses the platform default character encoding, which apparently doesn't include the Unicode character U+00E5.
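The value 63 in the question's output can be reproduced with a small sketch. It assumes, purely for illustration, that the default encoding behaves like US-ASCII, which cannot represent U+00E5; the asker's actual default is not stated. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LossyEncodeDemo {
    // Return the first byte of s when encoded with the given charset.
    static byte firstByte(String s, Charset cs) {
        return s.getBytes(cs)[0];
    }

    public static void main(String[] args) {
        // Decoding byte -27 (0xE5) as ISO-8859-1 gives the char U+00E5.
        String s = new String(new byte[]{-27}, StandardCharsets.ISO_8859_1);
        System.out.println((int) s.charAt(0));                        // prints 229

        // US-ASCII (assumed stand-in for the default) has no mapping for
        // U+00E5, so getBytes substitutes '?', which is byte 63.
        System.out.println(firstByte(s, StandardCharsets.US_ASCII)); // prints 63
    }
}
```

The string itself is intact (the char is still 229); the information is lost only at the point of encoding to bytes.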

If you call s.getBytes("ISO-8859-1") everywhere instead of s.getBytes(), I suspect you'll get back the right byte value... but relying on ISO-8859-1 for this is kinda dirty IMO.
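A minimal sketch of that round trip, using StandardCharsets.ISO_8859_1 (Java 7+) rather than the string name so no checked exception is involved. The class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    // Decode bytes as ISO-8859-1 and encode the resulting string the same way.
    static byte[] roundTrip(byte[] input) {
        String s = new String(input, StandardCharsets.ISO_8859_1);
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] buf = {-27};
        System.out.println(Arrays.toString(roundTrip(buf))); // prints [-27]
    }
}
```

This works because ISO-8859-1 maps each of the 256 byte values to the Unicode code point with the same number, so decoding and re-encoding with the same charset loses nothing. It is still a workaround for treating binary data as text, which is the part the answer calls dirty.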

Answer 2

Answered by Matthew Flaschen

As noted, the no-argument getBytes() uses the Java platform default encoding, which may not be ISO-8859-1. Simply printing the string should work, provided your terminal and the default encoding match and support the character. For instance, on my system, the terminal and the default Java encoding are both UTF-8. The fact that you're seeing a '?' indicates that yours don't match, or that å is not supported.
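One way to check this on a given machine is to print the default charset and to look at the bytes a PrintStream would send to the terminal under a specific encoding. The sketch below is illustrative only: ConsoleBytes and printedBytes are hypothetical names, and the two encodings tried are just examples.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

class ConsoleBytes {
    // Capture the bytes a PrintStream emits for text under the named encoding.
    static byte[] printedBytes(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // What the JVM uses when no charset is given (machine dependent).
        System.out.println("default charset: " + Charset.defaultCharset());

        String s = "\u00E5";
        System.out.println(Arrays.toString(printedBytes(s, "UTF-8")));    // prints [-61, -91]
        System.out.println(Arrays.toString(printedBytes(s, "US-ASCII"))); // prints [63]
    }
}
```

If the terminal expects a different encoding from the one the stream writes, or the stream's encoding cannot represent the character, the result on screen is a '?' or some other wrong glyph.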

If you want to manually encode to UTF-8 on your system, do:

String s = r.readLine();
byte[] utf8Bytes = s.getBytes("UTF-8");

This should give the byte array {-61, -91}.
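For completeness, here is a self-contained sketch that combines the reader setup from the question with the UTF-8 encoding step. The class and method names are hypothetical, and it assumes the input contains at least one line (readLine() returns null on an empty stream).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Encode {
    // Read the first line of raw as ISO-8859-1 text, then encode it as UTF-8.
    static byte[] firstLineAsUtf8(byte[] raw) {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(raw), StandardCharsets.ISO_8859_1))) {
            String s = r.readLine();
            return s.getBytes(StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // U+00E5 is encoded in UTF-8 as the two bytes 0xC3 0xA5.
        System.out.println(Arrays.toString(firstLineAsUtf8(new byte[]{-27}))); // prints [-61, -91]
    }
}
```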
