Java: reading characters as UTF-8 with Scanner, unable to print the text

Question

Asked by famfamfam

I can convert a String to a UTF-8 byte array, but I can't convert it back into a String that matches the original.

public static void main(String[] args) {

    Scanner h = new Scanner(System.in);
    System.out.println("INPUT : ");
    String stringToConvert = h.nextLine();
    byte[] theByteArray = stringToConvert.getBytes();

    System.out.println(theByteArray);
    theByteArray.toString();
    String s = new String(theByteArray);

    System.out.println(""+s);
}

How do I print theByteArray as a String?

Answer 1

Answered by Joe

String s = new String(theByteArray);

should really be

String s = new String(theByteArray, Charset.forName("UTF-8"));

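The underlying issue is that the String constructor cannot detect which charset the bytes were encoded in. Without an explicit charset it decodes them with the platform default charset, which depends on the JVM and operating system settings (often something like windows-1252 or ISO-8859-1 on older setups; UTF-8 is the default from Java 18 onward). This is why plain A-Za-z text looks correct while everything else starts to fail: those letters have the same byte values in every ASCII-compatible charset, but other characters do not.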
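
To see which charset your own JVM falls back to, a minimal sketch like the following may help. The class name `DefaultCharsetDemo` and the helper `roundTripUtf8` are made up for illustration; the sample text uses a Unicode escape so the source file itself stays ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question or answers.
final class DefaultCharsetDemo {

    // Encodes and decodes with an explicit charset, so the result does not
    // depend on the platform default.
    static String roundTripUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The charset that new String(byte[]) and getBytes() fall back to.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // "h\u00e9llo" is "hello" with an acute accent on the e.
        System.out.println(roundTripUtf8("h\u00e9llo"));
    }
}
```

Naming the charset on both the encoding and the decoding side is what makes the round trip independent of the machine the code runs on.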

A byte holds values from -128 to 127, so UTF-8 represents every character outside the ASCII range as a sequence of two to four consecutive bytes that must be decoded together. The String constructor cannot infer this from the byte array alone; if the default charset is a single-byte one such as ISO-8859-1, it maps each byte to its own character (which is why basic alphanumerics always work: each of them fits in one byte with the same value in both encodings).
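
The byte counts can be checked directly. This is only an illustrative sketch; `Utf8ByteCountDemo` and `utf8Length` are names invented here, and the characters are written as Unicode escapes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
final class Utf8ByteCountDemo {

    // Number of bytes the text occupies once encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(Byte.MIN_VALUE + " to " + Byte.MAX_VALUE); // -128 to 127

        System.out.println(utf8Length("A"));      // 1
        System.out.println(utf8Length("\u00e9")); // 2 (e with acute accent)
        System.out.println(utf8Length("\u3053")); // 3 (hiragana "ko")

        // Byte values of 0x80 and above appear negative because byte is signed.
        byte[] bytes = "\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes[0] + " " + bytes[1]); // -61 -87
    }
}
```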

Example:

String text = "こんにちは";
byte[] array = text.getBytes(StandardCharsets.UTF_8); // java.nio.charset.StandardCharsets
String s = new String(array, StandardCharsets.UTF_8);
System.out.println(s); // Prints as expected
// Decoding the same bytes with the wrong charset yields garbled text (mojibake)
String sISO = new String(array, StandardCharsets.ISO_8859_1);
System.out.println(sISO);

Answer 2

Answered by Jake Greene

There are several problems with the provided code:

Nothing in your code ensures that you get the UTF-8 byte array from the String.
```
byte[] theByteArray = stringToConvert.getBytes();
```
returns a byte array encoded with the platform's default charset, as described in the Javadoc. What you actually want is the following:
```
byte[] theByteArray = stringToConvert.getBytes("UTF-8");
```
You should check the documentation for System.out.println():
```
System.out.println(theByteArray);
```
calls System.out.println(Object x), which prints the result of x.toString(). Arrays do not override toString(), so you get the default from Object: the class name, an @ sign, and the object's hash code in hexadecimal, not the array's contents.
So when you see output of the form:
INPUT :
[B@5f1121f6
inputText
the second line is that default representation of theByteArray ([B is the JVM's name for byte[]), followed by the line of text you entered.
Calling theByteArray.toString(); as a standalone statement does nothing useful. Remember that Strings in Java are immutable, and toString() never modifies the object it is called on. It returns a string representation of theByteArray, and that returned value is discarded unless you assign it to a variable:
```
String arrayAsString = theByteArray.toString();
```
However, the returned String is still only the identifier of theByteArray (such as [B@5f1121f6), not its contents. To print the contents of theByteArray, you need to decode it into a String:
```
String convertedString = new String(theByteArray, Charset.forName("UTF-8"));
```
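
The three ways of printing a byte array discussed above can be compared side by side. This is a small sketch with invented names (`PrintBytesDemo`, `decodeUtf8`); the hash code in the first line of output differs from run to run.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original answer.
final class PrintBytesDemo {

    // Decodes UTF-8 bytes back into text.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = "abc".getBytes(StandardCharsets.UTF_8);

        System.out.println(bytes);                  // "[B@" followed by a hex hash code
        System.out.println(Arrays.toString(bytes)); // [97, 98, 99]
        System.out.println(decodeUtf8(bytes));      // abc
    }
}
```

Arrays.toString is handy when you want to inspect the raw byte values; decoding with an explicit charset is what you want when the bytes represent text.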

Assuming your requirements are to print the converted String and then print the original String, your code should look something like this:

import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Scanner;

// The class name is arbitrary; it only makes the snippet compile on its own
public class Main {
    public static void main(String[] args) {
        Scanner h = new Scanner(System.in);
        System.out.println("INPUT : ");
        String stringToConvert = h.nextLine();

        try {
            // Array of the UTF-8 representation of the given String
            byte[] theByteArray = stringToConvert.getBytes("UTF-8");

            // The converted String
            System.out.println(new String(theByteArray, Charset.forName("UTF-8")));
        } catch (UnsupportedEncodingException e) {
            // Thrown when the named charset is not supported
            e.printStackTrace();
        }

        // The original String
        System.out.println(stringToConvert);
    }
}
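
If the text still comes out garbled, the console streams themselves may be using a different charset than the terminal. The sketch below, which goes beyond what the answers state, assumes the input really arrives as UTF-8 bytes and passes the charset explicitly to Scanner and PrintStream. `Utf8ConsoleDemo` and `readLineUtf8` are invented names, and a ByteArrayInputStream stands in for System.in so the example runs without a console.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical demo class, not part of the original answer.
final class Utf8ConsoleDemo {

    // Reads the first line from the stream, decoding its bytes as UTF-8.
    static String readLineUtf8(InputStream in) {
        Scanner scanner = new Scanner(in, "UTF-8");
        return scanner.nextLine();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Stand-in for System.in: "hello" with an acute accent on the e.
        byte[] typed = "h\u00e9llo\n".getBytes(StandardCharsets.UTF_8);
        String line = readLineUtf8(new ByteArrayInputStream(typed));

        // Encode the output as UTF-8 too; the terminal must also expect UTF-8.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(line);
    }
}
```

In a real program you would pass System.in to `readLineUtf8` instead of the in-memory stream. Whether this helps depends on how the terminal is configured, so treat it as something to try rather than a guaranteed fix.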
