Java, Using Scanner to input characters as UTF-8, can't print text
Original question: http://stackoverflow.com/questions/9980699/
This content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by famfamfam
I can convert a String to a UTF-8 byte array, but I can't convert the array back into a String that matches the original.
public static void main(String[] args) {
Scanner h = new Scanner(System.in);
System.out.println("INPUT : ");
String stringToConvert = h.nextLine();
byte[] theByteArray = stringToConvert.getBytes();
System.out.println(theByteArray);
theByteArray.toString();
String s = new String(theByteArray);
System.out.println(""+s);
}
How do I print theByteArray as a String?
Answered by Joe
String s = new String(theByteArray);
should really be
String s = new String(theByteArray, Charset.forName("UTF-8"));
The underlying issue here is that String constructors aren't smart. A byte array carries no record of the charset that produced it, so new String(byte[]) decodes it with the platform's default charset. Depending on the JVM and operating system, that default may be a single-byte charset such as windows-1252 or ISO-8859-1 rather than UTF-8. This is why plain A-Za-z looks right (those characters are encoded identically in ASCII-compatible charsets) while everything else starts to fail.
A byte holds a value from -128 to 127, so UTF-8 represents every character outside ASCII as a sequence of two to four consecutive bytes that must be decoded together. The String constructor cannot infer this from the byte array alone, so with a single-byte default charset it maps each byte to its own character (basic alphanumerics still work because each one occupies a single byte with the same value in both encodings).
Example:
String text = "こんにちは";
byte[] array = text.getBytes("UTF-8");
String s = new String(array, Charset.forName("UTF-8"));
System.out.println(s); // Prints as expected
String sISO = new String(array, Charset.forName("ISO-8859-1"));
System.out.println(sISO); // Prints mojibake: each UTF-8 byte is decoded as a separate character
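To make the multi-byte point concrete, here is a minimal sketch that prints the raw byte values UTF-8 produces. The class name Utf8ByteDemo and the helper utf8 are made up for illustration, and the non-ASCII text is written with Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original answer.
class Utf8ByteDemo {
    // Encodes text as UTF-8 and returns the raw (signed) byte values.
    static byte[] utf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ASCII characters take one byte each.
        System.out.println(Arrays.toString(utf8("A")));        // [65]
        // U+00E9 (e with acute accent) takes two bytes, both negative as Java bytes.
        System.out.println(Arrays.toString(utf8("\u00e9")));   // [-61, -87]

        // Five hiragana characters, three bytes each.
        byte[] hello = utf8("\u3053\u3093\u306b\u3061\u306f");
        System.out.println(hello.length);                      // 15

        // Decoding those bytes one at a time gives 15 characters instead of 5.
        String wrong = new String(hello, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.length());                    // 15
    }
}
```

Using StandardCharsets.UTF_8 instead of the charset name also avoids the checked UnsupportedEncodingException that getBytes("UTF-8") declares.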
Answered by Jake Greene
There are several problems with the provided code:
1. You are not ensuring that you get the UTF-8 byte array from that String.
byte[] theByteArray = stringToConvert.getBytes();
returns a byte array in the platform's default encoding, as described by the JavaDoc. What you actually want is the following:
byte[] theByteArray = stringToConvert.getBytes("UTF-8");
2. You should check the documentation for System.out.println().
System.out.println(theByteArray);
is calling System.out.println(Object x), which prints the result of x.toString(). For an array, toString() is the default inherited from Object: the type name, an '@', and the object's hash code in hexadecimal. It looks like a memory address, but it says nothing about the contents. So when you see output of the form:
INPUT :
[B@5f1121f6
inputText
what you are seeing is that identifier for theByteArray ([B means byte array), followed by the input line of text.
3. You seem to misunderstand the x.toString() method. Remember, Strings in Java are immutable; none of String's methods will alter the String. Likewise,
theByteArray.toString();
returns a string representation of theByteArray without changing it. The returned value is thrown away unless you assign it to a variable:
String arrayAsString = theByteArray.toString();
However, as described above, the returned String will only be the type-and-hash identifier of theByteArray. To print the contents of theByteArray, you need to decode it into a String:
String convertedString = new String(theByteArray, Charset.forName("UTF-8"));
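The three ways of turning a byte array into text discussed above can be compared side by side. This is only a sketch: the class name ByteArrayPrintDemo and the helper decodeUtf8 are invented for the example, and the hash shown by the default toString() will differ on every run.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original answer.
class ByteArrayPrintDemo {
    // Decodes the bytes as UTF-8 text.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = "hi".getBytes(StandardCharsets.UTF_8);

        // Default Object.toString(): "[B@" plus a hex hash code that varies per run.
        System.out.println(bytes.toString());

        // The numeric value of each byte.
        System.out.println(Arrays.toString(bytes));  // [104, 105]

        // The text the bytes encode.
        System.out.println(decodeUtf8(bytes));       // hi
    }
}
```

Arrays.toString is handy for debugging what was actually encoded; new String with an explicit charset is what you want when the goal is to get the text back.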
Assuming your requirements are to print the converted String and then print the original String, your code should look something like this:
public static void main(String[] args) {
Scanner h = new Scanner(System.in);
System.out.println("INPUT : ");
String stringToConvert = h.nextLine();
try {
// Array of the UTF-8 representation of the given String
byte[] theByteArray;
theByteArray = stringToConvert.getBytes("UTF-8");
// The converted String
System.out.println(new String(theByteArray, Charset.forName("UTF-8")));
} catch (UnsupportedEncodingException e) {
// We may provide an invalid character set
e.printStackTrace();
}
// The original String
System.out.println(stringToConvert);
}
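One thing the code above leaves to the platform default is how Scanner decodes the incoming bytes. As a rough sketch, and assuming the input really arrives as UTF-8, the charset can be passed to the Scanner constructor as well. The class Utf8ScannerDemo and its helpers readLineUtf8 and roundTrip are hypothetical names; main feeds the Scanner from an in-memory stream instead of System.in so the example runs without a console.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical demo class, not part of the original answer.
class Utf8ScannerDemo {
    // Reads one line from the stream, decoding the bytes as UTF-8.
    static String readLineUtf8(InputStream in) {
        Scanner scanner = new Scanner(in, "UTF-8");
        return scanner.nextLine();
    }

    // Encodes to UTF-8 and decodes again with the same charset.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulated user input: two hiragana characters and a newline, sent as UTF-8.
        String typed = "\u3053\u3093";
        InputStream fakeInput =
                new ByteArrayInputStream((typed + "\n").getBytes(StandardCharsets.UTF_8));

        String line = readLineUtf8(fakeInput);
        System.out.println(line.equals(typed));            // true
        System.out.println(roundTrip(line).equals(typed)); // true
    }
}
```

With a real console you would pass System.in to readLineUtf8. Whether the terminal actually sends and displays UTF-8 depends on the environment, which this sketch does not control.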