Notice: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/49536891/

Time: 2020-08-12 03:03:18  Source: igfitidea

String encoding (UTF-8) JAVA

Tags: java, string, encoding, utf-8, character-encoding

Asked by no_name22

Could anyone please help me out here? I want to know the difference between the two string conversions below. I am trying to encode the string to UTF-8. Which one is the correct method?

String string2 = new String(string1.getBytes("UTF-8"), "UTF-8");

OR

String string3 = new String(string1.getBytes(), "UTF-8");

Also, if I use the above two lines of code together, i.e.

line 1: string1 = new String(string1.getBytes("UTF-8"), "UTF-8");
line 2: string1 = new String(string1.getBytes(), "UTF-8");

Will the value of string1 be the same after both lines?

PS: The purpose of all this is to send Japanese text in a web service call, so I want to send it with UTF-8 encoding.

Accepted answer by Syed Aqeel Ashiq

According to the javadoc of String#getBytes(String charsetName):

Encodes this String into a sequence of bytes using the named charset, storing the result into a new byte array.

And the documentation of String(byte[] bytes, Charset charset):

Constructs a new String by decoding the specified array of bytes using the specified charset.

Thus getBytes() is the opposite operation of String(byte[]). getBytes() encodes the string to bytes, and String(byte[]) decodes the byte array and converts it back to a string. You have to use the same charset for both methods to preserve the actual string value. I.e. your second example is wrong:

// This is wrong because getBytes() is called with the default charset,
// but those bytes are then converted to a string using UTF-8. This will
// mostly work because the default encoding is usually UTF-8, but it can fail,
// so it is wrong.
new String(string1.getBytes(), "UTF-8");
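
A minimal sketch of why the charsets have to match. ISO-8859-1 stands in here for a non-UTF-8 platform default, which is an assumption made only for illustration; the class and method names are made up as well:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Encodes s with one charset, then decodes the resulting bytes with another.
    static String roundTrip(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as Unicode escapes so the
        // source file itself stays plain ASCII.
        String japanese = "\u65e5\u672c\u8a9e";

        // Same charset on both sides: the original value comes back.
        String same = roundTrip(japanese, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(same.equals(japanese));

        // ISO-8859-1 stands in for a non-UTF-8 default charset (assumption).
        // It cannot represent Japanese, so the text is already lost in getBytes().
        String mixed = roundTrip(japanese, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(mixed.equals(japanese));
    }
}
```

With a matching pair the round trip is a no-op. With a mismatched pair the damage cannot be undone afterwards, because the lost characters are no longer in the bytes.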

Answer by Joop Eggen

String and char (two-byte UTF-16) in Java are for (Unicode) text.

When converting from and to byte[]s, one needs the Charset (encoding) of those bytes.

Both String.getBytes() and new String(byte[]) are shortcuts that use the default operating system encoding. That is almost always wrong for cross-platform use.
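
To see which encoding those shortcuts pick on a given machine, something like the following sketch can help (the class and method names are hypothetical, and the printed charset name depends on the JVM and its settings):

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class DefaultCharsetDemo {
    // True when the no-argument getBytes() produces the same bytes as an
    // explicit call with the JVM's default charset.
    static boolean noArgUsesDefault(String s) {
        return Arrays.equals(s.getBytes(), s.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        // The name printed here varies between machines and JVM configurations.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println(noArgUsesDefault("abc \u65e5\u672c\u8a9e"));
    }
}
```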

So use:

byte[] b = s.getBytes("UTF-8");
s = new String(b, "UTF-8");

Or better, without a checked UnsupportedEncodingException to handle:

byte[] b = s.getBytes(StandardCharsets.UTF_8);
s = new String(b, StandardCharsets.UTF_8);

(Android versions before API level 19 do not know StandardCharsets, however.)

The same holds for InputStreamReader and OutputStreamWriter, which bridge binary data (InputStream/OutputStream) and text (Reader, Writer).
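
A small sketch of that bridge, using in-memory streams so that it runs without any files or sockets (the class and helper names are made up for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class StreamCharsetDemo {
    // Turns text into bytes through a Writer, naming the charset explicitly.
    static byte[] write(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(text);
        } // closing the writer flushes the encoded bytes into 'out'
        return out.toByteArray();
    }

    // Turns the bytes back into text through a Reader, using the same charset.
    static String read(byte[] bytes) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u65e5\u672c\u8a9e"; // three Japanese characters
        System.out.println(read(write(text)).equals(text));
    }
}
```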

Answer by Tom Blodget

Please don't confuse yourself. "String" is usually used to refer to values in a datatype that stores text. In this case, java.lang.String.

Serialized text is a sequence of bytes created by applying a character encoding to a string. In this case, byte[].

There are no UTF-8-encoded strings in Java.
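
As a rough illustration (the class and helper names are hypothetical), the encoding only shows up once the string is serialized to bytes. The same String yields different byte sequences under different charsets, and decoding with the matching charset gives back an equal String:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingIsForBytesDemo {
    // Number of bytes the text occupies once serialized with the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Serializes and immediately deserializes with the same charset.
    static String viaUtf8(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u65e5\u672c\u8a9e"; // three Japanese characters

        System.out.println(byteLength(text, StandardCharsets.UTF_8));    // 9
        System.out.println(byteLength(text, StandardCharsets.UTF_16BE)); // 6

        // The round trip returns an equal String; nothing about the String
        // itself was "converted to UTF-8".
        System.out.println(viaUtf8(text).equals(text)); // true
    }
}
```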

If your web service client library takes a string, pass it the string. If it lets you specify an encoding to use for serialization, pass it StandardCharsets.UTF_8 or equivalent.

If it doesn't take a string, then pass it string1.getBytes(StandardCharsets.UTF_8) and use whatever other mechanism it provides to let you tell the recipient that the bytes are UTF-8-encoded text. Or, get a different client library.
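
As one possible illustration, assuming the JDK's own java.net.http client (Java 11+) in place of whatever library the question uses: the sketch below builds, but does not send, a request whose body is the UTF-8 bytes of the text and whose Content-Type header names the charset. The URL and class name are placeholders.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class Utf8RequestSketch {
    // Builds (without sending) a POST whose body is the UTF-8 bytes of 'text'.
    static HttpRequest build(String text) {
        return HttpRequest.newBuilder(URI.create("https://example.invalid/service")) // placeholder URL
                // The header is the "other mechanism" that tells the recipient how to decode.
                .header("Content-Type", "text/plain; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("\u65e5\u672c\u8a9e"); // three Japanese characters
        System.out.println(request.headers().firstValue("Content-Type").orElse("(none)"));
        // Body length in bytes, as reported by the body publisher.
        System.out.println(request.bodyPublisher().get().contentLength());
    }
}
```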