String encoding (UTF-8) JAVA

Question

Asked by no_name22

Could anyone please help me out here? I want to know the difference between the two string conversions below. I am trying to encode the string to UTF-8. Which one is the correct method?

String string2 = new String(string1.getBytes("UTF-8"), "UTF-8");

OR

String string3 = new String(string1.getBytes(), "UTF-8");

Also, if I use the above two statements together, i.e.

line 1: string1 = new String(string1.getBytes("UTF-8"), "UTF-8");
line 2: string1 = new String(string1.getBytes(), "UTF-8");

Will the value of string1 be the same in both lines?

PS: The purpose of all this is to send Japanese text in a web service call, so I want to send it with UTF-8 encoding.

Answer 1

Accepted answer by Syed Aqeel Ashiq

According to the javadoc of String#getBytes(String charsetName):

Encodes this String into a sequence of bytes using the named charset, storing the result into a new byte array.

And the documentation of String(byte[] bytes, Charset charset):

Constructs a new String by decoding the specified array of bytes using the specified charset.

Thus getBytes() is the opposite operation of String(byte[]). getBytes() encodes the string to bytes, and String(byte[]) decodes the byte array and converts it back to a string. You have to use the same charset for both methods to preserve the actual string value. I.e. your second example is wrong:

// This is wrong because you are calling getBytes() with the default charset
// but converting those bytes back to a string using UTF-8. This will
// mostly work because the default encoding is usually UTF-8, but it can fail,
// so it is wrong.
new String(string1.getBytes(), "UTF-8");
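
To make the failure mode concrete, here is a minimal sketch that simulates a platform whose default charset is not UTF-8. The class and method names are made up for illustration, and ISO-8859-1 is only an assumed stand-in for such a default; the Japanese text is written with Unicode escapes so the example does not depend on the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    // Encode the text with one charset and decode the bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // "konnichiwa" in hiragana, written as escapes
        String original = "\u3053\u3093\u306b\u3061\u306f";

        // Same charset on both sides: the value is preserved.
        String matched = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        // Assumed non-UTF-8 default (ISO-8859-1 here) on the encoding side:
        // the Japanese characters cannot be represented and are lost.
        String mismatched = roundTrip(original, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        System.out.println("matched equals original:    " + matched.equals(original));
        System.out.println("mismatched equals original: " + mismatched.equals(original));
    }
}
```

Note that even the matched round trip only reproduces the string you already had; it does not make the string "more UTF-8".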

Answer 2

Answer by Joop Eggen

String and char (two-byte UTF-16) in Java are for (Unicode) text.

When converting from and to byte[]s one needs the Charset (encoding) of those bytes.

Both String.getBytes() and new String(byte[]) are shortcuts that use the default operating system encoding. That is almost always wrong for cross-platform usage.
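
As a small sketch of what "default encoding" means in practice, the following prints the charset the no-argument shortcuts fall back to and checks that getBytes() matches an explicit call with that charset. The class and method names are hypothetical; the printed charset depends on the machine and Java version, so no particular output is assumed.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class DefaultCharsetDemo {
    // True when the no-argument getBytes() yields the same bytes as an
    // explicit call with the platform default charset.
    static boolean noArgUsesDefault(String text) {
        return Arrays.equals(text.getBytes(), text.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        // The value printed here varies between machines and JVM settings.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("getBytes() uses it: " + noArgUsesDefault("\u65e5\u672c\u8a9e"));
    }
}
```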

So use

byte[] b = s.getBytes("UTF-8");
s = new String(b, "UTF-8");

Or better, without having to handle the checked UnsupportedEncodingException:

byte[] b = s.getBytes(StandardCharsets.UTF_8);
s = new String(b, StandardCharsets.UTF_8);

(Android versions before API level 19 do not have StandardCharsets, however.)

The same holds for InputStreamReader and OutputStreamWriter, which bridge binary data (InputStream/OutputStream) and text (Reader/Writer).
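
A minimal sketch of the same idea with streams, using in-memory byte arrays in place of a real file or socket. The class and method names are invented for the example; the point is only that the charset is passed explicitly on both the writing and the reading side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class StreamCharsetDemo {
    // Text -> bytes through a Writer with an explicit charset.
    static byte[] write(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Bytes -> text through a Reader with the same explicit charset.
    static String read(byte[] bytes) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            char[] buffer = new char[256];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String original = "\u3053\u3093\u306b\u3061\u306f";
        System.out.println(read(write(original)).equals(original));
    }
}
```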

Answer 3

Answer by Tom Blodget

Please don't confuse yourself. "String" is usually used to refer to values in a datatype that stores text. In this case, java.lang.String.

Serialized text is a sequence of bytes created by applying a character encoding to a string. In this case, byte[].

There are no UTF-8-encoded strings in Java.
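
One way to see the distinction between a string and its serialized bytes is the sketch below, with hypothetical class and method names. The String itself has a single length in chars; an encoding only enters the picture once the text is turned into a byte[].

```java
import java.nio.charset.StandardCharsets;

class StringVsBytesDemo {
    // Number of bytes the text occupies once serialized as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the text occupies once serialized as UTF-16 (big endian, no BOM).
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String text = "\u65e5\u672c\u8a9e"; // three kanji: "Japanese language"
        System.out.println("chars:        " + text.length());     // 3
        System.out.println("UTF-8 bytes:  " + utf8Length(text));  // 9
        System.out.println("UTF-16 bytes: " + utf16Length(text)); // 6
    }
}
```

The same String produces different byte sequences depending on the charset chosen at serialization time, which is why the encoding is a property of the bytes and not of the String.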

If your web service client library takes a string, pass it the string. If it lets you specify an encoding to use for serialization, pass it StandardCharsets.UTF_8 or equivalent.

If it doesn't take a string, then pass it string1.getBytes(StandardCharsets.UTF_8) and use whatever other mechanism it provides to let you tell the recipient that the bytes are UTF-8-encoded text. Or, get a different client library.
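
As an illustration only, here is a sketch of the byte-based case, assuming the web service is called over plain HTTP with the java.net.http client from Java 11 or later. The question does not say which client library is in use, so the endpoint URL, the class name, and the choice of a Content-Type header as the "other mechanism" are all assumptions. The request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class Utf8RequestDemo {
    // Hypothetical endpoint; replace with the real web service URL.
    static final URI ENDPOINT = URI.create("https://example.invalid/messages");

    static HttpRequest buildRequest(String text) {
        // Serialize the text explicitly as UTF-8 ...
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(ENDPOINT)
                // ... and tell the recipient which encoding the bytes use.
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("\u3053\u3093\u306b\u3061\u306f");
        System.out.println(request.headers().firstValue("Content-Type").orElse("(none)"));
    }
}
```

With a SOAP or REST client library that accepts a String, none of this is needed: the library does the serialization and usually declares the charset itself.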
