How to convert UTF-8 to Unicode in Java?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/18606523/

Time: 2020-08-12 09:27:50  Source: igfitidea

How to convert UTF-8 to unicode in Java?

java, unicode, utf-8

Asked by XWang

For example, in the Emoji character set, U+1F601 is the Unicode value for "GRINNING FACE WITH SMILING EYES", and \xF0\x9F\x98\x81 is the UTF-8 byte sequence for this character.

\xE2\x9D\xA4 is the UTF-8 byte sequence for HEAVY BLACK HEART, whose Unicode value is U+2764.

So my question is: if I have a byte array with the values (0xF0, 0x9F, 0x98, 0x81, 0xE2, 0x9D, 0xA4), how can I convert it into Unicode values?

For the above input, the result I want is a String array with the values "1F601" and "2764".

I know I can write a complex method to do this work, but I hope there is already a library to do this work.

Answered by Jon Skeet

So my question is, if I have a byte array with value (0xF0, 0x9F, 0x98, 0x81), then how I can convert it into Unicode value?

Simply call the String constructor, specifying the data and the encoding:

String text = new String(bytes, "UTF-8");

You can specify a Charset instead of the name of the encoding - I like Guava's simple Charsets class, which allows you to write:

String text = new String(bytes, Charsets.UTF_8);

Or, on Java 7 and later, use StandardCharsets without even needing Guava:

String text = new String(bytes, StandardCharsets.UTF_8);
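
The question asks for the code point values as hex strings ("1F601" and "2764"), which takes one more step after decoding. A minimal sketch of that step, assuming Java 8 or later for String.codePoints(); the class name CodePointDemo and the helper toHexCodePoints are hypothetical names used only for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class CodePointDemo {
    // Hypothetical helper: decodes UTF-8 bytes and returns each code point as uppercase hex.
    static List<String> toHexCodePoints(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        List<String> result = new ArrayList<>();
        // codePoints() yields whole code points, so a surrogate pair counts as one value.
        text.codePoints().forEach(cp -> result.add(Integer.toHexString(cp).toUpperCase()));
        return result;
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x81,
                        (byte) 0xE2, (byte) 0x9D, (byte) 0xA4};
        System.out.println(toHexCodePoints(bytes)); // prints [1F601, 2764]
    }
}
```

Iterating over code points rather than chars matters here because U+1F601 lies outside the Basic Multilingual Plane and occupies two chars in a Java String.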

Answered by Ashwani

Simply use the String class:

byte[] bytesArray = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x81}; // UTF-8 bytes of U+1F601

String string = new String(bytesArray, Charset.forName("UTF-8")); // convert the bytes to a String

System.out.println(string); // test result

Answered by Mr.Green

Here is an example using InputStreamReader:

InputStream inputStream = new FileInputStream("utf-8-text.txt");
Reader      reader      = new InputStreamReader(inputStream,
                                                Charset.forName("UTF-8"));

int data = reader.read();
while(data != -1){
    char theChar = (char) data;
    data = reader.read();
}

reader.close();

Ref: Java I18N example
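
One detail worth noting about the reading loop: Reader.read() returns UTF-16 code units, so a character outside the Basic Multilingual Plane, such as U+1F601, arrives as two values (a surrogate pair). A rough sketch of joining them back into code points follows. It assumes in-memory bytes instead of the file so that it is self-contained, and the names ReaderCodePoints and readCodePoints are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ReaderCodePoints {
    // Hypothetical helper: reads UTF-16 code units and joins surrogate pairs into code points.
    static List<Integer> readCodePoints(byte[] utf8Bytes) {
        List<Integer> codePoints = new ArrayList<>();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8)) {
            int data;
            while ((data = reader.read()) != -1) {
                char unit = (char) data;
                if (Character.isHighSurrogate(unit)) {
                    int low = reader.read();
                    if (low != -1 && Character.isLowSurrogate((char) low)) {
                        codePoints.add(Character.toCodePoint(unit, (char) low));
                    } else {
                        // Unpaired surrogate: keep the units unchanged.
                        codePoints.add((int) unit);
                        if (low != -1) {
                            codePoints.add(low);
                        }
                    }
                } else {
                    codePoints.add((int) unit);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return codePoints;
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x81,
                        (byte) 0xE2, (byte) 0x9D, (byte) 0xA4};
        for (int cp : readCodePoints(bytes)) {
            System.out.println(Integer.toHexString(cp).toUpperCase()); // prints 1F601, then 2764
        }
    }
}
```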

Answered by che.moor

Here is a function that repairs a String whose UTF-8 bytes were wrongly decoded as ISO_8859_1, by decoding those same bytes as UTF-8 instead:

public static String String_ISO_8859_1To_UTF_8(String strISO_8859_1) {
    // Each char of a String decoded as ISO_8859_1 holds exactly one original byte
    // (0x00-0xFF), so encoding it back as ISO_8859_1 recovers the raw bytes,
    // including plain ASCII ones.
    final byte[] data = strISO_8859_1.getBytes(StandardCharsets.ISO_8859_1);
    // Decode the recovered bytes with the encoding they were really written in.
    return new String(data, StandardCharsets.UTF_8);
}

TEST

String strA_ISO_8859_1_i = new String("??????".getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

System.out.println("ISO_8859_1 strA est = "+ strA_ISO_8859_1_i + "\n String_ISO_8859_1To_UTF_8 = " + String_ISO_8859_1To_UTF_8(strA_ISO_8859_1_i));

RESULT

ISO_8859_1 strA est = ?§ù?où?§ù String_ISO_8859_1To_UTF_8 = ??????
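
The round trip that this test exercises can also be sketched with an escaped literal, so that it does not depend on how the source file or the console is encoded. This is only an illustration built on the standard getBytes and String constructor calls; the class name MojibakeDemo and the helper repair are hypothetical:

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Hypothetical helper: undoes a wrong ISO_8859_1 decode of UTF-8 bytes.
    static String repair(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the final letter (U+00E9).
        String original = "caf\u00E9";
        // Simulate the mistake: UTF-8 bytes decoded as ISO_8859_1.
        String misdecoded = new String(original.getBytes(StandardCharsets.UTF_8),
                                       StandardCharsets.ISO_8859_1);
        System.out.println(misdecoded.length());                 // prints 5 (U+00E9 became two chars)
        System.out.println(repair(misdecoded).equals(original)); // prints true
    }
}
```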
