Java: How can I get a Unicode character's code?

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/2006533/

Time: 2020-08-13 02:24:31  Source: igfitidea

How can I get a Unicode character's code?

java unicode character

Asked by Geo

Let's say I have this:

char registered = '®';

or an umlaut, or whatever unicode character. How could I get its code?

Accepted answer by Jon Skeet

Just convert it to int:

char registered = '®';
int code = (int) registered;

In fact there's an implicit conversion from char to int, so you don't have to specify it explicitly as I've done above, but I would do so in this case to make it obvious what you're trying to do.

This will give the UTF-16 code unit, which is the same as the Unicode code point for any character defined in the Basic Multilingual Plane. (And only BMP characters can be represented as char values in Java.) As Andrzej Doyle's answer says, if you want the Unicode code point from an arbitrary string, use Character.codePointAt().
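To make the BMP point concrete, here is a minimal sketch that lists the UTF-16 code units needed for a given code point. The class name Utf16Units and the helper utf16Units are made up for this illustration; only Character.toChars and String.format come from the standard library.

```java
class Utf16Units {
    // Hypothetical helper (not from the original answer): returns the UTF-16
    // code units of a single code point as space-separated 4-digit hex values.
    static String utf16Units(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A BMP character fits in one char, so its code unit equals its code point.
        System.out.println(utf16Units(0x00E9));   // 00E9
        // A supplementary character needs two chars (a surrogate pair).
        System.out.println(utf16Units(0x1F600));  // D83D DE00
    }
}
```

A single char variable can only ever hold one of those units, which is why the plain cast is enough for the question's example but not for arbitrary strings.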

Once you've got the UTF-16 code unit or Unicode code points, both of which are integers, it's up to you what you do with them. If you want a string representation, you need to decide exactly what kind of representation you want. (For example, if you know the value will always be in the BMP, you might want a fixed 4-digit hex representation prefixed with U+, e.g. "U+0020" for space.) That's beyond the scope of this question though, as we don't know what the requirements are.
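As one possible string representation, the "U+0020" style mentioned above could be produced as follows. This is only a sketch: the class CodePointLabel and its label method are names invented here, and the choice of uppercase hex padded to at least four digits is an assumption about what you want.

```java
class CodePointLabel {
    // Hypothetical helper: formats a code point as "U+" followed by
    // uppercase hex, zero-padded to at least four digits.
    static String label(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        char space = ' ';
        System.out.println(label(space));    // char widens to int implicitly: U+0020
        System.out.println(label(0x1F600));  // values above the BMP get more digits: U+1F600
    }
}
```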

Answer by Andrzej Doyle

A more complete, albeit more verbose, way of doing this would be to use the Character.codePointAt method. This will handle 'high surrogate' characters, which cannot be represented by a single integer within the range that a char can represent.

In the example you've given this is not strictly necessary - if the (Unicode) character can fit inside a single (Java) char (such as the registered local variable) then it must fall within the \u0000 to \uffff range, and you won't need to worry about surrogate pairs. But if you're looking at potentially higher code points, from within a String/char array, then calling this method is wise in order to cover the edge cases.

For example, instead of

String input = ...;
char fifthChar = input.charAt(4);
int codePoint = (int)fifthChar;

use

String input = ...;
int codePoint = Character.codePointAt(input, 4);

Not only is this slightly less code in this instance, but it will handle detection of surrogate pairs for you.
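The difference only shows up with characters outside the BMP. The following sketch uses a made-up sample string (the class SurrogateDemo, the constant INPUT, and the helper codePointsOf are all assumptions for illustration) to compare the two approaches. Note that the index passed to codePointAt is still a char index, not a code point index.

```java
class SurrogateDemo {
    // Sample input assumed for illustration: "abc", then U+1F600, then "d".
    // char indices: a=0, b=1, c=2, high surrogate=3, low surrogate=4, d=5
    static final String INPUT = "abc" + new String(Character.toChars(0x1F600)) + "d";

    // Hypothetical helper: collects every code point in the string.
    static int[] codePointsOf(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        int viaCharAt = INPUT.charAt(3);                       // only the high surrogate
        int viaCodePointAt = Character.codePointAt(INPUT, 3);  // the full code point
        System.out.printf("charAt:      %X%n", viaCharAt);      // D83D
        System.out.printf("codePointAt: %X%n", viaCodePointAt); // 1F600

        // Pointing at the low surrogate returns just that surrogate.
        System.out.printf("index 4:     %X%n", Character.codePointAt(INPUT, 4)); // DE00

        // To walk a whole string by code point, String.codePoints() avoids index arithmetic.
        for (int cp : codePointsOf(INPUT)) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```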

Answer by Nasser Hadjloo

Dear friend, Jon Skeet showed how to find a character's decimal code, but Unicode character codes are conventionally written in hexadecimal, so you should represent character codes in hex rather than in decimal.

There is an open source tool at http://unicode.codeplex.com that provides complete information about a character or a sentence.

So it is better to create a helper that takes a char as a parameter and returns its hex code as a string:

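public static String getHexCode(char character) {
    // Java's String.format uses printf-style specifiers, not "{0:X4}":
    // %04X gives uppercase hex, zero-padded to four digits.
    return String.format("%04X", (int) character);
}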

Hope it helps.

Answer by Felype

In Java, char is technically a "16-bit integer", so you can simply cast it to int and you'll get its code. From Oracle:

The char data type is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).

So you can simply cast it to int.

char registered = '®';
System.out.println(String.format("This is an int-code: %d", (int) registered));
System.out.println(String.format("And this is an hexa code: %x", (int) registered));

Answer by Darius Miliauskas

For me, only "Integer.toHexString(registered)" worked the way I wanted:

char registered = '®';
System.out.println("Answer:"+Integer.toHexString(registered));

This answer gives you only the string representation that is usually shown in Unicode tables. Jon Skeet's answer explains more.

Answer by Michael Gantman

There is an open source library, MgntUtils, that has a utility class StringUnicodeEncoderDecoder. That class provides static methods that convert any String into a Unicode sequence and vice versa. Very simple and useful. To convert a String you just do:

String codes = StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(myString);

For example a String "Hello World" will be converted into

"\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064"

It works with any language. Here is the link to the article that explains all the details about the library: MgntUtils. Look for the subtitle "String Unicode converter". The article links to Maven Central, where you can get the artifacts, and to GitHub, where you can get the project itself. The library comes with well-written javadoc and source code.
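If you would rather not add a dependency, a similar escaped sequence can be approximated in plain Java. The sketch below is not the library's implementation: the class UnicodeEscaper and the method toEscapes are hypothetical names, and it assumes one escape per UTF-16 code unit with lowercase hex, which matches the "Hello World" output quoted above.

```java
class UnicodeEscaper {
    // Hypothetical helper, not part of MgntUtils: writes each UTF-16 code unit
    // as a backslash, the letter u, and four lowercase hex digits.
    static String toEscapes(String text) {
        StringBuilder sb = new StringBuilder(text.length() * 6);
        for (int i = 0; i < text.length(); i++) {
            sb.append(String.format("\\u%04x", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the escaped form of the sample string used in this answer.
        System.out.println(toEscapes("Hello World"));
    }
}
```

Because it works per code unit, a character outside the BMP comes out as two escapes (its surrogate pair), which is also how Java source code expects such characters to be written.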
