Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/6766416/

Date: 2020-10-30 17:10:54  Source: igfitidea

How do I get the decimal value of a Unicode character in Java?

java unicode

Asked by Mike Sickler

I need a programmatic way to get the decimal value of each character in a String, so that I can encode them as HTML entities, for example:

UTF-8:

著者名

Decimal:

&#33879;&#32773;&#21517;

Answered by Jon Skeet

I suspect you're just interested in a conversion from char to int, which is implicit:

for (int i = 0; i < text.length(); i++)
{
    char c = text.charAt(i);
    int value = c;
    System.out.println(value);
}
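
To see why one char is not always one character, here is a minimal sketch comparing the number of char values with the number of code points. The class and method names are hypothetical and only for illustration; the sample string is U+1F600, which lies outside the Basic Multilingual Plane and is therefore stored as a surrogate pair.

```java
final class CharVersusCodePoint {
    // Hypothetical helper: number of char (UTF-16 code unit) values in the text.
    static int charCount(String text) {
        return text.length();
    }

    // Hypothetical helper: number of Unicode code points in the text.
    static int codePointCount(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        String smiley = "\uD83D\uDE00"; // U+1F600 written as its surrogate pair
        System.out.println(charCount(smiley));      // two char values
        System.out.println(codePointCount(smiley)); // one code point
    }
}
```

For text that stays within the Basic Multilingual Plane, such as 著者名, both counts agree and the simple loop is enough.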

EDIT: If you want to handle surrogate pairs, you can use something like:

for (int i = 0; i < text.length(); i++)
{
    int codePoint = text.codePointAt(i);
    // Skip over the second char in a surrogate pair
    if (codePoint > 0xffff)
    {
        i++;
    }
    System.out.println(codePoint);
}
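
Tying this back to the question's goal of HTML entities, here is a rough sketch that turns every code point into a decimal entity. It assumes Java 8 or later for String.codePoints(); the class and method names are made up for illustration, and it encodes every character, including plain ASCII, which may be more than you want in practice.

```java
final class HtmlEntities {
    // Hypothetical helper: encodes each code point of the text as a decimal HTML entity.
    static String toDecimalEntities(String text) {
        StringBuilder out = new StringBuilder();
        // codePoints() (Java 8+) yields one int per code point and
        // combines valid surrogate pairs into a single value.
        text.codePoints().forEach(cp -> out.append("&#").append(cp).append(';'));
        return out.toString();
    }

    public static void main(String[] args) {
        // The three characters from the question, written as Unicode escapes.
        System.out.println(toDecimalEntities("\u8457\u8005\u540D"));
    }
}
```

For the string in the question this should give the decimal form shown there, since the three characters are U+8457, U+8005 and U+540D.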

Answered by Voo

OK, after reading Jon's post and still musing about surrogates in Java, I decided to be a bit less lazy and look it up. There is actually support for surrogates in the Character class; it's just a bit roundabout.

So here's the code that'll work correctly, assuming valid input:

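    for (int i = 0; i < str.length(); i++) {
        char ch = str.charAt(i);
        if (Character.isHighSurrogate(ch)) {
            // Combine the high surrogate with the low surrogate that follows it
            System.out.println("Codepoint: " +
                   Character.toCodePoint(ch, str.charAt(i + 1)));
            i++; // skip the low surrogate we just consumed
        } else {
            System.out.println("Codepoint: " + (int)ch);
        }
    }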
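
If the input cannot be assumed valid, one possible variation is to check that a low surrogate really follows before combining the pair. The sketch below is only an illustration: the class and method names are hypothetical, and reporting a lone surrogate as its raw char value is an assumption about what you want (you might prefer to throw or substitute U+FFFD instead).

```java
import java.util.ArrayList;
import java.util.List;

final class SafeCodePoints {
    // Hypothetical helper: collects code points, tolerating unpaired surrogates.
    static List<Integer> codePointsOf(String str) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            // Only treat ch as the start of a pair if a low surrogate really follows.
            boolean startsPair = Character.isHighSurrogate(ch)
                    && i + 1 < str.length()
                    && Character.isLowSurrogate(str.charAt(i + 1));
            if (startsPair) {
                result.add(Character.toCodePoint(ch, str.charAt(i + 1)));
                i++; // skip the low surrogate
            } else {
                // BMP character, or an unpaired surrogate kept as its raw value (assumption)
                result.add((int) ch);
            }
        }
        return result;
    }
}
```

This avoids the StringIndexOutOfBoundsException that str.charAt(i + 1) would throw when the string ends with a high surrogate.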