Java: How to convert a Unicode code point to its character representation?

Question

Asked by David Michael Gang

How do I convert strings representing code points to the appropriate character?

For example, I want a function that takes U+00E4 and returns ä.

I know that the Character class has a method toChars(int codePoint) which takes an integer, but there is no method that takes a string of this form.

Is there a built-in function, or do I have to transform the string myself to get the integer that I can pass to that method?

Answer 1

Accepted answer by Anirudha

Code points are written as hexadecimal numbers prefixed by U+, so you can strip the prefix and parse the rest as hex:

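// Skip the "U+" prefix and parse the remaining digits as hexadecimal.
int codepoint = Integer.parseInt(yourString.substring(2), 16);
// toChars returns one char for BMP code points and a surrogate pair otherwise.
char[] ch = Character.toChars(codepoint);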
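
The two lines above can be wrapped into the kind of function the question asks for. What follows is only a minimal sketch: the class name CodePointParser and the method name fromUPlus are hypothetical, and it assumes the input always has the form "U+" followed by hex digits, rejecting anything else with an IllegalArgumentException.

```java
class CodePointParser {

    // Hypothetical helper: turns a string such as "U+00E4" into the text it names.
    static String fromUPlus(String s) {
        // Accept "U+" or "u+" as the prefix; anything else is rejected.
        if (s == null || !s.regionMatches(true, 0, "U+", 0, 2)) {
            throw new IllegalArgumentException("Expected a U+ prefix: " + s);
        }
        // NumberFormatException (a subclass of IllegalArgumentException)
        // is thrown here if the remainder is not valid hex.
        int codePoint = Integer.parseInt(s.substring(2), 16);
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Not a Unicode code point: " + s);
        }
        // Wrap the char[] in a String so supplementary code points stay intact.
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(fromUPlus("U+00E4"));
        System.out.println(fromUPlus("U+1F601"));
    }
}
```

Returning a String rather than a char[] is a design choice made here for convenience; a code point above U+FFFF needs two chars, so a single char return type would not be enough.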

Answer 2

Answered by Joop Eggen

"\u00E4"

new String(new int[] { 0x00E4 }, 0, 1);

Answer 3

Answered by tateisu

This example does not use char[].

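// This code is Kotlin, but you can write the same thing in Java.
val sb = StringBuilder()
val cp: Int = 0x00E4 // code point to append (example value)
when {
    // A BMP code point fits in a single char.
    Character.isBmpCodePoint(cp) -> sb.append(cp.toChar())
    // A supplementary code point needs a surrogate pair.
    Character.isValidCodePoint(cp) -> {
        sb.append(Character.highSurrogate(cp))
        sb.append(Character.lowSurrogate(cp))
    }
    // Anything else is not a Unicode code point.
    else -> sb.append('?')
}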

Answer 4

Answered by Roovy

The easiest way I've found so far is to just cast the code point. If you only expect a single char per code point (that is, code points in the Basic Multilingual Plane, up to U+FFFF), this might be fine for you:

int codepoint = ...;
char c = (char)codepoint;
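
To illustrate where the cast stops being safe, here is a small sketch comparing it with Character.toChars. The class and method names are hypothetical, and the two sample code points (U+00E4 and U+1F601) are just convenient examples of a BMP and a supplementary code point.

```java
class CastVersusToChars {

    // The cast keeps only the low 16 bits of the code point.
    static String viaCast(int codePoint) {
        return String.valueOf((char) codePoint);
    }

    // toChars produces a surrogate pair when one char is not enough.
    static String viaToChars(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int bmp = 0x00E4;            // fits in one char
        int supplementary = 0x1F601; // needs two chars

        System.out.println(viaCast(bmp).equals(viaToChars(bmp)));                     // true
        System.out.println(viaCast(supplementary).equals(viaToChars(supplementary))); // false
        System.out.println(viaToChars(supplementary).length());                       // 2
    }
}
```

For the supplementary code point the cast silently yields the unrelated char U+F601, so the cast is best reserved for input known to stay within the BMP.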

Answer 5

Answered by Abdo Magdy

You can print them (this snippet is Python rather than Java):

s='\u0645\u0635\u0631\u064a'
print(s)

Answer 6

Answered by Qubei

Converted from Kotlin:

    public String codepointToString(int cp) {
        StringBuilder sb = new StringBuilder();
        if (Character.isBmpCodePoint(cp)) {
            sb.append((char) cp);
        } else if (Character.isValidCodePoint(cp)) {
            sb.append(Character.highSurrogate(cp));
            sb.append(Character.lowSurrogate(cp));
        } else {
            sb.append('?');
        }
        return sb.toString();
    }
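
The same behaviour can probably be had more compactly with StringBuilder.appendCodePoint, which handles the split between BMP code points and surrogate pairs itself. This is a sketch rather than a drop-in replacement: the class name is hypothetical, and the "?" fallback for invalid input simply mirrors the method above.

```java
class CodePointToString {

    // Returns the text for a code point, or "?" if the value is not a valid code point.
    static String codePointToString(int cp) {
        if (!Character.isValidCodePoint(cp)) {
            return "?";
        }
        // appendCodePoint appends one char or a surrogate pair as needed.
        return new StringBuilder().appendCodePoint(cp).toString();
    }

    public static void main(String[] args) {
        System.out.println(codePointToString(0x00E4));
        System.out.println(codePointToString(0x1F601));
        System.out.println(codePointToString(0x110000)); // out of range, prints "?"
    }
}
```

The explicit validity check is kept because appendCodePoint throws an IllegalArgumentException for values outside the Unicode range instead of substituting a placeholder.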

Answer 7

Answered by skomisa

The question asked for a function to convert a string value representing a Unicode code point (i.e. "+Unnnn" rather than the Java formats of "\unnnn" or "0xnnnn"). However, newer releases of Java have enhancements which simplify the processing of a string containing multiple code points in Unicode format:

The introduction of Streams in Java 8.
The method public static String toString(int codePoint), which was added to the Character class in Java 11. It returns a String rather than a char[], so Character.toString(0x00E4) returns "ä".

Those enhancements allow a different approach to solving the issue raised in the OP. This method transforms a set of code points in Unicode format to a readable String in a single statement:

import java.util.Arrays;
import java.util.stream.Collectors;

void processUnicode() {

    // Create a test string containing "Hello World 😁" with code points in Unicode format.
    // Include an invalid code point (+U0wxyz), and a code point outside the Unicode range (+U70FFFF).
    String data = "+U0048+U0065+U006c+U006c+U0wxyz+U006f+U0020+U0057+U70FFFF+U006f+U0072+U006c+U0000064+U20+U1f601";

    // The '+' must be escaped in the regex, and the backslash escaped in the Java string literal.
    String text = Arrays.stream(data.split("\\+U"))
            .filter(s -> ! s.isEmpty()) // First element returned by split() is a zero length string.
            .map(s -> {
                try {
                    return Integer.parseInt(s, 16);
                } catch (NumberFormatException e) {
                    System.out.println("Ignoring element [" + s + "]: NumberFormatException from parseInt(\"" + s + "\"}");
                }
                return null; // If the code point is not represented as a valid hex String.
            })
            .filter(v -> v != null) // Ignore syntactically invalid code points.
            .filter(i -> Character.isValidCodePoint(i)) // Ignore code points outside of Unicode range.
            .map(i -> Character.toString(i)) // Obtain the string value directly from the code point. (Requires JDK >= 11 )
            .collect(Collectors.joining());

    System.out.println(text); // Prints "Hello World 😁"
}

And this is the output:

run:
Ignoring element [0wxyz]: NumberFormatException from parseInt("0wxyz"}
Hello World 😁
BUILD SUCCESSFUL (total time: 0 seconds)

Notes:

With this approach there is no longer any need for a dedicated function to convert a code point in Unicode format. The work is instead spread across several intermediate operations in the Stream pipeline. Of course, the same code can still be used to process just a single code point in Unicode format.
It's easy to add intermediate operations to perform further validation and processing on the Stream, such as case conversion, removal of emoticons, etc.
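
As an illustration of such extra intermediate operations, the sketch below adds two steps to a similar pipeline: it drops every code point outside the Basic Multilingual Plane (a deliberately crude stand-in for emoticon removal) and upper-cases the rest. The class and method names are hypothetical, it assumes every element is valid hex (so the try/catch is omitted), and it uses Character.toChars instead of Character.toString(int) so that it does not require Java 11.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class UnicodeStreamExtras {

    // Hypothetical helper: decodes "+Unnnn" sequences, drops non-BMP code points,
    // and upper-cases what remains.
    static String decodeBmpUpper(String data) {
        return Arrays.stream(data.split("\\+U"))
                .filter(s -> !s.isEmpty())                     // split() yields a leading empty string
                .mapToInt(s -> Integer.parseInt(s, 16))        // assumes valid hex input
                .filter(cp -> Character.isValidCodePoint(cp))  // inside the Unicode range
                .filter(cp -> Character.isBmpCodePoint(cp))    // extra step: crude emoticon removal
                .map(cp -> Character.toUpperCase(cp))          // extra step: case conversion
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(decodeBmpUpper("+U0048+U0065+U006c+U006c+U006f+U1f601")); // HELLO
    }
}
```

Filtering on the BMP also removes legitimate non-emoji characters from the supplementary planes, so a real filter would likely need a more precise test.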
