Unicode to string conversion in Java
Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/1934842/
Asked by ferronrsmith
I am building a language, a toy language. The syntax \#0061 is supposed to convert the given Unicode code point to a character:
String temp = yytext().substring(2);
I then tried to add '\u' in front of the string, but noticed that this generated an error.
I also tried "\\" + "u" + temp; but this way does not do any conversion.
I am basically trying to convert Unicode to a character by supplying only '0061' to a method. Help!
Answered by Stephen C
Strip the '#' and use Integer.parseInt("0061", 16) to convert the hex digits to an int. Then cast to a char.
(If you had implemented the lexer by hand, an alternative would be to do the conversion on the fly as your lexer matches the Unicode literal. But on rereading the question, I see that you are using a lexer generator ... good move!)
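As a minimal sketch of the steps above, assuming the lexer hands over the whole token in the form \#0061 (as the question's substring(2) call suggests). The class name UnicodeToken and the method name decode are hypothetical, chosen only for illustration:

```java
// Hypothetical helper; the class and method names are illustrative only.
class UnicodeToken {
    // Assumes the token is a backslash, a '#', and then hex digits.
    static char decode(String token) {
        String hex = token.substring(2);          // drop the two-character prefix
        int codeUnit = Integer.parseInt(hex, 16); // "0061" -> 97
        return (char) codeUnit;                   // 97 -> 'a'
    }

    public static void main(String[] args) {
        System.out.println(decode("\\#0061")); // prints a
    }
}
```

In a lexer action this would presumably be called as decode(yytext()). The cast to char only covers values up to FFFF, which is all a four-digit token can express.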
Answered by BalusC
You need to convert the particular code point to a char. You can do that with a little help from regex:
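import java.util.regex.Matcher;
import java.util.regex.Pattern;

String string = "blah #0061 blah";
// "\\#" is the regex \#, which matches a literal '#'; group 1 captures the four hex digits.
Matcher matcher = Pattern.compile("\\#((?i)[0-9a-f]{4})").matcher(string);
while (matcher.find()) {
    int codepoint = Integer.parseInt(matcher.group(1), 16);
    // Replace the matched token (for example "#0061") with the decoded character.
    string = string.replace(matcher.group(0), String.valueOf((char) codepoint));
}
System.out.println(string); // blah a blah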
Edit: as per the comments, if it is a single token, then just do:
String string = "0061";
char c = (char) Integer.parseInt(string, 16);
System.out.println(c); // a
Answered by danben
"I am basically trying to convert Unicode to a character by supplying only '0061' to a method, help."
char fromUnicode(String codePoint) {
return (char) Integer.parseInt(codePoint, 16);
}
You need to handle bad inputs and such, but otherwise that will work.
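One possible way to handle bad inputs is sketched below. It assumes the toy language always supplies exactly four hex digits; that rule, the class name SafeUnicode, and the choice of IllegalArgumentException are assumptions, not part of the original answer:

```java
// Hypothetical variant of fromUnicode that rejects malformed input.
class SafeUnicode {
    static char fromUnicode(String codePoint) {
        // Assumption: a valid input is exactly four hexadecimal digits.
        if (codePoint == null || !codePoint.matches("[0-9a-fA-F]{4}")) {
            throw new IllegalArgumentException(
                    "Expected exactly four hex digits, got: " + codePoint);
        }
        return (char) Integer.parseInt(codePoint, 16);
    }

    public static void main(String[] args) {
        System.out.println(fromUnicode("0061")); // prints a
        try {
            fromUnicode("zzzz");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Checking the format up front also rules out inputs that Integer.parseInt would accept but that are probably not intended here, such as "-061" or "+061".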
Answered by Kevin Montrose
\uXXXX is an escape sequence. The compiler converts it into the actual character value before execution; it is not "evaluated" in any way at runtime.
What you probably want to do is define a mapping from your #XXXX syntax to Unicode code points and cast them to char.
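The difference can be illustrated with a small sketch. The class and method names below are hypothetical; the point is only that a string built at runtime from a backslash, a "u", and hex digits stays six ordinary characters, while parsing the digits yields the character:

```java
// Hypothetical demo class; the names are illustrative only.
class EscapeDemo {
    // The compiler turns the escape in this literal into the single character 'a'.
    static String literal() {
        return "\u0061";
    }

    // Built at runtime: a backslash, the letter u, and the hex digits. No conversion happens.
    static String concatenated(String hex) {
        return "\\" + "u" + hex;
    }

    // Explicit conversion: parse the hex digits and cast the value to char.
    static char converted(String hex) {
        return (char) Integer.parseInt(hex, 16);
    }

    public static void main(String[] args) {
        System.out.println(literal().length());            // prints 1
        System.out.println(concatenated("0061").length()); // prints 6
        System.out.println(converted("0061"));             // prints a
    }
}
```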

