Unicode escape syntax in Java

Question

Asked by user3265048

In Java, I learned that the following syntax can be used to write Unicode characters that are not on the keyboard (e.g. non-ASCII characters):

(\u)(u)*(HexDigit)(HexDigit)(HexDigit)(HexDigit)

My question is: What is the purpose of (u)* in the above syntax?

One use case I understand is representing the yen symbol in Java:

char ch = '\u00A5';

Answer 1

Accepted answer by Aaron Digulla

Interesting question. Section 3.3 of the JLS (Java Language Specification) says:

UnicodeEscape:
    \ UnicodeMarker HexDigit HexDigit HexDigit HexDigit

UnicodeMarker:
    u
    UnicodeMarker u

which translates to the regular expression \\u+\p{XDigit}{4}

and

If an eligible \ is followed by u, or more than one u, and the last u is not followed by four hexadecimal digits, then a compile-time error occurs.
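
The regular expression above can be tried out with java.util.regex. The following is only a minimal sketch: the class name UnicodeEscapePattern and the helper isUnicodeEscape are made up for illustration, and the backslashes are doubled in the string literals so that the compiler sees ordinary text rather than real escapes.

```java
import java.util.regex.Pattern;

public class UnicodeEscapePattern {
    // Regex form of the grammar: a backslash, one or more u, then four hex digits.
    // (Hypothetical helper class, not part of the JLS or the JDK.)
    static final Pattern ESCAPE = Pattern.compile("\\\\u+\\p{XDigit}{4}");

    public static boolean isUnicodeEscape(String text) {
        return ESCAPE.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUnicodeEscape("\\u00A5"));     // true
        System.out.println(isUnicodeEscape("\\uuuuu00A5")); // true
        System.out.println(isUnicodeEscape("\\u00A"));      // false: only three hex digits
        System.out.println(isUnicodeEscape("\\00A5"));      // false: no u after the backslash
    }
}
```

Note that this only checks the shape of an escape. It does not model the "eligible backslash" rule, which depends on how many backslashes come before it.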

So you're right, there can be one or more u after the backslash. The reason is given further down:

The Java programming language specifies a standard way of transforming a program written in Unicode into ASCII that changes a program into a form that can be processed by ASCII-based tools. The transformation involves converting any Unicode escapes in the source text of the program to ASCII by adding an extra u - for example, \uxxxx becomes \uuxxxx - while simultaneously converting non-ASCII characters in the source text to Unicode escapes containing a single u each.
This transformed version is equally acceptable to a Java compiler and represents the exact same program. The exact Unicode source can later be restored from this ASCII form by converting each escape sequence where multiple u's are present to a sequence of Unicode characters with one fewer u, while simultaneously converting each escape sequence with a single u to the corresponding single Unicode character.

So this input

 \u0020ä

becomes

 \uu0020\u00e4

The first uu here means "this was a Unicode escape sequence to begin with", while the second u says "an automatic tool converted a non-ASCII character to a Unicode escape."

This information is useful when you want to convert back from ASCII to Unicode: you can restore as much of the original code as possible.
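
To make the round trip concrete, here is a rough sketch of both directions of the transformation quoted from the JLS. The class UnicodeEscapeRoundTrip and its methods toAscii and fromAscii are hypothetical names of my own, not a JDK API, and the code is simplified: it works on UTF-16 code units and treats a backslash as eligible when an even number of backslashes directly precede it.

```java
public class UnicodeEscapeRoundTrip {

    // Unicode source -> ASCII: existing escapes gain one extra u,
    // non-ASCII characters become escapes with a single u.
    public static String toAscii(String source) {
        StringBuilder out = new StringBuilder();
        int backslashes = 0; // contiguous backslashes directly before index i
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            boolean eligible = c == '\\' && backslashes % 2 == 0;
            if (eligible && i + 1 < source.length() && source.charAt(i + 1) == 'u') {
                out.append("\\uu"); // keep the escape and add one more u
                i++;                // the original u has already been written
                backslashes = 0;
            } else if (c > 127) {
                out.append(String.format("\\u%04x", (int) c));
                backslashes = 0;
            } else {
                out.append(c);
                backslashes = (c == '\\') ? backslashes + 1 : 0;
            }
        }
        return out.toString();
    }

    // ASCII -> Unicode source: escapes with several u lose one u,
    // escapes with a single u turn back into the character itself.
    public static String fromAscii(String ascii) {
        StringBuilder out = new StringBuilder();
        int backslashes = 0;
        int i = 0;
        while (i < ascii.length()) {
            char c = ascii.charAt(i);
            if (c == '\\' && backslashes % 2 == 0) {
                int j = i + 1;
                while (j < ascii.length() && ascii.charAt(j) == 'u') {
                    j++;
                }
                int uCount = j - i - 1;
                if (uCount > 0 && j + 4 <= ascii.length() && isHex(ascii, j)) {
                    if (uCount == 1) {
                        out.append((char) Integer.parseInt(ascii.substring(j, j + 4), 16));
                    } else {
                        // Drop one u: copy the remaining u's and the four hex digits.
                        out.append('\\').append(ascii, i + 2, j + 4);
                    }
                    i = j + 4;
                    backslashes = 0;
                    continue;
                }
            }
            out.append(c);
            backslashes = (c == '\\') ? backslashes + 1 : 0;
            i++;
        }
        return out.toString();
    }

    // True if the four characters starting at 'start' are hexadecimal digits.
    private static boolean isHex(String s, int start) {
        for (int k = start; k < start + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Runtime text: an existing escape for a space, followed by a raw a-umlaut.
        String original = "\\u0020" + (char) 0xE4;
        String ascii = toAscii(original);
        System.out.println(ascii);
        System.out.println(fromAscii(ascii).equals(original));
    }
}
```

Run as is, main prints \uu0020\u00e4 followed by true, which matches the example above: the escape that was already in the source comes back with one u, and the character that the tool escaped comes back as the raw character.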

Answer 2

Answered by assylias

It means you can add as many u as you want - for example these lines are equivalent:

char ch = '\u00A5';
char ch = '\uuuuu00A5';
char ch = '\uuuuuuuuuuuuuuuuuu00A5';

(and all compile)
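
If you want to check this yourself, something like the following should do. It is only a small sketch: the class name MultipleUDemo and the constant names are my own, and the three literals are the ones from the lines above.

```java
public class MultipleUDemo {
    // The same character written with one, five and eighteen u's.
    public static final char ONE = '\u00A5';
    public static final char FIVE = '\uuuuu00A5';
    public static final char MANY = '\uuuuuuuuuuuuuuuuuu00A5';

    public static void main(String[] args) {
        System.out.println(ONE == FIVE && FIVE == MANY); // true
        System.out.println((int) ONE);                   // 165, the code point of the yen sign
    }
}
```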
