Java: Why does this code, written backwards, print "Hello World!"

Disclaimer: this page is a Chinese-English bilingual translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/43943699/

Date: 2020-11-03 07:50:22  Source: igfitidea

Why does this code, written backwards, print "Hello World!"

java unicode right-to-left

Asked by Imaginary Pumpkin

Here is some code that I found on the Internet:

class M\u202E{public static void main(String[]a\u202D){System.out.print(new char[]
{'H','e','l','l','o',' ','W','o','r','l','d','!'});}}

This code prints Hello World! onto the screen; you can see it run here. I can clearly see public static void main written, but it is backwards. How does this code work? How does this even compile?

Edit: I tried this code in IntelliJ, and it works fine. However, for some reason it doesn't work in Notepad++ together with cmd. I still haven't found a solution to that, so if anyone does, comment down below.

Accepted answer by Davis Broda

There are invisible characters here that alter how the code is displayed. In IntelliJ these can be found by copy-pasting the code into an empty string (""), which replaces them with Unicode escapes, removing their effects and revealing the order the compiler sees.

Here is the output of that copy-paste:

"class M\u202E{public static void main(String[]a\u202D){System.out.print(new char[]\n"+
        "{'H','e','l','l','o',' ','W','o','r','l','d','!'});}}   "

The source code characters are stored in this order, and the compiler treats them as being in this order, but they're displayed differently.
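
The same reveal can be done outside the IDE. The following is a minimal sketch, not IntelliJ's actual implementation: the class name `InvisibleCharRevealer` and the method `reveal` are made up for illustration, and it simply escapes every character outside printable ASCII.

```java
final class InvisibleCharRevealer {

    /**
     * Returns the text in stored (logical) order, with every char outside
     * printable ASCII written as a four-digit Unicode escape.
     * Hypothetical helper, written only for this illustration.
     */
    static String reveal(String source) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c >= 0x20 && c < 0x7F) {
                out.append(c);                                  // visible ASCII: keep as-is
            } else {
                out.append(String.format("\\u%04X", (int) c));  // anything else: escape it
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The two overrides are written as escapes here so that this
        // listing itself is not affected by them.
        String firstLine = "class M\u202E{public static void main(String[]a\u202D){";
        System.out.println(reveal(firstLine));
    }
}
```

Because the loop walks the string by index, the output always follows the stored order, whatever the editor happens to draw.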

Note the \u202E character, which is a right-to-left override, starting a block where all characters are forced to be displayed right-to-left, and the \u202D, which is a left-to-right override, starting a nested block where all characters are forced into left-to-right order, overriding the first override.

Ergo, when it displays the original code, class M is displayed normally, but the \u202E reverses the display order of everything from there to the \u202D, which reverses everything again. (Formally, everything from the \u202D to the line terminator gets reversed twice, once due to the \u202D and once with the rest of the text reversed due to the \u202E, which is why this text shows up in the middle of the line instead of the end.) The next line's directionality is handled independently of the first's due to the line terminator, so {'H','e','l','l','o',' ','W','o','r','l','d','!'});}} is displayed normally.
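
The double reversal can be traced with a heavily simplified model of the reordering step. This is a sketch under stated assumptions, not the full algorithm from Annex #9: it assumes a left-to-right paragraph whose only special characters are the two overrides, it never closes an override, and it does not mirror brackets. The class name `OverrideDisplaySketch` and the method `visualOrder` are invented for the example.

```java
final class OverrideDisplaySketch {
    private static final char RLO = '\u202E'; // right-to-left override
    private static final char LRO = '\u202D'; // left-to-right override

    /** Simplified display order for one line; see the assumptions in the text above. */
    static String visualOrder(String logicalLine) {
        char[] shown = new char[logicalLine.length()];
        int[] levels = new int[logicalLine.length()];
        int count = 0;
        int level = 0; // base level 0: a left-to-right paragraph

        // Step 1: assign an embedding level to every visible char.
        for (int i = 0; i < logicalLine.length(); i++) {
            char c = logicalLine.charAt(i);
            if (c == RLO) {
                level = (level + 1) | 1;   // next higher odd level (right-to-left)
            } else if (c == LRO) {
                level = (level + 2) & ~1;  // next higher even level (left-to-right)
            } else {
                shown[count] = c;
                levels[count] = level;
                count++;
            }
        }

        int highest = 0;
        int lowestOdd = Integer.MAX_VALUE;
        for (int i = 0; i < count; i++) {
            highest = Math.max(highest, levels[i]);
            if (levels[i] % 2 == 1) {
                lowestOdd = Math.min(lowestOdd, levels[i]);
            }
        }

        // Step 2: from the highest level down to the lowest odd level,
        // reverse each contiguous run of chars at that level or higher.
        for (int lv = highest; lv >= lowestOdd; lv--) {
            int i = 0;
            while (i < count) {
                if (levels[i] < lv) {
                    i++;
                    continue;
                }
                int end = i;
                while (end < count && levels[end] >= lv) {
                    end++;
                }
                reverse(shown, i, end - 1);
                i = end;
            }
        }
        return new String(shown, 0, count);
    }

    private static void reverse(char[] a, int from, int to) {
        while (from < to) {
            char t = a[from];
            a[from] = a[to];
            a[to] = t;
            from++;
            to--;
        }
    }

    public static void main(String[] args) {
        String line = "class M\u202E{public static void main(String[]a\u202D)"
                + "{System.out.print(new char[]";
        System.out.println(visualOrder(line));
    }
}
```

For the first line of the snippet this model yields `class M){System.out.print(new char[]a][gnirtS(niam diov citats cilbup{`: the text after the \u202D lands in the middle, reading normally because it was reversed twice, and the declaration follows it backwards. A real renderer would additionally mirror the brackets inside the right-to-left run.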

For the full (extremely complex, dozens of pages long) Unicode bidirectional algorithm, see Unicode Standard Annex #9.

Answer by James Lawson

It looks different because of the Unicode Bidirectional Algorithm. Two invisible characters, RLO and LRO, are used by the Unicode Bidirectional Algorithm to change the visual appearance of the characters nested between these two metacharacters.

The result is that visually they appear in reverse order, but the actual characters in memory are not reversed. You can analyse the results here. The Java compiler does not reject RLO and LRO: they are ignorable format characters that are permitted inside identifiers (strictly speaking, they are not whitespace), which is why the code compiles.

Note 1: This algorithm is used by text editors and browsers to display LTR characters (e.g. English) and RTL characters (e.g. Arabic, Hebrew) together at the same time, hence "bi"-directional. You can read more about the Bidirectional Algorithm at Unicode's website.
Note 2: The exact behaviour of LRO and RLO is defined in Section 2.2 of the Algorithm.
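
How Java itself classifies these two characters can be checked with the standard `java.lang.Character` methods. A small sketch; the class name `OverrideCharFacts` and its two helper methods are invented for the example:

```java
final class OverrideCharFacts {

    /** True if the char is one of the two directional override characters. */
    static boolean isBidiOverride(char c) {
        byte d = Character.getDirectionality(c);
        return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE
                || d == Character.DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE;
    }

    /** True if the char belongs to the Unicode general category "format" (Cf). */
    static boolean isFormatChar(char c) {
        return Character.getType(c) == Character.FORMAT;
    }

    public static void main(String[] args) {
        // RLO, LRO, and an ordinary letter for comparison.
        for (char c : new char[] {'\u202E', '\u202D', 'M'}) {
            System.out.printf("U+%04X override=%b format=%b whitespace=%b%n",
                    (int) c, isBidiOverride(c), isFormatChar(c), Character.isWhitespace(c));
        }
    }
}
```

Both overrides come out as format characters with an override directionality, and `Character.isWhitespace` is false for them.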

Answer by Damián Rafael Lattenero

The character U+202E mirrors the code from right to left; it is very clever, though. It is hidden, starting right after the M:

"class M\u202E{..."

How did I find the magic behind this?

Well, at first when I saw the question I thought, "it's some kind of joke to waste somebody else's time", but then I opened my IDE (IntelliJ), created a class, and pasted the code... and it compiled!!! So I took a closer look and saw that the "public static void" was backward, so I went there with the cursor and erased a few chars... And what happened? The chars started erasing backward, so I thought, hmm... strange... I have to execute it... So I proceeded to execute the program, but first I needed to save it... and that was when I found it! I couldn't save the file because my IDE said that some char had a different encoding, and it pointed me to where it was. So I started searching Google for special chars that could do the job, and that's it :)

A little about the Unicode Bidirectional Algorithm and the U+202E character involved, briefly explained:

The Unicode Standard prescribes a memory representation order known as logical order. When text is presented in horizontal lines, most scripts display characters from left to right. However, there are several scripts (such as Arabic or Hebrew) where the natural ordering of horizontal text in display is from right to left. If all of the text has a uniform horizontal direction, then the ordering of the display text is unambiguous.

However, because these right-to-left scripts use digits that are written from left to right, the text is actually bi-directional: a mixture of right-to-left and left-to-right text. In addition to digits, embedded words from English and other scripts are also written from left to right, also producing bidirectional text. Without a clear specification, ambiguities can arise in determining the ordering of the displayed characters when the horizontal direction of the text is not uniform.

This annex describes the algorithm used to determine the directionality for bidirectional Unicode text. The algorithm extends the implicit model currently employed by a number of existing implementations and adds explicit formatting characters for special circumstances. In most cases, there is no need to include additional information with the text to obtain correct display ordering.

However, in the case of bidirectional text, there are circumstances where an implicit bidirectional ordering is not sufficient to produce comprehensible text. To deal with these cases, a minimal set of directional formatting characters is defined to control the ordering of characters when rendered. This allows exact control of the display ordering for legible interchange and ensures that plain text used for simple items like filenames or labels can always be correctly ordered for display.

Why create an algorithm like this?

So that a sequence of Arabic or Hebrew characters can be rendered one after the other from right to left.
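
Java ships an implementation of the algorithm in `java.text.Bidi`, which makes it possible to look at the directional runs it finds in mixed text. A minimal sketch; the class name `BidiRunsSketch`, the helper `analyze`, and the sample string (three Latin letters, a space, three Hebrew letters) are chosen only for illustration:

```java
import java.text.Bidi;

final class BidiRunsSketch {

    /** Analyzes one paragraph, assuming a left-to-right base direction. */
    static Bidi analyze(String text) {
        return new Bidi(text, Bidi.DIRECTION_LEFT_TO_RIGHT);
    }

    public static void main(String[] args) {
        // "abc " followed by the Hebrew letters alef, bet, gimel (written as escapes).
        String mixed = "abc \u05D0\u05D1\u05D2";
        Bidi bidi = analyze(mixed);

        System.out.println("mixed directions: " + bidi.isMixed());
        for (int run = 0; run < bidi.getRunCount(); run++) {
            // Even levels are displayed left-to-right, odd levels right-to-left.
            System.out.printf("run %d: chars [%d, %d) at level %d%n",
                    run, bidi.getRunStart(run), bidi.getRunLimit(run), bidi.getRunLevel(run));
        }
    }
}
```

The Latin part stays at an even level and the Hebrew part gets an odd level, which is what tells a renderer to draw that run from right to left.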

Answer by M Anouti

Chapter 3 of the language specification provides an explanation by describing in detail how lexical translation is done for a Java program. What matters most for the question:

Programs are written in Unicode (§3.1), but lexical translations are provided (§3.2) so that Unicode escapes (§3.3) can be used to include any Unicode character using only ASCII characters.

So a program is written in Unicode characters, and the author can escape them using \uxxxx in case the file encoding does not support the Unicode character, in which case the escape is translated to the appropriate character. One of the Unicode characters present in this case is \u202E. It is not visually shown in the snippet, but if you try switching the encoding of the browser, the hidden characters may appear.
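
That this translation happens before the compiler tokenizes the source can be seen with ordinary letters. A small sketch; the class and method names are invented for the example:

```java
final class UnicodeEscapeSketch {

    /** The escape inside the literal is translated to the letter H first. */
    static String greeting() {
        return "\u0048ello";
    }

    /** Escapes work in identifiers too: this declares a variable named answer. */
    static int escapedIdentifier() {
        int \u0061nswer = 42;
        return answer;
    }

    public static void main(String[] args) {
        System.out.println(greeting());
        System.out.println(escapedIdentifier());
    }
}
```

The escaped and unescaped spellings of an identifier are the same name as far as the compiler is concerned, and the same holds when the escaped character is an invisible one.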

Therefore, the lexical translation results in the class declaration:

class M\u202E{

which means that the class identifier is M\u202E. The specification considers this a valid identifier:

Identifier:
    IdentifierChars but not a Keyword or BooleanLiteral or NullLiteral
IdentifierChars:
    JavaLetter {JavaLetterOrDigit}

A "Java letter-or-digit" is a character for which the method Character.isJavaIdentifierPart(int) returns true.
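
This can be checked directly for the two identifiers in the snippet. A minimal sketch; the class `IdentifierCheck` and the method `isValidIdentifier` are invented for the example, and the keyword and literal exclusions from the grammar above are deliberately not checked:

```java
final class IdentifierCheck {

    /**
     * Applies the JavaLetter {JavaLetterOrDigit} rule to a candidate name.
     * Simplification: does not reject keywords, true/false, or null.
     */
    static boolean isValidIdentifier(String name) {
        if (name.isEmpty()) {
            return false;
        }
        int first = name.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidIdentifier("M\u202E"));  // class name from the snippet
        System.out.println(isValidIdentifier("a\u202D"));  // parameter name from the snippet
        System.out.println(isValidIdentifier("\u202EM"));  // override in first position
        System.out.println(Character.isIdentifierIgnorable(0x202E));
    }
}
```

The override characters are accepted after the first position because `Character.isJavaIdentifierPart` is true for ignorable format characters, but they cannot start an identifier.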