Attribution: this page is drawn from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/31507143/

Time: 2020-08-23 06:46:29  Source: igfitidea

Why does 2+ 40 equal 42?

Tags: javascript, unicode

Asked by GOTO 0

I was baffled when a colleague showed me this line of JavaScript alerting 42.

alert(2+ 40);

It quickly turns out that what looks like a minus sign is actually an arcane Unicode character with clearly different semantics.

This left me wondering why that character doesn't produce a syntax error when the expression is parsed. I'd also like to know if there are more characters behaving like this.

Answered by Felix Kling

That character is "OGHAM SPACE MARK", which is a space character. So the code is equivalent to alert(2+ 40).
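The equivalence can be checked directly. The following is a minimal sketch, not part of the original answer; the names OGHAM_SPACE, source, and result are made up for illustration, and the character is written as the escape \u1680 inside a string so that it stays visible.

```javascript
// U+1680 OGHAM SPACE MARK, written as an escape so it stays visible.
const OGHAM_SPACE = "\u1680";

// Rebuild the expression from the question: "2+", the Ogham space, then "40".
const source = "return 2+" + OGHAM_SPACE + "40;";
const result = new Function(source)();

console.log(result);                     // 42
console.log(/\s/.test(OGHAM_SPACE));     // true: \s matches it
console.log(OGHAM_SPACE.trim() === "");  // true: trim() strips it
```

Both the regular expression class \s and String.prototype.trim use the same white space definition as the parser, which is why all three checks agree.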

"I'd also like to know if there are more characters behaving like this."

Any Unicode character in the Zs class is a white space character in JavaScript, but there don't seem to be that many.
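One way to list the Zs characters is with a Unicode property escape. This is a rough sketch, assuming an engine that supports \p{...} with the u flag (ES2018 or later); the names spaceSeparator, zsCodePoints, and labels are illustrative, and the exact list depends on the Unicode version the engine ships with.

```javascript
// Collect every code point in U+0000–U+FFFF whose General_Category is Zs.
const spaceSeparator = /^\p{Zs}$/u;
const zsCodePoints = [];
for (let cp = 0; cp <= 0xffff; cp++) {
    if (spaceSeparator.test(String.fromCharCode(cp))) {
        zsCodePoints.push(cp);
    }
}

// Format each hit as U+XXXX for display.
const labels = zsCodePoints.map(
    cp => "U+" + cp.toString(16).toUpperCase().padStart(4, "0")
);
console.log(labels.join(" "));
```

Tab, line feed, and a few other characters also count as white space in JavaScript without being in Zs, so this list is a subset of what the parser skips.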

However, JavaScript also allows Unicode characters in identifiers, which lets you use interesting variable names like ಠ_ಠ.
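As a small illustration of Unicode identifiers, here is a sketch that declares such a variable and tests candidate names against the ID_Start and ID_Continue properties. The helper looksLikeIdentifier is hypothetical and only approximates the rule: it does not reject reserved words or handle escape sequences.

```javascript
// Approximate shape of an identifier: a start character, then continue characters.
const identifierName = /^[\p{ID_Start}$_][\p{ID_Continue}$\u200C\u200D]*$/u;

// Hypothetical helper: true if name has the shape of an identifier.
function looksLikeIdentifier(name) {
    return identifierName.test(name);
}

// A letter from the Kannada script is a valid identifier start.
const ಠ_ಠ = 42;
console.log(ಠ_ಠ);                            // 42

console.log(looksLikeIdentifier("ಠ_ಠ"));     // true
console.log(looksLikeIdentifier("\u1680"));  // false: white space, not a letter
console.log(looksLikeIdentifier("2x"));      // false: cannot start with a digit
```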

Answered by GOTO 0

After reading the other answers, I wrote a simple script to find all Unicode characters in the range U+0000–U+FFFF that behave like white spaces. As it seems, there are 26 or 27 of them depending on the browser, with disagreements about U+0085 and U+FFFE.

Note that most of these characters just look like a regular white space.

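// Returns true if the engine accepts ch where white space is allowed.
function isSpace(ch)
{
    try
    {
        // Compile and run "return 2 +<ch> 2"; the result is 4 when ch is ignored.
        return Function('return 2 +' + ch + ' 2')() === 4;
    }
    catch(e)
    {
        // A syntax error (or any other failure) means ch is not white space.
        return false;
    }
}

// Test every code unit in U+0000–U+FFFF and list the ones that pass.
for (var i = 0; i <= 0xffff; ++i)
{
    var ch = String.fromCharCode(i);
    if (isSpace(ch))
    {
        // Format as U+XXXX followed by the character itself in quotes.
        var label = 'U+' + ('000' + i.toString(16).toUpperCase()).slice(-4) + '    "' + ch + '"';
        document.body.appendChild(document.createElement('DIV')).textContent = label;
    }
}

/* Stylesheet used by the snippet */
div { font-family: monospace; }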

Answered by michaelpri

It appears that the character you are using is actually longer than a real minus sign (a hyphen).

 
-

The top is what you are using; the bottom is what the minus sign should be. You do seem to know that already, so now let's see why JavaScript does this.

The character you used is actually the ogham space mark, which is a whitespace character. It is interpreted the same way as a space, so to JavaScript your statement looks like alert(2+ 40).

There are other characters like this in JavaScript. You can see a full list on Wikipedia.
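Because most of these characters are invisible or look like ordinary punctuation, it can help to scan source text for them. The following is a rough sketch; the function findUnusualWhitespace is a hypothetical helper, and it reports any white space other than a plain space, tab, line feed, or carriage return.

```javascript
// Hypothetical helper: report white space that is not space, tab, LF, or CR.
function findUnusualWhitespace(source) {
    // [^\S ...] matches characters that ARE white space, minus the listed ones.
    const unusual = /[^\S \t\n\r]/gu;
    const hits = [];
    for (const match of source.matchAll(unusual)) {
        const hex = match[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
        hits.push({ index: match.index, codePoint: "U+" + hex });
    }
    return hits;
}

console.log(findUnusualWhitespace("alert(2+\u168040);"));
// [ { index: 8, codePoint: 'U+1680' } ]
```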



Something interesting I noticed about this character is the way that Google Chrome (and possibly other browsers) interprets it in the top bar of the page.

It shows up as a block with 1680 inside it, which is the Unicode code point (U+1680) of the ogham space mark. It appears to be just my machine doing this, but it is a strange thing.



I decided to try this out in other languages to see what happens and these are the results that I got.



Languages it doesn't work in:

Python 2 & 3

>> 2+ 40
  File "<stdin>", line 1
    2+ 40
        ^
SyntaxError: invalid character in identifier

Ruby

>> 2+ 40
NameError: undefined local variable or method ` 40' for main:Object
    from (irb):1
    from /home/michaelpri/.rbenv/versions/2.2.2/bin/irb:11:in `<main>'

Java (inside the main method)

>> System.out.println(2+ 40);
Main.java:3: error: illegal character: \5760
            System.out.println(2+ 40);
                                 ^
Main.java:3: error: ';' expected
            System.out.println(2+ 40);
                                  ^
Main.java:3: error: illegal start of expression
            System.out.println(2+ 40);
                                    ^
3 errors

PHP

>> 2+ 40;
Use of undefined constant  40 - assumed ' 40' :1

C

>> 2+ 40
main.c:1:1: error: expected identifier or '(' before numeric constant
 2+ 40
 ^
main.c:1:1: error: stray '\341' in program
main.c:1:1: error: stray '\232' in program
main.c:1:1: error: stray '\200' in program

exit status 1

Go

>> 2+ 40
can't load package: package .: 
main.go:1:1: expected 'package', found 'INT' 2
main.go:1:3: illegal character U+1680

exit status 1

Perl 5

>> perl -e'2+ 40'
Unrecognized character \xE1; marked by <-- HERE after 2+<-- HERE near column 3 at -e line 1.


Languages it does work in:

Scheme

>> (+ 2  40)
=> 42

C# (inside the Main() method)

Console.WriteLine(2+ 40);

Output: 42

Perl 6

>> ./perl6 -e'say 2+ 40'
42

Answered by PSkocik

I guess it has something to do with the fact that, for some strange reason, it is classified as whitespace:

$ unicode  
U+1680 OGHAM SPACE MARK
UTF-8: e1 9a 80  UTF-16BE: 1680  Decimal: &#5760;
  ( )
Uppercase: U+1680
Category: Zs (Separator, Space)
Bidi: WS (Whitespace)

Answered by noonand

"I'd also like to know if there are more characters behaving like this."

I seem to remember reading a piece a while back about mischievously replacing semi-colons (U+003B) in someone's code with U+037E which is the Greek question mark.

They both look the same (to the extent that I believe the Greeks themselves use U+003B) but this article stated that the other one wouldn't work.
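The difference between the two characters can be seen by asking the engine to parse each one. This is a minimal sketch, not taken from the article mentioned; the helper parses is hypothetical and compiles the source with the Function constructor without running it.

```javascript
// Hypothetical helper: true if source compiles as a function body.
function parses(source) {
    try {
        new Function(source);
        return true;
    } catch (e) {
        if (e instanceof SyntaxError) return false;
        throw e;
    }
}

const GREEK_QUESTION_MARK = "\u037E";

console.log(parses("var a = 1;"));                       // true
console.log(parses("var a = 1" + GREEK_QUESTION_MARK));  // false

// Unicode normalization maps U+037E to the ordinary semicolon U+003B.
console.log(GREEK_QUESTION_MARK.normalize("NFC") === ";");  // true
```

The last line hints at why the two are indistinguishable on screen: Unicode treats the Greek question mark as canonically equivalent to the semicolon, but the JavaScript parser compares code points and does not normalize source text.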

Some more information on this from Wikipedia is here: https://en.wikipedia.org/wiki/Question_mark#Greek_question_mark

And here is a (closed) question from SO itself on using this as a prank, though AFAIR it is not where I originally read it: JavaScript Prank / Joke
