Javascript parse error on '\u2028' unicode character

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow, original question at http://stackoverflow.com/questions/2965293/

Date: 2020-08-23 02:42:39  Source: igfitidea


javascript, unicode

Question by klaaspieter

Whenever I use the \u2028 character literal in my JavaScript source with the content type set to "text/html; charset=utf-8", I get a JavaScript parse error.


Example:


<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">

<html lang="en">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>json</title>

    <script type="text/javascript" charset="utf-8">
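    // The quotes below contain a raw U+2028 LINE SEPARATOR character
    // (shown here as "?"); that raw character triggers the parse error.
    var string = '?    ';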
    </script>
</head>
<body>

</body>
</html>

If the <meta http-equiv> is left out, everything works as expected. I've tested this on Safari and Firefox; both exhibit the same problem.


Any ideas on why this is happening and how to properly fix this (without removing the encoding)?


Edit: After some more research, I found that the specific problem was that the offending character was returned via JSONP. The response was then interpreted by the browser, which reads U+2028 as a newline and throws an error about an invalid newline in a string.


Answer by bobince

Yes, it's a feature of the JavaScript language, documented in the ECMAScript standard (3rd edition section 7.3), that the U+2028 and U+2029 characters count as line endings. Consequently a JavaScript parser will treat any unencoded U+2028/9 character in the same way as a newline. Since you can't put a newline inside a string literal, you get a syntax error.

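A minimal sketch of the line-terminator behaviour, assuming a Node.js or browser console. It shows JavaScript treating U+2028 as a line break outside string literals: it ends a `//` comment, and the regex `.` does not match it. The constants `LS`/`PS` and the helper `parsesAsScript` are hypothetical names for illustration. Engines implementing ECMAScript 2019 or later accept a raw U+2028/U+2029 inside a string literal (JSON became a syntactic subset of JavaScript then), so the last check prints `true` on modern engines and `false` only on older ones such as the browsers in the question.

```javascript
// U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR are LineTerminators
// in ECMAScript, just like "\n" and "\r".
const LS = "\u2028";
const PS = "\u2029";

// A LineTerminator ends a single-line comment, so "+ 1" after LS is real code.
// If LS were an ordinary character, the comment would swallow "+ 1" and this
// would return 1 instead of 2.
const afterComment = new Function("return 1 // comment" + LS + "+ 1")();
console.log(afterComment); // 2

// Without the "s" flag, "." never matches a line terminator.
console.log(/./.test(LS), /./.test(PS)); // false false

// Hypothetical helper: does `source` parse as a function body?
// The Function constructor parses eagerly, so a syntax error surfaces here
// without running any of the code.
function parsesAsScript(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// A raw LS inside a string literal: true on ES2019+ engines, false on older ones.
console.log(parsesAsScript("var s = '" + LS + "';"));
```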

This is an unfortunate oversight in the design of JSON: it is not actually a proper subset of JavaScript. Raw U+2028/9 characters are valid in string literals in JSON, and will be accepted by JSON.parse, but not so in JavaScript itself.

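The asymmetry can be checked with a short script, assuming any modern JavaScript engine. `JSON.parse` accepts a raw U+2028 inside a JSON string, and `JSON.stringify` does not escape it on output, so its result can carry the raw character straight into generated JavaScript source. The variable names are illustrative only.

```javascript
const LS = "\u2028"; // U+2028 LINE SEPARATOR as a one-character string

// JSON text whose string contains the *raw* character, not the \u2028 escape.
const jsonText = '{"text": "before' + LS + 'after"}';

// JSON permits raw U+2028 in strings, so this parses without error.
const parsed = JSON.parse(jsonText);
console.log(parsed.text.length); // 12 ("before" + LS + "after")

// JSON.stringify escapes quotes, backslashes and control characters below
// U+0020, but writes U+2028 back out as a raw character.
const serialized = JSON.stringify({ text: parsed.text });
console.log(serialized.includes(LS)); // true
```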

Hence it is only safe to generate JavaScript code using a JSON encoder if you're sure it explicitly \u-escapes those characters. Some do, some don't; many \u-escape all non-ASCII characters, which avoids the problem.

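The escaping this answer calls for can be sketched as a thin wrapper around `JSON.stringify`, assuming the output is spliced into JavaScript source (a `<script>` block or a JSONP response). The function name `toScriptSafeJson` and its `asciiOnly` option are hypothetical. Every non-ASCII character in `JSON.stringify` output sits inside a string literal, so rewriting it as a `\uXXXX` escape does not change the decoded value.

```javascript
// Hypothetical helper: serialize a value as JSON that is also safe to embed
// as JavaScript source, even on pre-ES2019 engines.
function toScriptSafeJson(value, asciiOnly = false) {
  const json = JSON.stringify(value);
  if (json === undefined) {
    throw new TypeError("value is not JSON-serializable");
  }
  // Escape either just U+2028/U+2029, or every non-ASCII UTF-16 code unit
  // (astral characters become two \uXXXX surrogate escapes, still valid JSON).
  const pattern = asciiOnly ? /[\u0080-\uffff]/g : /[\u2028\u2029]/g;
  return json.replace(
    pattern,
    (ch) => "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const data = { text: "line\u2028separator", word: "café" };
console.log(toScriptSafeJson(data));
// {"text":"line\u2028separator","word":"café"}
console.log(toScriptSafeJson(data, true));
// {"text":"line\u2028separator","word":"caf\u00e9"}
```

The narrow pattern keeps the output readable; the ASCII-only variant trades size for robustness against any charset mix-up along the way.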

Answer by klaaspieter

Alright, to answer my own question.


Normally a JSON parser takes care of these problem characters. Because I was retrieving JSONP, I wasn't using a JSON parser at all: the browser parsed the response itself as JavaScript, and the syntax error occurred before the callback could run.


The only way to fix it was to make sure the server never returns these characters unescaped when a JSONP resource is requested (for example, by emitting them as \u2028 and \u2029).

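On the server side, that fix amounts to escaping U+2028/U+2029 before the JSON is wrapped in the callback. A minimal sketch, assuming a Node.js-style server that builds the JSONP body as a string; `buildJsonpBody` and its callback-name check are hypothetical, not part of any particular framework.

```javascript
// Hypothetical helper: build a JSONP response body such as `cb({...});`.
function buildJsonpBody(callbackName, data) {
  // Only allow plain (optionally dotted) identifiers as the callback name,
  // so the query string cannot inject arbitrary script.
  if (!/^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*$/.test(callbackName)) {
    throw new Error("invalid JSONP callback name: " + callbackName);
  }
  // JSON.stringify leaves U+2028/U+2029 raw; escape them so older browsers,
  // which parse the response as ordinary JavaScript, never see a line break
  // inside a string literal.
  const json = JSON.stringify(data)
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
  return callbackName + "(" + json + ");";
}

const body = buildJsonpBody("handleResponse", { text: "a\u2028b" });
console.log(body); // handleResponse({"text":"a\u2028b"});
```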

P.S. My question was about U+2028, but according to Douglas Crockford's json2 library, all of the following characters can cause these problems:


'\u0000\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff'

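In json2.js this list is used as a regular-expression character class: before evaluating the text, its `JSON.parse` rewrites each matching character as a `\uXXXX` escape. A sketch of that escaping step on its own, with the character class copied from the list above; the names `PROBLEM_CHARS` and `escapeProblemChars` are hypothetical.

```javascript
// Character class built from the json2 list: characters that some JavaScript
// engines silently delete or treat as line endings.
const PROBLEM_CHARS =
  /[\u0000\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff]/g;

// Hypothetical helper: replace each matching UTF-16 code unit with a \uXXXX escape.
function escapeProblemChars(text) {
  return text.replace(
    PROBLEM_CHARS,
    (ch) => "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

// Input holds a raw U+2028 and a raw U+FEFF; output holds six-character escapes.
console.log(escapeProblemChars('["a\u2028b","c\ufeffd"]'));
// ["a\u2028b","c\ufeffd"]
```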

Answer by YOU

Could you just use the escape sequence \u2028 instead of the raw character? U+2028 is the Unicode line separator, so browsers treat it as a real line break, just like \n.


We cannot write:


x = "
"

Right? But we can write x = "\n", so it might be the same concept.

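The analogy can be checked directly, assuming any JavaScript engine. The escape sequences `\n` and `\u2028` are accepted inside a string literal everywhere and decode to a single character, while a raw line feed inside the literal is always a syntax error. The helper name `parses` is hypothetical.

```javascript
// Hypothetical helper: true if `source` parses as a JavaScript expression.
function parses(source) {
  try {
    new Function("return " + source); // parses eagerly, does not run the code
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// In these source strings, "\\n" and "\\u2028" put a backslash escape into the
// generated code, while "\n" puts a raw line feed into it.
console.log(parses('"a\\nb"'));     // true: escape sequence
console.log(parses('"a\nb"'));      // false: raw line break inside the literal
console.log(parses('"a\\u2028b"')); // true: the escape works on every engine

// The escape still produces exactly one U+2028 character at runtime.
console.log(new Function('return "\\u2028"')().charCodeAt(0) === 0x2028); // true
```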

Answer by Remy Lebeau

Well, that makes sense, since you are telling the browser that the HTML and script are both using UTF-8, but then you specify a character that is not UTF-8 encoded. When you specify "charset=UTF-8", you are responsible for making sure the bytes transmitted to the browser are actually UTF-8. The web server and browser will not do it for you in this situation.

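To check which bytes should actually be sent for the character, the UTF-8 encoding of U+2028 can be inspected with the standard `TextEncoder`/`TextDecoder` APIs, assuming a browser or a recent Node.js version where both are globals. The sketch only shows the encoding: correctly UTF-8-encoded bytes decode back to the same U+2028 character.

```javascript
// Encode U+2028 LINE SEPARATOR as UTF-8 and print the bytes in hex.
const bytes = new TextEncoder().encode("\u2028");
const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
console.log(hex); // e2 80 a8

// Decoding those bytes as UTF-8 gives back the same single character.
const decoded = new TextDecoder("utf-8").decode(bytes);
console.log(decoded === "\u2028"); // true
```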