Why use \x3C instead of < when generating HTML from JavaScript?
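Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/8231048/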
Question by Mark Whitaker
I see the following HTML code used a lot to load jQuery from a content delivery network, but fall back to a local copy if the CDN is unavailable (e.g. in the Modernizr docs):
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.js"></script>
<script>window.jQuery || document.write('<script src="js/libs/jquery-1.6.1.min.js">\x3C/script>')</script>
My question is, why is the last < character in the document.write() statement replaced with the escape sequence \x3C? < is a safe character in JavaScript and is even used earlier in the same string, so why escape it there? Is it just to prevent bad browser implementations from thinking the </script> inside the string is the real script end tag? If so, are there really any browsers out there that would fail on this?
As a follow-on question, I've also seen a variant using unescape() (as given in this answer) in the wild a couple of times too. Is there a reason why that version always seems to substitute all the < and > characters?
Answer by balpha
When the browser sees </script>, it considers this to be the end of the script block (since the HTML parser has no idea about JavaScript, it can't distinguish between something that just appears in a string and something that's actually meant to end the script element). So </script> appearing literally in JavaScript that's inside an HTML page will (in the best case) cause errors, and (in the worst case) be a huge security hole.
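To make the parser's point of view concrete, here is a simplified JavaScript sketch of how an HTML tokenizer decides where the content of a <script> element ends. The function name findScriptEnd is hypothetical, and the sketch deliberately ignores the special "<!--" escaped states of the real HTML tokenizer: it only looks for "</script" in any letter case followed by whitespace, "/" or ">". It never looks at JavaScript syntax, which is exactly why a string literal offers no protection.

```javascript
// Simplified model of the HTML tokenizer's "end of script data" rule.
// Assumption: the "<!--" escaped / double-escaped states are ignored.
// Returns the index where the parser would end the script, or -1.
function findScriptEnd(scriptText) {
  // "</script" (case-insensitive) followed by tab, LF, FF, space, "/" or ">"
  const endTag = /<\/script[\t\n\f \/>]/i;
  const match = endTag.exec(scriptText);
  return match ? match.index : -1;
}

// Raw script text with a literal end tag inside a JavaScript string.
const naive = "document.write('<script src=\"x.js\"></script>')";
console.log(findScriptEnd(naive)); // 35: the parser stops inside the string

// Raw script text using the \x3C escape: "</script" never appears.
const escaped = "document.write('<script src=\"x.js\">\\x3C/script>')";
console.log(findScriptEnd(escaped)); // -1: the script continues
```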
That's why you somehow have to prevent this sequence of characters from appearing. Other common workarounds for this issue are "<"+"/script>" and "<\/script>" (they all come down to the same thing).
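A quick way to see that the workarounds "come down to the same thing" is to build each variant in a standalone JavaScript file (for example run with Node, where no HTML parser is involved). Only the source text differs; none of these source spellings puts "<" directly before "/", yet the JavaScript engine produces the identical string every time.

```javascript
// Three spellings of the same end tag. In the source text, "<" is never
// immediately followed by "/", so an HTML parser would not end the script.
const viaHexEscape = "\x3C/script>"; // \x3C is the hex escape for "<"
const viaConcat = "<" + "/script>"; // "<" and "/" live in separate literals
const viaSlashEscape = "<\/script>"; // "\/" is a redundant escape that yields "/"

// At run time all three are the same nine-character string.
console.log(viaHexEscape === viaConcat && viaConcat === viaSlashEscape); // true
console.log(viaHexEscape.length); // 9
```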
While some consider this to be a "bug", it actually has to happen this way, since, as per the specification, the HTML part of the user agent is completely separate from the scripting engine. You can put all kinds of things into <script> tags, not just JavaScript. The W3C mentions VBScript and TCL as examples. Another example is the jQuery template plugin, which uses those tags as well.
But even within JavaScript, where you might argue that such content inside strings could be recognized and thus not treated as an end tag, the next ambiguity comes up when you consider comments:
<script type="text/javascript">foo(42); // call the function </script>
– what should the browser do in this case?
And finally, what about browsers that don't even know JavaScript? They would just ignore the part between <script> and </script>, but if you gave the character sequence </script> different semantics based on the browser's knowledge of JavaScript, you'd suddenly have two different results in the HTML parsing stage.
Lastly, regarding your question about substituting all angle brackets: I'd say that in at least 99% of the cases, that's for obfuscation, i.e. to hide (from anti-virus software, censoring proxies (like in your example (nested parens are awesome)), etc.) the fact that your JavaScript is doing some HTML-y stuff. I can't think of good technical reasons to hide anything but </script>, at least not for reasonably modern browsers (and by that, I mean pretty much anything newer than Mosaic).
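The unescape() snippet from the linked answer is not reproduced on this page, so the encoded string below is an assumed reconstruction of that "escape everything" style. The sketch shows that percent-encoding every angle bracket and escaping only the end tag's "<" yield exactly the same markup at run time, which is consistent with the point above that only "</script" technically needs hiding. unescape() is a legacy function that browsers and Node still provide.

```javascript
// Assumed example of the "escape everything" style: every "<" and ">" is
// percent-encoded and decoded at run time by the legacy unescape() function.
const encoded = '%3Cscript src="js/libs/jquery-1.6.1.min.js"%3E%3C/script%3E';
const viaUnescape = unescape(encoded);

// The minimal style from the question: only the "<" of the end tag is escaped.
const viaHexEscape = '<script src="js/libs/jquery-1.6.1.min.js">\x3C/script>';

// Both hand the same markup to document.write(); per the answer above, the
// extra escaping in the first version is mainly obfuscation.
console.log(viaUnescape === viaHexEscape); // true
```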
Answer by J. K.
Some parsers treat the unescaped < version (a literal </script> inside the string) as the closing tag and interpret the code as
<script>
window.jQuery || document.write('<script src="js/libs/jquery-1.6.1.min.js">
</script>
\x3C is the hexadecimal escape for <. The two are interchangeable within the script.
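Because \x3C and < are interchangeable inside a JavaScript string, a page generator can apply the same trick to arbitrary text. The helper below, toInlineStringLiteral, is hypothetical and not taken from either answer: it assumes the result is placed inside a classic inline <script> element, uses JSON.stringify to get a valid JavaScript string literal, and then escapes every "<" so the output can never contain "</script".

```javascript
// Hypothetical helper: convert arbitrary text into a JavaScript string literal
// that is safe to embed in an inline <script>. JSON.stringify yields a valid
// string literal; replacing every "<" with \x3C means the output contains no
// "<" at all (so no "</script" and no "<!--"), while the JavaScript engine
// still decodes \x3C back to "<".
function toInlineStringLiteral(text) {
  return JSON.stringify(text).replace(/</g, "\\x3C");
}

const markup = '<script src="js/libs/jquery-1.6.1.min.js"></script>';
const literal = toInlineStringLiteral(markup);
console.log(literal);
// "\x3Cscript src=\"js/libs/jquery-1.6.1.min.js\">\x3C/script>"

// Evaluating the literal gives back the original markup.
console.log(eval(literal) === markup); // true
```

A "<" never appears inside a JSON escape sequence, so the blanket replacement cannot corrupt an existing escape.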