Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original address and authors, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/8749001/

Date: 2020-08-24 07:19:32  Source: igfitidea

Escaping HTML entities in JavaScript string literals within the <script> block

Tags: javascript, html, escaping

Question by mojuba

On the one hand, if I have

<script>
var s = 'Hello </script>';
console.log(s);
</script>

the browser will terminate the <script> block early, and basically the page gets screwed up.

On the other hand, the value of the string may come from a user (say, via a previously submitted form, and now the string ends up being inserted into a <script> block as a literal), so you can expect anything in that string, including maliciously formed tags. Now, if I escape the string literal with htmlentities() when generating the page, the value of s will contain the escaped entities literally, i.e. s will output

Hello &lt;/script&gt;

which is not desired behavior in this case.

One way of properly escaping JS strings within a <script> block is to escape the slash when it follows a left angle bracket, or simply to always escape the slash, i.e.

var s = 'Hello <\/script>';

This seems to be working fine.
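
A minimal sketch of that approach, assuming the value is always placed inside a single-quoted literal; `toScriptStringLiteral` is a hypothetical helper name, not part of any library. Escaping only the slash is not quite enough on its own: backslashes, the quote character and line breaks would still end or corrupt the literal, so the sketch escapes those too. It also breaks up `<!--`, which can otherwise switch the HTML parser into a mode where the block's real `</script>` is not recognised.

```javascript
// Hypothetical helper: turn an arbitrary string into the source text of a
// single-quoted JS string literal meant to be placed inside a <script> block.
function toScriptStringLiteral(value) {
  const body = String(value)
    .replace(/\\/g, '\\\\')        // backslashes first, so later escapes are not doubled
    .replace(/'/g, "\\'")          // the quote that delimits the literal
    .replace(/\r/g, '\\r')         // raw line breaks are not allowed in a '...' literal
    .replace(/\n/g, '\\n')
    .replace(/\u2028/g, '\\u2028') // older engines also treat these as line terminators
    .replace(/\u2029/g, '\\u2029')
    .replace(/<\//g, '<\\/')       // "</" becomes "<\/", so the HTML parser never sees "</script"
    .replace(/<!--/g, '<\\!--');   // "\!" evaluates to "!", but hides "<!--" from the HTML parser
  return "'" + body + "'";
}

// Usage: the generated source text, and the value it evaluates back to.
const literal = toScriptStringLiteral('Hello </script>');
console.log(literal);       // 'Hello <\/script>'
console.log(eval(literal)); // Hello </script>
```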

Then comes the question of JS code within HTML event handlers, which can be easily broken too, e.g.

<div onClick="alert('Hello ">')"></div>

looks valid at first but breaks in most (or all?) browsers. This obviously requires full HTML entity encoding.

My question is: what is the best/standard practice for properly covering all the situations above - i.e. JS within a script block, JS within event handlers - if your JS code can partly be generated on the server side and can potentially contain malicious data?

Answer by ThinkingStiff

The following characters could interfere with an HTML or Javascript parser and should be escaped in string literals: <, >, ", ', \, and &.

In a script block, using the escape character works, as you found out. The concatenation method ('</scr' + 'ipt>') can be hard to read.

var s = 'Hello <\/script>';

For inline Javascript in HTML, you can use entities:

<div onClick="alert('Hello &quot;>')">click me</div>

Demo: http://jsfiddle.net/ThinkingStiff/67RZH/
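
The two layers can also be applied mechanically: write (or generate) the handler as ordinary JavaScript first, then HTML-encode the whole handler before it goes into the attribute; the browser decodes the entities again before running it. A sketch, with `escapeHtmlAttribute` as a hypothetical helper and the attribute assumed to be double-quoted:

```javascript
// Hypothetical helper: HTML-escape text for use inside a double-quoted attribute value.
function escapeHtmlAttribute(text) {
  return String(text)
    .replace(/&/g, '&amp;') // must run first, or the entities added below would be re-escaped
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Layer 1 (JavaScript): the handler code as plain JS source, i.e. alert('Hello ">')
const handlerCode = "alert('Hello \">')";
// Layer 2 (HTML): the whole handler is escaped for the attribute.
const html = '<div onClick="' + escapeHtmlAttribute(handlerCode) + '">click me</div>';
console.log(html);
// <div onClick="alert(&#39;Hello &quot;&gt;&#39;)">click me</div>
```

Any user data placed inside the handler still needs JavaScript escaping first (layer 1); the HTML encoding only protects the attribute.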

The method that works in both <script> blocks and inline Javascript is \uxxxx, where xxxx is the hexadecimal character code.

  • < - \u003c
  • > - \u003e
  • " - \u0022
  • ' - \u0027
  • \ - \u005c
  • & - \u0026

Demo: http://jsfiddle.net/ThinkingStiff/Vz8n7/

HTML:

<div onClick="alert('Hello \u0022>')">click me</div>

<script>
    var s = 'Hello \u003c/script\u003e';
    alert( s );
</script>
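
The table can be turned into a small function. This is a sketch (the name `unicodeEscapeForJs` is hypothetical) whose output is meant to go inside a quoted JS literal. It additionally escapes line breaks and, for older engines, U+2028/U+2029, which the table does not list but which would otherwise break a literal:

```javascript
// Sketch: replace the characters from the table (plus line terminators) with
// \uXXXX escapes. The result belongs inside a quoted JS string literal.
function unicodeEscapeForJs(value) {
  return String(value).replace(/[<>"'\\&\r\n\u2028\u2029]/g, function (ch) {
    // charCodeAt returns the UTF-16 code unit; pad its hex form to four digits.
    return '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0');
  });
}

// Same results as the hand-written demo above.
console.log(unicodeEscapeForJs('Hello </script>')); // Hello \u003c/script\u003e
console.log(unicodeEscapeForJs('Hello ">'));        // Hello \u0022\u003e
```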

Answer by Jamie Treworgy

(edit - somehow didn't notice you mentioned slash-escape in your question already...)

OK so you know how to escape a slash.

In inline event handlers, you can't use the attribute's own quote character inside a literal, so delimit the literal with the other one:

<div onClick='alert("Hello \"")'>test</div>

But this is all in aid of making your life difficult. Just don't use inline event handlers! Or if you absolutely must, then have them call a function defined elsewhere.

Generally speaking, there are few reasons for your server-side code to be writing javascript. Don't generate scripts from the server - pass data to pre-written scripts instead.
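
One common way to pass data instead of generating code, sketched here as an assumption rather than taken from this answer: serialize the data with `JSON.stringify`, replace every `<` with `\u003c` so the markup never contains `</script` or `<!--`, and emit it in a non-executing `<script type="application/json">` block that a pre-written script reads back with `JSON.parse`. `renderJsonBlock` and the id `page-data` are hypothetical names.

```javascript
// Server side (for example in Node): render data into an inert JSON block.
// `id` is assumed to be a trusted constant chosen by the developer, not user input.
function renderJsonBlock(id, data) {
  // JSON.stringify already escapes quotes, backslashes and control characters;
  // turning every "<" into \u003c keeps "</script" and "<!--" out of the markup.
  const json = JSON.stringify(data).replace(/</g, '\\u003c');
  return '<script type="application/json" id="' + id + '">' + json + '</script>';
}

console.log(renderJsonBlock('page-data', { greeting: 'Hello </script>' }));
// <script type="application/json" id="page-data">{"greeting":"Hello \u003c/script>"}</script>

// Client side, in a static .js file (browser only, so shown as a comment):
//   const data = JSON.parse(document.getElementById('page-data').textContent);
//   // data.greeting === 'Hello </script>'
```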

(original)

You can escape any character in a JS string literal with a backslash, as long as that combination isn't already a special escape sequence (such as \n or \t):

var s = 'Hello <\/script>';

This also has the positive effect of keeping it from being interpreted as HTML. So you could do a blanket replace of "/" with "\/" with no ill effect.
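
A quick check of both claims, as a sketch: `\/` evaluates to a plain `/`, and a blanket replacement works as long as backslashes in the raw value are doubled first. The helper name `escapeSlashes` is hypothetical, and quotes and line breaks would still need their own escaping.

```javascript
// "\/" is not one of JavaScript's special escape sequences, so evaluating it
// just drops the backslash: these two literals hold the same string.
console.log('Hello <\/script>' === 'Hello </script>'); // true

// Blanket "/" -> "\/" replacement for a raw value about to be placed between quotes.
// Backslashes are doubled first; otherwise an existing "\" in the value would
// combine with the inserted one.
function escapeSlashes(raw) {
  return raw.replace(/\\/g, '\\\\').replace(/\//g, '\\/');
}

const raw = 'a\\/b </script>';     // the characters: a \ / b space </script>
const source = "'" + escapeSlashes(raw) + "'";
console.log(source);               // 'a\\\/b <\/script>'
console.log(eval(source) === raw); // true
```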

Generally, though, I am concerned that you would have user-submitted data embedded as a string literal in javascript. Are you generating javascript code on the server? Why not just pass data as JSON or an HTML "data" attribute or something instead?
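
For the data-attribute route, a minimal sketch under these assumptions: `renderDataAttribute`, `escapeAttr`, the id `widget` and the attribute `data-config` are all hypothetical. The server JSON-encodes the value and HTML-escapes it for a double-quoted attribute; the browser decodes the entities when it parses the page, so a pre-written script only needs `JSON.parse`.

```javascript
// Hypothetical server-side helpers: store JSON in a data-* attribute.
function escapeAttr(text) {
  return text
    .replace(/&/g, '&amp;') // first, so the entities below are not double-escaped
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function renderDataAttribute(data) {
  return '<div id="widget" data-config="' + escapeAttr(JSON.stringify(data)) + '"></div>';
}

console.log(renderDataAttribute({ msg: 'Hello ">' }));
// <div id="widget" data-config="{&quot;msg&quot;:&quot;Hello \&quot;&gt;&quot;}"></div>

// In a pre-written script (browser only), the entities are already decoded:
//   const config = JSON.parse(document.getElementById('widget').dataset.config);
//   // config.msg === 'Hello ">'
```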

Answer by hugomg

I'd say the best practice would be avoiding inline JS in the first place.

Put the JS code in a separate file and include it with the src attribute

<script src="path/to/file.js"></script>

and use it to attach the event handlers from there instead of putting them in the HTML.

// jQuery example
$('div.something').on('click', function(){
    alert('Hello>');
});

Answer by Dave Brown

Here's how I do it:

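// Replace &, newline, <, >, ' and " with numeric character references (e.g. "<" becomes "&#60;").
// \x26 is the ampersand and \x0A is the newline; because replace() makes a single pass,
// the order of the characters inside the class does not matter.
function encode(r){
    return r.replace(/[\x26\x0A<>'"]/g, function(c){ return "&#" + c.charCodeAt(0) + ";"; });
}

var myString = 'Encode HTML entities!\n"Safe" escape <script></' + 'script> & other tags!';

document.getElementById('test').value = encode(myString);

document.getElementById('testing').innerHTML = encode(myString);

HTML:

<textarea id="test" rows="9" cols="55"></textarea>

<div id="testing">www.WHAK.com</div>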

Answer by Diodeus - James MacFarlane

Most people use this trick:

var s = 'Hello </scr' + 'ipt>';