JavaScript XSS prevention and .innerHTML

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/30661497/

Time: 2020-10-28 12:30:19  Source: igfitidea

XSS prevention and .innerHTML

javascript encoding xss innerhtml

Asked by stanko

When I allow users to insert data as the value of the JS innerHTML property like this:

element.innerHTML = "User provided variable";

I understood that in order to prevent XSS, I have to HTML encode, and then JS encode the user input because the user could insert something like this:

<img src=a onerror='alert();'>

Only HTML or only JS encoding would not help because the .innerHTML method, as I understood, decodes the input before inserting it into the page. With HTML+JS encoding, I noticed that .innerHTML decodes only the JS, but the HTML encoding remains.

But I was able to achieve the same by double encoding into HTML.

My question is: Could somebody provide an example of why I should HTML encode and then JS encode, and not double encode in HTML, when using the .innerHTML method?

Answered by SilverlightFox

Could somebody provide an example of why I should HTML encode and then JS encode, and not double encode in HTML when using the .innerHTML method?

Sure.

Assuming the "user provided data" is populated in your JavaScript by the server, then you will have to JS encode to get it there.

The following uses pseudocode for the server-side part, but is JavaScript on the front end:

var userProdividedData = "<%=serverVariableSetByUser %>";
element.innerHTML = userProdividedData;

As in ASP.NET, <%= %> outputs the server-side variable without encoding. If the user is "good" and supplies the value foo, then this results in the following JavaScript being rendered:

var userProdividedData = "foo";
element.innerHTML = userProdividedData;

So far no problems.

Now say a malicious user supplies the value "; alert("xss attack!");//. This would be rendered as:

var userProdividedData = ""; alert("xss attack!");//";
element.innerHTML = userProdividedData;

which would result in an XSS exploit, with the injected code actually executed on the first line of the above.
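The breakout can be reproduced outside a browser. The following is only a sketch: renderScript is a hypothetical stand-in for the unencoded <%= %> template, and runRendered executes the rendered text against stub alert and element objects rather than a real page.

```javascript
// Hypothetical stand-in for the server template "<%=serverVariableSetByUser %>":
// the value is pasted into the script text with no encoding at all.
function renderScript(serverVariableSetByUser) {
  return 'var userProdividedData = "' + serverVariableSetByUser + '";\n' +
         'element.innerHTML = userProdividedData;';
}

// Run the rendered script against stub objects instead of a real page.
function runRendered(script) {
  const alerts = [];
  const element = { innerHTML: "" };
  const run = new Function("alert", "element", script);
  run((message) => alerts.push(message), element);
  return { alerts, innerHTML: element.innerHTML };
}

const good = runRendered(renderScript("foo"));
const bad = runRendered(renderScript('"; alert("xss attack!");//'));

console.log(good); // the value reaches innerHTML and nothing else runs
console.log(bad);  // the stub alert was called while the first line executed
```

With the malicious value, the string literal closes early and the rest of the input runs as code, so the stub alert records a call before innerHTML is ever assigned.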

To prevent this, as you say, you JS encode. The OWASP XSS prevention cheat sheet rule #3 says:

Except for alphanumeric characters, escape all characters less than 256 with the \xHH format to prevent switching out of the data value into the script context or into another attribute.

So to secure against this, your code would be:

var userProdividedData = "<%=JsEncode(serverVariableSetByUser) %>";
element.innerHTML = userProdividedData;

where JsEncode encodes as per the OWASP recommendation.

This would prevent the above attack as it would now render as follows:

var userProdividedData = "\x22\x3b\x20alert\x28\x22xss\x20attack\x21\x22\x29\x3b\x2f\x2f";
element.innerHTML = userProdividedData;
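JsEncode itself is not shown in this answer, so here is one possible client-side sketch of the quoted rule. The name jsEncode and the handling of code units of 256 and above (written as \uHHHH) are assumptions for illustration; on a real server you would normally rely on a vetted encoder library instead.

```javascript
// Hypothetical JsEncode: keeps [A-Za-z0-9] and writes every other code unit
// below 256 as \xHH. Writing code units of 256 and above as \uHHHH is an
// assumption that goes beyond the quoted OWASP rule.
function jsEncode(value) {
  const text = String(value);
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    const code = text.charCodeAt(i);
    if (/[A-Za-z0-9]/.test(ch)) {
      out += ch;
    } else if (code < 256) {
      out += "\\x" + code.toString(16).padStart(2, "0");
    } else {
      out += "\\u" + code.toString(16).padStart(4, "0");
    }
  }
  return out;
}

const original = '"; alert("xss attack!");//';
const encoded = jsEncode(original);

// Placing the encoded text inside a string literal yields the original
// characters again, as data rather than as code.
const decoded = new Function('return "' + encoded + '";')();

console.log(encoded);
console.log(decoded === original);
```

For the attack string used earlier, the encoded text is the same \x22\x3b\x20alert... sequence as in the rendering above.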

Now you have secured your JavaScript variable assignment against XSS.

However, what if a malicious user supplied <img src="xx" onerror="alert('xss attack')" /> as the value? This would be fine for the variable assignment part, as it would simply get converted into the \xHH escape equivalent like above.

However, the line

element.innerHTML = userProdividedData;

would cause alert('xss attack') to be executed when the browser renders the inner HTML. This would be a DOM Based XSS attack.

This is why you would need to HTML encode too. This can be done via a function such as:

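// HTML-encode the characters that are significant in HTML markup.
function escapeHTML (unsafe_str) {
    return unsafe_str
      .replace(/&/g, '&amp;')   // must come first so later entities are not re-encoded
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;')
      .replace(/'/g, '&#39;')
      .replace(/\//g, '&#x2F;');
}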

making your code:

element.innerHTML = escapeHTML(userProdividedData);

or it could be done via jQuery's text() function.
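To see what escapeHTML does to the image payload, here is a small self-contained sketch. It repeats the function from this answer so it can run on its own; the payload variable is just the example attack string.

```javascript
// Same escaping as the escapeHTML function in this answer.
function escapeHTML(unsafe_str) {
  return unsafe_str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/\//g, '&#x2F;');
}

const payload = '<img src="xx" onerror="alert(\'xss attack\')" />';
const safe = escapeHTML(payload);

// The result contains no raw < or >, so assigning it to innerHTML
// produces text rather than an <img> element.
console.log(safe);
```

Because & is replaced first, the entities produced by the later replacements are not encoded a second time.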

Update regarding question in comments

I just have one more question: You mentioned that we must JS encode because an attacker could enter "; alert("xss attack!");//. But if we would use HTML encoding instead of JS encoding, wouldn't that also HTML encode the " sign and make this attack impossible, because we would have: var userProdividedData ="&quot;; alert(&quot;xss attack!&quot;);&#x2F;&#x2F;";

I'm taking your question to mean the following: Rather than JS encoding followed by HTML encoding, why don't we just HTML encode in the first place, and leave it at that?

Well, because they could encode an attack such as <img src="xx" onerror="alert('xss attack')" /> entirely in the \xHH format to insert their payload. This would achieve the desired HTML sequence of the attack without using any of the characters that HTML encoding would affect.
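A rough sketch of that bypass, runnable without a browser. The payload here is a shortened, unquoted variant of the image attack chosen for illustration, and the string concatenation stands in for a hypothetical server template that only HTML encodes.

```javascript
// Same escaping as the escapeHTML function in this answer.
function escapeHTML(unsafe_str) {
  return unsafe_str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/\//g, '&#x2F;');
}

// Attacker input: literal backslashes, but none of & < > " ' /
const payload = "\\x3cimg src\\x3dxx onerror\\x3dalert\\x281\\x29\\x3e";

// HTML encoding finds nothing to change.
const htmlEncoded = escapeHTML(payload);

// Hypothetical server render: the value lands inside a JS string literal,
// where the JavaScript parser turns the \xHH escapes into real characters.
const valueSeenByInnerHTML = new Function('return "' + htmlEncoded + '";')();

console.log(htmlEncoded === payload);
console.log(valueSeenByInnerHTML);
```

The value that would be handed to innerHTML is a complete img tag with an onerror handler, even though the input passed through HTML encoding untouched.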

There are some other attacks too: if the attacker entered \ then they could force the browser to miss the closing quote (as \ is the escape character in JavaScript).

This would render as:

var userProdividedData = "\";

which would trigger a JavaScript error because the string literal is never terminated. This could cause a Denial of Service to the application if it is rendered in a prominent place.

Additionally, say there were two pieces of user-controlled data:

var userProdividedData = "<%=serverVariableSetByUser1 %>" + ' - ' + "<%=serverVariableSetByUser2 %>";

The user could then enter \ in the first and ;alert('xss');// in the second. This would change the string concatenation into one big assignment, followed by an XSS attack:

var userProdividedData = "\" + ' - ' + ";alert('xss');//";
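Both backslash cases can be checked with a short sketch. As before, renderScript is a hypothetical stand-in for the unencoded server template, and alert is a stub so the code runs outside a browser.

```javascript
// Hypothetical stand-in for the two-value template, with no encoding.
function renderScript(user1, user2) {
  return `var userProdividedData = "${user1}" + ' - ' + "${user2}";`;
}

// Run the rendered line with a stub alert and report what happened.
function runRendered(script) {
  const alerts = [];
  const run = new Function("alert", script + "\nreturn userProdividedData;");
  const value = run((message) => alerts.push(message));
  return { alerts, value };
}

// Case 1: \ in the first field swallows the quote that should end the string.
const attack = runRendered(renderScript("\\", ";alert('xss');//"));
console.log(attack);

// Case 2: a lone \ in a single-value template leaves the string unterminated,
// so the script fails to parse at all.
let parseError = null;
try {
  new Function('var userProdividedData = "' + "\\" + '";');
} catch (error) {
  parseError = error;
}
console.log(parseError && parseError.name);
```

In the first case the variable ends up holding the text between the two user values, and the second value runs as code.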

Because of edge cases like these, it is recommended to follow the OWASP guidelines, as they are as close to bulletproof as you can get. You might think that adding \ to the list of HTML encoded values solves this. However, there are other reasons to use JS encoding followed by HTML encoding when rendering content in this manner, because this method also works for data in attribute values:

<a href="javascript:void(0)" onclick="myFunction('<%=JsEncode(serverVariableSetByUser) %>'); return false">

Regardless of whether it is single or double quoted:

<a href='javascript:void(0)' onclick='myFunction("<%=JsEncode(serverVariableSetByUser) %>"); return false'>

Or even unquoted:

<a href=javascript:void(0) onclick=myFunction("<%=JsEncode(serverVariableSetByUser) %>");return false;>

If you HTML encoded, as mentioned in your comment, into an entity value:

onclick='var userProdividedData ="&quot;;"' (shortened version)

the code is actually run via the browser's HTML parser first, so userProdividedData would be

";;

instead of

&quot;;

so when you add it to the innerHTML call you would have XSS again. Note that <script> blocks are not processed via the browser's HTML parser, except for the closing </script> tag, but that's another story.
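The order of decoding can be imitated in plain JavaScript. This is only a simulation: decodeAttributeEntities is a hand-rolled stand-in for what a browser's HTML parser does to an attribute value, it knows just the handful of entities used here, and the full attack string from the comment is used instead of the shortened version.

```javascript
// Hand-rolled stand-in for the entity decoding a browser applies to an
// attribute value; it only handles the entities used in this example.
function decodeAttributeEntities(attributeSource) {
  return attributeSource
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&#x2F;/g, "/")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
}

// What the server wrote inside onclick='...' after HTML encoding only.
const attributeSource =
  'var userProdividedData ="&quot;; alert(&quot;xss attack!&quot;);&#x2F;&#x2F;";';

// The HTML parser decodes the attribute first...
const handlerCode = decodeAttributeEntities(attributeSource);

// ...and only then is the result compiled and run as JavaScript.
const alerts = [];
new Function("alert", handlerCode)((message) => alerts.push(message));

console.log(handlerCode);
console.log(alerts);
```

By the time the JavaScript engine sees the handler, the entities are real quote characters again, so the HTML encoding offered no protection in this position.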

It is always wise to encode as late as possible, as shown above. Then, if you need to output the value somewhere that does not render HTML (e.g. an actual alert box), it will still display correctly.

That is, with the above I can call

alert(serverVariableSetByUser);

just as easily as setting HTML

element.innerHTML = escapeHTML(userProdividedData);

In both cases it will be displayed correctly, without certain characters disrupting the output or causing undesirable code execution.

Answered by adriann

A simple way to make sure the contents of your element are properly encoded (and will not be parsed as HTML) is to use textContent instead of innerHTML:

element.textContent = "User provided variable with <img src=a>";

Another option is to use innerHTML only after you have encoded (preferably on the server if you get the chance) the values you intend to use.

Answered by Navneet Sharma

I have faced this issue in my ASP.NET Webforms application. The fix to this is relatively simple.

Install HtmlSanitizationLibrary from NuGet Package Manager and reference it in your application. In the code-behind, use the sanitizer class in the following way.

For example, if the current code looks something like this,

YourHtmlElement.InnerHtml = "Your HTML content" ;

Then, replace this with the following:

string unsafeHtml = "Your HTML content"; 
YourHtmlElement.InnerHtml = Sanitizer.GetSafeHtml(unsafeHtml);

This fix will remove the Veracode vulnerability and make sure that the string still gets rendered as HTML. Encoding the string in the code-behind instead would display it as literal text rather than as raw HTML, because it is encoded before rendering begins.