Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2794137/

Time: 2020-08-23 01:56:55  Source: igfitidea

Sanitizing user input before adding it to the DOM in Javascript

javascript xss escaping

Asked by I GIVE TERRIBLE ADVICE

I'm writing the JS for a chat application I'm working on in my free time, and I need to have HTML identifiers that change according to user submitted data. This is usually something conceptually shaky enough that I would not even attempt it, but I don't see myself having much of a choice this time. What I need to do then is to escape the HTML id to make sure it won't allow for XSS or breaking HTML.

Here's the code:

var user_id = escape(id)
var txt = '<div class="chut">'+
            '<div class="log" id="chut_'+user_id+'"></div>'+
            '<textarea id="chut_'+user_id+'_msg"></textarea>'+
            '<label for="chut_'+user_id+'_to">To:</label>'+
            '<input type="text" id="chut_'+user_id+'_to" value='+user_id+' readonly="readonly" />'+
            '<input type="submit" id="chut_'+user_id+'_send" value="Message"/>'+
          '</div>';

What would be the best way to escape id to avoid any of the problems mentioned above? As you can see, right now I'm using the built-in escape() function, but I'm not sure how good it is compared to the alternatives. I'm mostly used to sanitizing input before it goes into a text node, not into an id itself.

Answered by bobince

Never use escape(). It has nothing to do with HTML-encoding. It's more like URL-encoding, but it's not even properly that. It's a bizarre non-standard encoding available only in JavaScript.
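
As a quick illustration of that point, the sketch below runs escape() and encodeURIComponent() on a couple of made-up sample strings. Neither produces HTML entities, and outside ASCII escape() does not even agree with real URL-encoding.

```javascript
// escape() percent-encodes; it produces no HTML entities at all.
var html = '<b class="x">';
console.log(escape(html));             // %3Cb%20class%3D%22x%22%3E
console.log(encodeURIComponent(html)); // %3Cb%20class%3D%22x%22%3E

// Outside ASCII the two diverge: escape() is not real URL-encoding.
console.log(escape('\u00e9'));             // %E9 (the Latin-1 code unit)
console.log(encodeURIComponent('\u00e9')); // %C3%A9 (the UTF-8 bytes)
console.log(escape('\u20ac'));             // %u20AC (non-standard form)
```

Either way the output is full of % signs, which is the wrong kind of encoding for markup.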

If you want an HTML encoder, you'll have to write it yourself as JavaScript doesn't give you one. For example:

function encodeHTML(s) {
    return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/"/g, '&quot;');
}

However, whilst this is enough to put your user_id in places like the input value, it's not enough for id, because IDs can only use a limited selection of characters. (And % isn't among them, so escape() or even encodeURIComponent() is no good.)
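
To make the character restriction concrete, here is a small sketch that checks candidate ids against the HTML 4 rule for ID tokens (a letter, followed by letters, digits, hyphens, underscores, colons or periods). The helper name isHtml4Id and the sample input are hypothetical; HTML5 is more permissive about ids, so treat this as a check against the stricter, older rule.

```javascript
// HTML 4 rule for ID tokens: a letter, then letters, digits, "-", "_", ":" or ".".
function isHtml4Id(s) {
    return /^[A-Za-z][A-Za-z0-9\-_:.]*$/.test(s);
}

// encodeHTML as given in the answer above.
function encodeHTML(s) {
    return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/"/g, '&quot;');
}

var raw = 'tom & "jerry"';   // hypothetical user-submitted id

console.log(encodeHTML(raw));                              // tom &amp; &quot;jerry&quot;
console.log(isHtml4Id('chut_' + encodeHTML(raw)));         // false: "&", ";" and spaces
console.log(isHtml4Id('chut_' + escape(raw)));             // false: "%"
console.log(isHtml4Id('chut_' + encodeURIComponent(raw))); // false: "%"
console.log(isHtml4Id('chut_tom'));                        // true
```

The encoded string is fine inside a quoted value attribute, but none of the three encodings yields something that passes as an HTML 4 id.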

You could invent your own encoding scheme to put any characters in an ID, for example:

function encodeID(s) {
    if (s==='') return '_';
    return s.replace(/[^a-zA-Z0-9.-]/g, function(match) {
        return '_'+match[0].charCodeAt(0).toString(16)+'_';
    });
}
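
A short sketch of what this scheme produces, together with a matching decoder. decodeID is a hypothetical helper that is not part of the original answer; it only assumes the encoding rules of encodeID as written above.

```javascript
// encodeID as given above.
function encodeID(s) {
    if (s==='') return '_';
    return s.replace(/[^a-zA-Z0-9.-]/g, function(match) {
        return '_'+match[0].charCodeAt(0).toString(16)+'_';
    });
}

// Hypothetical inverse: turn each _hex_ group back into its character.
function decodeID(id) {
    if (id === '_') return '';   // the special case for the empty string
    return id.replace(/_([0-9a-f]+)_/g, function(match, hex) {
        return String.fromCharCode(parseInt(hex, 16));
    });
}

console.log(encodeID('bob@example.com'));             // bob_40_example.com
console.log(encodeID('a b_c'));                       // a_20_b_5f_c
console.log(decodeID(encodeID('a b_c')) === 'a b_c'); // true
```

Because a literal underscore is itself encoded (as _5f_), every underscore in the output belongs to an escape group, which is what makes the decoding unambiguous.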

But you've still got a problem if the same user_id occurs twice. And to be honest, the whole business of throwing around HTML strings is usually a bad idea. Use DOM methods instead, and retain JavaScript references to each element, so you don't have to keep calling getElementById or worry about how arbitrary strings are inserted into IDs.

For example:

function addChut(user_id) {
    var log= document.createElement('div');
    log.className= 'log';
    var textarea= document.createElement('textarea');
    var input= document.createElement('input');
    input.value= user_id;
    input.readOnly= true;
    var button= document.createElement('input');
    button.type= 'button';
    button.value= 'Message';

    var chut= document.createElement('div');
    chut.className= 'chut';
    chut.appendChild(log);
    chut.appendChild(textarea);
    chut.appendChild(input);
    chut.appendChild(button);
    document.getElementById('chuts').appendChild(chut);

    button.onclick= function() {
        alert('Send '+textarea.value+' to '+user_id);
    };

    return chut;
}

You could also use a convenience function or JS framework to cut down on the lengthiness of the create-set-appends calls there.
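
A convenience function of that kind might look like the sketch below. The name el and its signature are made up for illustration; the document object is passed in as a parameter only so the helper has no hidden dependency on a global.

```javascript
// Hypothetical helper: create an element, copy properties onto it, append children.
function el(doc, tag, props, children) {
    var node = doc.createElement(tag);
    for (var name in props) {
        if (Object.prototype.hasOwnProperty.call(props, name)) {
            node[name] = props[name];   // set as DOM properties, never as HTML strings
        }
    }
    (children || []).forEach(function(child) {
        node.appendChild(child);
    });
    return node;
}

// In a browser, the element creation in addChut could then shrink to something like:
// var log = el(document, 'div', {className: 'log'});
// var input = el(document, 'input', {value: user_id, readOnly: true});
// var chut = el(document, 'div', {className: 'chut'}, [log, input]);
```

Since user_id is only ever assigned to a property, it never passes through an HTML parser, so no escaping is needed.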

ETA:

"I'm using jQuery at the moment as a framework"

OK, then consider the jQuery 1.4 creation shortcuts, e.g.:

var log= $('<div>', {className: 'log'});
var input= $('<input>', {readOnly: true, val: user_id});
...

"The problem I have right now is that I use JSONP to add elements and events to a page, and so I can not know whether the elements already exist or not before showing a message."

You can keep a lookup from user_id to element nodes (or wrapper objects) in JavaScript, to save putting that information in the DOM itself, where the characters that can go in an id are restricted.

var chut_lookup= {};
...

function getChut(user_id) {
    var key= '_map_'+user_id;
    if (key in chut_lookup)
        return chut_lookup[key];
    return chut_lookup[key]= addChut(user_id);
}

(The _map_ prefix is there because JavaScript objects don't quite work as a mapping of arbitrary strings. The empty string and, in IE, some Object member names confuse it.)
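
In engines with ES2015 support (an assumption; the answer above predates it), a Map sidesteps the prefix workaround, since any string is an ordinary key. A minimal sketch, where createChut is a hypothetical stand-in for addChut so the example runs without a DOM:

```javascript
// Lookup from user_id to whatever createChut returns.
var chutLookup = new Map();

function getChut(user_id, createChut) {
    if (!chutLookup.has(user_id)) {
        chutLookup.set(user_id, createChut(user_id));
    }
    return chutLookup.get(user_id);
}

// Keys that trip up plain objects are ordinary keys here.
var made = [];
function fakeCreate(id) { made.push(id); return {owner: id}; }

getChut('', fakeCreate);
getChut('__proto__', fakeCreate);
getChut('__proto__', fakeCreate);   // second call reuses the stored entry
console.log(made.length);           // 2
```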

Answered by SilentImp

You can use this:

function sanitize(string) {
  const map = {
      '&': '&amp;',
      '<': '&lt;',
      '>': '&gt;',
      '"': '&quot;',
      "'": '&#x27;',
      "/": '&#x2F;',
  };
  const reg = /[&<>"'/]/ig;
  return string.replace(reg, (match)=>(map[match]));
}

Also see OWASP XSS Prevention Cheat Sheet.

Answered by aaaaaaaaaaaa

You could use a simple regular expression to assert that the id only contains allowed characters, like so:

if(id.match(/^[0-9a-zA-Z]{1,16}$/)){
    //The id is fine
}
else{
    //The id is illegal
}

My example allows only alphanumeric characters in strings of length 1 to 16. You should change it to match the type of ids that you use.
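
Wrapped up as a reusable predicate, the same check could look like the sketch below. The function name isValidId and the sample inputs are made up for illustration; the typeof guard is there because calling match on a non-string value would throw.

```javascript
// Hypothetical wrapper around the whitelist check above.
function isValidId(id) {
    return typeof id === 'string' && /^[0-9a-zA-Z]{1,16}$/.test(id);
}

console.log(isValidId('alice42'));           // true
console.log(isValidId(''));                  // false: too short
console.log(isValidId('a"><script>'));       // false: characters outside the whitelist
console.log(isValidId('abcdefghijklmnopq')); // false: 17 characters
console.log(isValidId(undefined));           // false: not a string
```

Note that this rejects bad input rather than transforming it, so the caller has to decide what to do with an illegal id.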

By the way, on line 6 of your code, the value attribute is missing a pair of quotes, an easy mistake to make when you quote on two levels.
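
The missing quotes matter for more than well-formedness. In the sketch below (the hostile input is made up, and encodeHTML is copied from bobince's answer), the input contains none of the characters that encodeHTML touches, yet in the unquoted version the space ends the value and the rest becomes a new attribute.

```javascript
// encodeHTML as given in bobince's answer.
function encodeHTML(s) {
    return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/"/g, '&quot;');
}

var id = 'x onfocus=alert(1)';   // made-up hostile input: no <, & or " needed

var unquoted = '<input value=' + encodeHTML(id) + ' readonly="readonly" />';
var quoted   = '<input value="' + encodeHTML(id) + '" readonly="readonly" />';

console.log(unquoted); // <input value=x onfocus=alert(1) readonly="readonly" />
console.log(quoted);   // <input value="x onfocus=alert(1)" readonly="readonly" />
```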

I can't see your actual data flow. Depending on context, this check may not be needed at all, or it may not be enough. To make a proper security review we would need more information.

In general, don't trust built-in escape or sanitize functions blindly. You need to know exactly what they do, and you need to establish that it is actually what you need. If it is not, then code your own; most of the time a simple whitelisting regex like the one I gave you works just fine.

Answered by Brandon

Since the text that you are escaping will appear in an HTML attribute, you must be sure to escape not only the characters that matter in HTML text (&, < and >) but also the quote characters that can end an attribute value:

var ESC_MAP = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;'
};

function escapeHTML(s, forAttribute) {
    return s.replace(forAttribute ? /[&<>'"]/g : /[&<>]/g, function(c) {
        return ESC_MAP[c];
    });
}

Then, your escaping code becomes var user_id = escapeHTML(id, true).

For more information, see Foolproof HTML escaping in Javascript.

Answered by kozmic

You need to take extra precautions when using user-supplied data in HTML attributes, because attributes have many more attack vectors than output inside HTML tags.

The only reliable way to avoid XSS attacks here is to encode everything except alphanumeric characters: escape all other characters with ASCII values less than 256 using the &#xHH; format. Unfortunately, this may cause problems in your scenario if you are using CSS classes and JavaScript to fetch those elements.
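
A rough sketch of that rule follows. The function name encodeForAttribute is made up, and leaving characters at code 256 and above untouched is an assumption of this sketch rather than something the rule spells out.

```javascript
// Keep ASCII alphanumerics; turn every other character below code 256
// into &#xHH;. Characters at 256 and above pass through unchanged.
function encodeForAttribute(s) {
    return String(s).replace(/[^a-zA-Z0-9\u0100-\uffff]/g, function(ch) {
        var hex = ch.charCodeAt(0).toString(16).toUpperCase();
        return '&#x' + (hex.length < 2 ? '0' + hex : hex) + ';';
    });
}

console.log(encodeForAttribute('a b"c'));   // a&#x20;b&#x22;c
console.log(encodeForAttribute("x' onfocus=alert(1)"));
```

Spaces, quotes, equals signs and parentheses all come out as character references, so the value cannot break out of its attribute even if the surrounding quotes are forgotten.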

OWASP has a good description of how to mitigate HTML attribute XSS:

http://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.233_-_JavaScript_Escape_Before_Inserting_Untrusted_Data_into_HTML_JavaScript_Data_Values