JavaScript: How do I escape quotes in HTML attribute values?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not the translator). Original StackOverflow question: http://stackoverflow.com/questions/7753448/

Time: 2020-08-24 03:28:09  Source: igfitidea

How do I escape quotes in HTML attribute values?

javascript, html

Asked by Steve Walsh

I'm building up a row to insert into a table using jQuery by creating an HTML string, i.e.

var row = "";
row += "<tr>";
row += "<td>Name</td>";
row += "<td><input value='"+data.name+"'/></td>";
row += "</tr>";

data.name is a string returned from an AJAX call which could contain any characters. If it contains a single quote, ', it will break the HTML by ending the attribute value early.

How can I ensure that the string is rendered correctly in the browser?

Accepted answer by Andy E

You just need to replace any ' characters with the equivalent HTML entity character code:

data.name.replace(/'/g, "&#39;");
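
As a minimal sketch of how this replacement slots into the row-building code from the question: the helper names escapeSingleQuotes and buildRow are hypothetical, not part of the original answer. Note that this only protects a value placed inside a single-quoted attribute; it does not escape &, < or ".

```javascript
// Hypothetical helper: wraps the one-line replacement shown above.
function escapeSingleQuotes(value) {
    return String(value).replace(/'/g, "&#39;");
}

// Hypothetical helper: builds the row from the question with the escaped name.
function buildRow(name) {
    var row = "";
    row += "<tr>";
    row += "<td>Name</td>";
    row += "<td><input value='" + escapeSingleQuotes(name) + "'/></td>";
    row += "</tr>";
    return row;
}

console.log(buildRow("O'Brien"));
```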

Alternatively, you could create the whole thing using jQuery's DOM manipulation methods:

var row = $("<tr>").append("<td>Name</td><td></td>");
$("<input>", { value: data.name }).appendTo(row.children("td:eq(1)"));

Answer by verdy_p

Actually, you may need one of these two functions (depending on the context of use). These functions handle all kinds of string quotes, and also protect against HTML/XML syntax.

1. The quoteattr() function for embedding text into HTML/XML:

The quoteattr() function is used in a context where the result will not be evaluated by JavaScript but must be interpreted by an XML or HTML parser, and it must absolutely avoid breaking the syntax of an element attribute.

Newlines are natively preserved when generating the content of a text element. However, if you're generating the value of an attribute, the assigned value will be normalized by the DOM as soon as it is set, so all whitespace (SPACE, TAB, CR, LF) will be compressed: leading and trailing whitespace is stripped and every inner run of whitespace is reduced to a single SPACE.

But there's an exception: the CR character will be preserved and not treated as whitespace, but only if it is represented by a numeric character reference (NCR)! The result will be valid for all element attributes except those of type NMTOKEN, NMTOKENS or ID: the presence of the referenced CR makes the assigned value invalid for those attributes (for example the id="..." attribute of HTML elements), and an invalid value is ignored by the DOM. In other attributes (of type CDATA), all CR characters represented by a numeric character reference are preserved and not normalized. Note that this trick does not preserve other whitespace (SPACE, TAB, LF), even when represented by an NCR, because the normalization of all whitespace (with the exception of the NCR for CR) is mandatory in all attributes.

Note that this function itself does not perform any HTML/XML normalization of whitespaces, so it remains safe when generating the content of a text element (don't pass the second preserveCR parameter for such case).

So if you pass an optional second parameter (whose default will be treated as if it was false) and if that parameter evaluates as true, newlines will be preserved using this NCR, when you want to generate a literal attribute value, and this attribute is of type CDATA (for example a title="..." attribute) and not of type ID, IDLIST, NMTOKEN or NMTOKENS (for example an id="..." attribute).

function quoteattr(s, preserveCR) {
    preserveCR = preserveCR ? '&#13;' : '\n';
    return ('' + s) /* Forces the conversion to string. */
        .replace(/&/g, '&amp;') /* This MUST be the 1st replacement. */
        .replace(/'/g, '&apos;') /* The 4 other predefined entities, required. */
        .replace(/"/g, '&quot;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        /*
        You may add other replacements here for HTML only 
        (but it's not necessary).
        Or for XML, only if the named entities are defined in its DTD.
        */ 
        .replace(/\r\n/g, preserveCR) /* Must be before the next replacement. */
        .replace(/[\r\n]/g, preserveCR)
        ;
}
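
To illustrate why the comment insists that the ampersand replacement comes first, here is a small sketch with two deliberately reduced encoders (both names are hypothetical, and each performs only two of the replacements). Escaping & last re-escapes the ampersands that the earlier replacements just introduced.

```javascript
// Hypothetical reduced encoder: '&' is replaced first, as in quoteattr().
function ampersandFirst(s) {
    return ('' + s).replace(/&/g, '&amp;').replace(/</g, '&lt;');
}

// Hypothetical reduced encoder: same replacements in the wrong order.
function ampersandLast(s) {
    return ('' + s).replace(/</g, '&lt;').replace(/&/g, '&amp;');
}

console.log(ampersandFirst('a < b & c')); // a &lt; b &amp; c
console.log(ampersandLast('a < b & c'));  // a &amp;lt; b &amp; c  (double-escaped)
```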

Warning! This function still does not check the source string (which in JavaScript is just an unrestricted stream of 16-bit code units) for its validity in a file that must be a valid plain-text source and also a valid source for an HTML/XML document.

  • It should be updated to detect and reject (with an exception):
    • any code units representing code points assigned to non-characters (like \uFFFE and \uFFFF): this is a Unicode requirement only for valid plain text;
    • any surrogate code units which are not correctly paired to form a valid pair for a UTF-16-encoded code point: this is a Unicode requirement for valid plain text;
    • any valid pair of surrogate code units representing a valid Unicode code point in the supplementary planes, but which is assigned to a non-character (like U+10FFFE or U+10FFFF): this is a Unicode requirement only for valid plain text;
    • most C0 and C1 controls (in the ranges \u0000..\u001F and \u007F..\u009F, with the exception of TAB and the newline controls): this is not a Unicode requirement but an additional requirement for valid HTML/XML.
  • Despite this limitation, the code above is almost what you'll want to do. Normally, a modern JavaScript engine should provide this function natively in the default system object, but in most cases it does not completely ensure strict plain-text validity, nor HTML/XML validity. The HTML/XML document object from which your JavaScript code is called should redefine this native function.
  • This limitation is usually not a problem, because the source strings are the result of computations on strings coming from the HTML/XML DOM.
  • But this may fail if the JavaScript extracts substrings and breaks pairs of surrogates, or if it generates text from computed numeric sources (converting any 16-bit code value into a string containing that single code unit, and appending those short strings, or inserting them via replacement operations). If you try to insert the encoded string into an HTML/XML DOM text element, attribute value or element name, the DOM will itself reject this insertion and throw an exception; if your JavaScript inserts the resulting string into a local binary file or sends it via a binary network socket, no exception will be thrown for this emission. Such non-plain-text strings could also be the result of reading from a binary file (such as a PNG, GIF or JPEG image file) or from a binary-safe network socket (such that the I/O stream passes 16-bit code units rather than just 8-bit units; most binary I/O streams are byte-based anyway, and text I/O streams require that you specify a charset to decode files into plain text, so that invalid encodings found in the text stream will throw an I/O exception in your script).
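
The checks listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name findInvalidCodeUnit is hypothetical, CR and NEL (U+0085) are assumed to count among the permitted "newline controls", and the full non-character set (U+FDD0..U+FDEF plus the last two code points of every plane) is used rather than only the examples named in the text.

```javascript
// Hypothetical validator: returns the index of the first offending code unit, or -1.
function findInvalidCodeUnit(s) {
    s = '' + s;
    for (var i = 0; i < s.length; i++) {
        var start = i;
        var cp = s.charCodeAt(i);
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate.
            var low = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
            if (low < 0xDC00 || low > 0xDFFF) return start;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
            i++; // Skip the low surrogate that was just consumed.
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return start; // Low surrogate with no preceding high surrogate.
        }
        // Non-characters: U+FDD0..U+FDEF, plus U+xxFFFE and U+xxFFFF in every plane.
        if ((cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) === 0xFFFE) return start;
        // C0 controls, except TAB, LF and CR (assumption: CR is allowed).
        if (cp < 0x20 && cp !== 0x09 && cp !== 0x0A && cp !== 0x0D) return start;
        // DEL and C1 controls, except NEL (assumption: U+0085 is allowed).
        if (cp >= 0x7F && cp <= 0x9F && cp !== 0x85) return start;
    }
    return -1;
}

console.log(findInvalidCodeUnit('plain text'));
console.log(findInvalidCodeUnit('bad\uFFFE'));
```

A caller that wants the "reject with an exception" behaviour can throw whenever the returned index is not -1, before handing the string to quoteattr().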

Note that this function, as implemented (and once augmented to correct the limitations noted in the warning above), can also safely be used to quote the content of a literal text element in HTML/XML (to avoid leaving interpretable HTML/XML elements from the source string value), not just the content of a literal attribute value! So it would be better named quoteml(); the name quoteattr() is kept only by tradition.

This is the case in your example:

data.value = "It's just a \"sample\" <test>.\n\tTry & see yourself!";
var row = '';
row += '<tr>';
row += '<td>Name</td>';
row += '<td><input value="' + quoteattr(data.value) + '" /></td>';
row += '</tr>';

Alternative to quoteattr(), using only the DOM API:

The alternative, if the HTML code you generate will be part of the current HTML document, is to create each HTML element individually using the DOM methods of the document, so that you can set its attribute values directly through the DOM API, instead of inserting the full HTML content using the innerHTML property of a single element:

data.value = "It's just a \"sample\" <test>.\n\tTry & see yourself!";
var row = document.createElement('tr');
var cell = document.createElement('td');
cell.innerText = 'Name';
row.appendChild(cell);
cell = document.createElement('td');
var input = document.createElement('input');
input.setAttribute('value', data.value);
cell.appendChild(input);
row.appendChild(cell); /* The second cell goes into the row created above. */
/*
The HTML code is generated automatically and is now accessible in the
row.innerHTML property, which you are not required to insert in the
current document.

But you can continue by appending row into a 'tbody' element object, and then
insert this into a new 'table' element object, which you can append or insert
as a child of a DOM object of your document.
*/

Note that this alternative does not attempt to preserve newlines present in data.value, because you're generating the content of a text element, not an attribute value here. If you really want to generate an attribute value preserving newlines using &#13;, see the start of section 1, and the code within quoteattr() above.

2. The escape() function for embedding into a JavaScript/JSON literal string:

In other cases, you'll use the escape() function below when the intent is to quote a string that will be part of a generated JavaScript code fragment and that you also want to be preserved (the fragment may optionally be parsed first by an HTML/XML parser, into which a larger piece of JavaScript code could be inserted):

function escape(s) {
    return ('' + s) /* Forces the conversion to string. */
        .replace(/\\/g, '\\\\') /* This MUST be the 1st replacement. */
        .replace(/\t/g, '\\t') /* These 2 replacements protect whitespaces. */
        .replace(/\n/g, '\\n')
        .replace(/\u00A0/g, '\\u00A0') /* Useful but not absolutely necessary. */
        .replace(/&/g, '\\x26') /* These 5 replacements protect from HTML/XML. */
        .replace(/'/g, '\\x27')
        .replace(/"/g, '\\x22')
        .replace(/</g, '\\x3C')
        .replace(/>/g, '\\x3E')
        ;
}

Warning! This source code does not check the validity of the encoded document as a valid plain-text document. However, it should never raise an exception (except for an out-of-memory condition): JavaScript/JSON source strings are just unrestricted streams of 16-bit code units and need not be valid plain text, nor are they restricted by HTML/XML document syntax. This means that the code is incomplete, and should also replace:

  • all other code units representing C0 and C1 controls (with the exception of TAB and LF, handled above, which may also be left intact without substituting them), using the \xNN notation;
  • all code units that are assigned to non-characters in Unicode, which should be replaced using the \uNNNN notation (for example \uFFFE or \uFFFF);
  • all code units usable as Unicode surrogates in the range \uD800..\uDFFF, like this:
    • if they are not correctly paired into a valid UTF-16 pair representing a valid Unicode code point in the full range U+0000..U+10FFFF, these surrogate code units should be individually replaced using the notation \uDNNN;
    • else, if the code point that the code unit pair represents is not valid in Unicode plain text because it is assigned to a non-character, the two code units should be replaced using the notation \U00NNNNNN;
  • finally, if the code point represented by the code unit (or by the pair of code units representing a code point in a supplementary plane), independently of whether that code point is assigned or reserved/unassigned, is also invalid in HTML/XML source documents (see their specifications), the code point should be replaced using the \uNNNN notation (if the code point is in the BMP) or the \U00NNNNNN notation (if the code point is in a supplementary plane).
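
Part of this list can be sketched as a post-processing pass applied to the output of escape(). The names hex4 and escapeUnsafeUnits are hypothetical, and the sketch makes two simplifying assumptions: it always emits the \uNNNN form (rather than \xNN), and it leaves well-formed surrogate pairs untouched, so non-characters in the supplementary planes are not handled here.

```javascript
// Hypothetical helper: formats one 16-bit code unit as a \uNNNN escape.
function hex4(unit) {
    return '\\u' + ('0000' + unit.toString(16).toUpperCase()).slice(-4);
}

// Hypothetical pass, meant to run on the result of escape() shown above.
function escapeUnsafeUnits(s) {
    s = '' + s;
    var out = '';
    for (var i = 0; i < s.length; i++) {
        var unit = s.charCodeAt(i);
        var isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        var isLow = unit >= 0xDC00 && unit <= 0xDFFF;
        if (isHigh && i + 1 < s.length) {
            var next = s.charCodeAt(i + 1);
            if (next >= 0xDC00 && next <= 0xDFFF) {
                // Well-formed pair: copy both code units unchanged.
                out += s.charAt(i) + s.charAt(i + 1);
                i++;
                continue;
            }
        }
        // C0 controls other than TAB and LF, plus DEL and the C1 controls.
        var isControl = (unit < 0x20 && unit !== 0x09 && unit !== 0x0A) ||
                        (unit >= 0x7F && unit <= 0x9F);
        // Non-characters located in the BMP.
        var isNonChar = unit === 0xFFFE || unit === 0xFFFF ||
                        (unit >= 0xFDD0 && unit <= 0xFDEF);
        out += (isHigh || isLow || isControl || isNonChar) ? hex4(unit) : s.charAt(i);
    }
    return out;
}

console.log(escapeUnsafeUnits('a\u0001b')); // a\u0001b (six printable characters for the control)
```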

Note also that the last 5 replacements are not really necessary. But if you don't include them, you'll sometimes need to use the <![CDATA[ ... ]]> compatibility "hack", for example when further including the generated JavaScript in HTML or XML (see the example below where this "hack" is used in a <script>...</script> HTML element).

The escape() function has the advantage that it does not insert any HTML/XML character reference: the result will first be interpreted by JavaScript, and the string keeps its exact length later at runtime when it is evaluated by the JavaScript engine. It saves you from having to manage mixed contexts throughout your application code (see the final sections about them and the related security considerations). Notably, if you used quoteattr() in this context, the JavaScript evaluated and executed later would have to explicitly handle character references to re-decode them, which would not be appropriate. Use cases include:

  1. when the replaced string will be inserted in a generated JavaScript event handler surrounded by some other HTML code (where the JavaScript fragment will contain attributes surrounded by literal quotes);
  2. when the replaced string will be part of a setTimeout() parameter which will later be eval()ed by the JavaScript engine.

Example 1 (generating only JavaScript, no HTML content generated):

var title = "It's a \"title\"!";
var msg   = "Both strings contain \"quotes\" & 'apostrophes'...";
setTimeout(
    '__forceCloseDialog("myDialog", "' +
        escape(title) + '", "' +
        escape(msg) + '")',
    2000);

Example 2 (generating valid HTML):

var msg =
    "It's just a \"sample\" <test>.\n\tTry & see yourself!";
/* This is similar to the above, but this JavaScript code will be reinserted below: */ 
var scriptCode =
    'alert("' +
    escape(msg) + /* important here!, because part of a JS string literal */
    '");';

/* First case (simple when inserting in a text element): */
document.write(
    '<script type="text/javascript">' +
    '\n//<![CDATA[\n' + /* (not really necessary but improves compatibility) */
    scriptCode +
    '\n//]]>\n' +       /* (not really necessary but improves compatibility) */
    '</script>');

/* Second case (more complex when inserting in an HTML attribute value): */
document.write(
    '<span onclick="' +
    quoteattr(scriptCode) + /* important here, because part of an HTML attribute */
    '">Click here !</span>');

In this second example, you see that both encoding functions are used simultaneously: the part of the generated text that is embedded in JavaScript literals is encoded with escape(), and the generated JavaScript code (containing the generated string literal) is itself embedded again and re-encoded using quoteattr(), because that JavaScript code is inserted in an HTML attribute (in the second case).

3. General considerations for safely encoding texts to embed in syntactic contexts:

So in summary,

  • the quoteattr() function must be used when generating the content of an HTML/XML attribute literal, where the surrounding quotes are added externally within a concatenation to produce complete HTML/XML code.
  • the escape() function must be used when generating the content of a JavaScript string constant literal, where the surrounding quotes are added externally within a concatenation to produce complete JavaScript code.
  • If you use them carefully, everywhere you find variable contents to insert safely into another context, and only under these rules (with the functions implemented exactly as above, which takes care of the "special characters" used in both contexts), you may mix both via multiple escaping: the transform will still be safe, and will not require additional code to decode them in the application using those literals. Do not use these functions in any other way.

Those functions are only safe in those strict contexts (i.e. only HTML/XML attribute values for quoteattr(), and only JavaScript string literals for escape()).

There are other contexts using different quoting and escaping mechanisms (e.g. SQL string literals, Visual Basic string literals, regular expression literals, text fields of CSV data files, or MIME header values), which will each require their own distinct escaping function, used only in these contexts:

  • Never assume that quoteattr() or escape() will be safe, or will not alter the semantics of the escaped string, before first checking that the syntax of (respectively) HTML/XML attribute values or JavaScript string literals is natively understood and supported in those contexts.
  • For example, the syntax of JavaScript string literals generated by escape() is largely appropriate in two other contexts as well: string literals used in Java programming source code, and text values in JSON data. Be aware, though, that neither Java nor JSON accepts the \xNN escapes emitted above, so for those targets they would have to be written as \u00NN instead.
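
A quick way to check whether a given escaped literal is acceptable in the JSON context is to hand it to the standard JSON.parse. The sketch below does only that; the helper name isValidJsonStringBody is hypothetical.

```javascript
// Hypothetical helper: true if `body` is valid as the inside of a JSON string literal.
function isValidJsonStringBody(body) {
    try {
        return typeof JSON.parse('"' + body + '"') === 'string';
    } catch (e) {
        return false; // JSON.parse throws a SyntaxError on anything it rejects.
    }
}

console.log(isValidJsonStringBody('It\\u0027s')); // true:  \u0027 is a valid JSON escape
console.log(isValidJsonStringBody('It\\x27s'));   // false: \x27 is not a JSON escape
```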

But the reverse is not always true. For example:

  • Interpreting escaped literals initially generated for contexts other than JavaScript string literals (including, for example, string literals in PHP source code) is not always safe for direct use as JavaScript literals, e.g. by passing them through the JavaScript eval() system function to decode generated string literals that were not escaped using escape(). Those other string literals may contain special characters generated specifically for their initial contexts, which JavaScript will interpret incorrectly. This could include additional escapes such as "\Uxxxxxxxx", "\e", "${var}" and "$$", the inclusion of additional concatenation operators such as ' + " which change the quoting style, or "transparent" delimiters such as "<!--" and "-->" or "<![CDATA[" and "]]>" (which may be found, and be safe, only within a different, complex context supporting multiple escaping syntaxes: see the last paragraph of this section about mixed contexts).
  • The same applies to the interpretation/decoding of escaped literals that were initially generated for contexts other than HTML/XML attribute values in documents created using their standard textual representation (for example, trying to interpret string literals that were generated for embedding in a non-standard binary representation of HTML/XML documents!).
  • This also applies to the interpretation/decoding, with the JavaScript function eval(), of string literals that were only safely generated for inclusion in HTML/XML attribute literals using quoteattr(): this will not be safe, because the contexts have been incorrectly mixed.
  • This also applies to the interpretation/decoding, with an HTML/XML text document parser, of attribute value literals that were only safely generated for inclusion in a JavaScript string literal using escape(): this will not be safe, because the contexts have also been incorrectly mixed.

4. Safely decoding the value of embedded syntactic literals:

If you want to decode or interpret string literals in contexts where the decoded string values will be used interchangeably and indistinctly, without change, in another context (so-called mixed contexts: for example, naming some identifiers in HTML/XML with string literals initially safely encoded with quoteattr(), or naming some programming variables for JavaScript from strings initially safely encoded with escape(), and so on), you'll need to prepare and use a new escaping function, which will also check the validity of the string value before encoding it (or reject it, or truncate/simplify/filter it). You'll also need a new decoding function, which must carefully avoid interpreting sequences that are valid but unsafe (accepted only internally, not from unsafe external sources). This also means that decoding functions such as eval() in JavaScript must be absolutely avoided for decoding JSON data sources, for which you'll need to use a safer native JSON decoder: a native JSON decoder will not interpret valid JavaScript sequences, such as quoting delimiters included in the literal expression, operators, or sequences like "${var}". All of this is needed to enforce the safety of such a mapping!

These last considerations about decoding literals in mixed contexts, when they were safely encoded only for the transport of data into a single, more restrictive context, are absolutely critical for the security of your application or web service. Never mix those contexts between the encoding place and the decoding place if those places do not belong to the same security realm (and even in that case, using mixed contexts is always very dangerous, and very difficult to track precisely in your code).

For this reason I recommend you never use or assume mixed contexts anywhere in your application: instead, write a safe encoding and decoding function for a single precise context that has precise length and validity rules on the decoded string values, and precise length and validity rules on the encoded string literals. Ban those mixed contexts: for each change of context, use another matching pair of encoding/decoding functions (which function of the pair is used depends on which context is embedded in the other; and the pair of matching functions is also specific to each pair of contexts).

This means that:

  • To safely decode an HTML/XML attribute value literal that was initially encoded with quoteattr(), you must not assume that it has been encoded using other named entities whose value will depend on a specific DTD defining them. You must instead initialize the HTML/XML parser to support only the few default named character entities generated by quoteattr() and, optionally, the numeric character references (which are also safe in such a context: the quoteattr() function only generates a few of them but could generate more of these numeric character references, but it must not generate other named character entities which are not predefined in the default DTD). All other named entities must be rejected by your parser as invalid in the source string literal to decode. Alternatively, you'll get better performance by defining an unquoteattr() function (which will reject any literal quotes within the source string, as well as unsupported named entities).
  • To safely decode a JavaScript string literal (or JSON string literal) that was initially encoded with escape(), you must use the safe JavaScript unescape() function, but not the unsafe JavaScript eval() function!
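
For the JSON case specifically, the native decoder mentioned earlier is JSON.parse. A minimal sketch of using it instead of eval() follows; the wrapper name decodeJsonStringLiteral is hypothetical, and the input is assumed to be a complete JSON string literal including its surrounding double quotes.

```javascript
// Hypothetical wrapper: decodes one JSON string literal without evaluating any code.
function decodeJsonStringLiteral(literal) {
    var value = JSON.parse(literal); // Throws a SyntaxError on anything that is not JSON.
    if (typeof value !== 'string') {
        throw new TypeError('expected a JSON string literal');
    }
    return value;
}

console.log(decodeJsonStringLiteral('"It\\u0027s a \\"title\\"!"')); // It's a "title"!
```

Unlike eval(), this rejects input such as '"a" + alert(1)' instead of executing it.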

Examples for these two associated safe decoding functions follow.

5. The unquoteattr() function to parse text embedded in HTML/XML text elements or attribute value literals:

function unquoteattr(s) {
    /*
    Note: this can be implemented more efficiently by a loop searching for
    ampersands, from start to end of the source string, and parsing the
    character(s) found immediately after the ampersand.
    */
    s = ('' + s); /* Forces the conversion to string type. */
    /*
    You may optionally start by detecting CDATA sections (like
    `<![CDATA[` ... `]]>`), whose contents must not be reparsed by the
    following replacements, but separated, filtered out of the CDATA
    delimiters, and then concatenated into an output buffer.
    The following replacements are only for sections of source text
    found *outside* such CDATA sections, that will be concatenated
    in the output buffer only after all the following replacements and
    security checkings.

    This will require a loop starting here.

    The following code is only for the alternate sections that are
    not within the detected CDATA sections.
    */
    /* Decode by reversing the initial order of replacements. */
    s = s
        .replace(/\r\n/g, '\n') /* To do before the next replacement. */
        .replace(/[\r\n]/g, '\n') /* The g flag is needed to replace every occurrence. */
        .replace(/&#13;&#10;/g, '\n') /* These 3 replacements keep whitespaces. */
        .replace(/&#1[03];/g, '\n')
        .replace(/&#9;/g, '\t')
        .replace(/&gt;/g, '>') /* The 4 other predefined entities required. */
        .replace(/&lt;/g, '<')
        .replace(/&quot;/g, '"')
        .replace(/&apos;/g, "'")
        ;
    /*
    You may add other replacements here for predefined HTML entities only 
    (but it's not necessary). Or for XML, only if the named entities are
    defined in *your* assumed DTD.
    But you can add these replacements only if these entities will *not* 
    be replaced by a string value containing *any* ampersand character.
    Do not decode the '&amp;' sequence here !

    If you choose to support more numeric character entities, their
    decoded numeric value *must* be assigned characters or unassigned
    Unicode code points, but *not* surrogates or assigned non-characters,
    and *not* most C0 and C1 controls (except a few ones that are valid
    in HTML/XML text elements and attribute values: TAB, LF, CR, and
    NL='\x85').

    If you find valid Unicode code points that are invalid characters
    for XML/HTML, this function *must* reject the source string as
    invalid and throw an exception.

    In addition, the four possible representations of newlines (CR, LF,
    CR+LF, or NL) *must* be decoded only as if they were '\n' (U+000A).

    See the XML/HTML reference specifications !
    */
    /* Required check for security! */
    var found = s.match(/&[^;]*;?/g) || [];
    for (var i = 0; i < found.length; i++)
        if (found[i] !== '&amp;')
            throw new Error('unsafe entity found in the attribute literal content');
    /* This MUST be the last replacement. */
    s = s.replace(/&amp;/g, '&');
    /*
    The loop needed to support CDATA sections will end here.
    This is where you'll concatenate the replaced sections (CDATA or
    not), if you have split the source string to detect and support
    these CDATA sections.

    Note that all backslashes found in CDATA sections do NOT have the
    semantics of escapes, and are *safe*.

    Conversely, CDATA sections not properly terminated by a
    matching `]]>` section terminator are *unsafe*, and must be rejected
    before reaching this final point.
    */
    return s;
}

Note that this function does not parse the surrounding quote delimiters which are used to surround HTML attribute values. This function can in fact decode any HTML/XML text element content as well, possibly containing literal quotes, which are safe. It is your responsibility to parse the HTML code, extract the quoted strings used in HTML/XML attributes, and strip the matching quote delimiters before calling the unquoteattr() function.
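
The stripping step described above can be sketched as follows. This is only an illustration, assuming the raw attribute text (delimiters included) has already been extracted from the markup; stripAttrQuotes is a hypothetical helper name, not part of the original answer.

```javascript
// Hypothetical helper: removes one pair of matching quote delimiters
// from a raw attribute value, and rejects values that are not quoted.
function stripAttrQuotes(raw) {
    var first = raw.charAt(0);
    var last = raw.charAt(raw.length - 1);
    if (raw.length >= 2 && (first === '"' || first === "'") && first === last) {
        return raw.slice(1, -1);
    }
    throw new Error('attribute value is not enclosed in matching quotes');
}

// The result is what would then be passed on to unquoteattr().
var inner = stripAttrQuotes('"Tim &quot;The Toolman&quot; Taylor"');
console.log(inner); // Tim &quot;The Toolman&quot; Taylor
```

Only the outer pair is removed; any quote characters inside the value are left for the decoding function to deal with.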

6. The unescape() function to parse text contents embedded in Javascript/JSON literals:

function unescape(s) {
    /*
    Note: this can be implemented more efficiently by a loop searching for
    backslashes, from start to end of source string, and parsing and
    dispatching the character found immediately after the backslash, if it
    must be followed by additional characters such as an octal or
    hexadecimal 7-bit ASCII-only encoded character, or a hexadecimal Unicode
    encoded valid code point, or a valid pair of hexadecimal UTF-16-encoded
    code units representing a single Unicode code point.

    8-bit encoded code units for non-ASCII characters should not be used, but
    if they are, they should be decoded into 16-bit code units keeping their
    numeric value, i.e. like the numeric value of an equivalent Unicode
    code point (which means ISO 8859-1, not Windows 1252, including C1 controls).

    Note that Javascript or JSON does NOT require code units to be paired when
    they encode surrogates; and Javascript/JSON will also accept any Unicode
    code point in the valid range representable as UTF-16 pairs, including
    NULL, all controls, and code units assigned to non-characters.
    This means that all code points in \U00000000..\U0010FFFF are valid,
    as well as all 16-bit code units in \u0000..\uFFFF, in any order.
    It's up to your application to restrict these valid ranges if needed.
    */
    s = ('' + s) /* Forces the conversion to string. */
    /* Decode by reversing the initial order of replacements */
        .replace(/\\x3E/g, '>')
        .replace(/\\x3C/g, '<')
        .replace(/\\x22/g, '"')
        .replace(/\\x27/g, "'")
        .replace(/\\x26/g, '&') /* These 5 replacements protect from HTML/XML. */
        .replace(/\\u00A0/g, '\u00A0') /* Useful but not absolutely necessary. */
        .replace(/\\n/g, '\n')
        .replace(/\\t/g, '\t') /* These 2 replacements protect whitespaces. */
        ;
    /*
    You may optionally add here support for other numerical or symbolic
    character escapes.
    But you can add these replacements only if these entities will *not* 
    be replaced by a string value containing *any* backslash character.
    Do not decode to any doubled backslashes here !
    */
    /* Required check for security! */
    var found = s.match(/\\[\s\S]?/g) || [];
    for (var i = 0; i < found.length; i++)
        if (found[i] !== '\\\\')
            throw new Error('Unsafe or unsupported escape found in the literal string content');
    /* This MUST be the last replacement. */
    return s.replace(/\\\\/g, '\\');
}

Note that this function does not parse the surrounding quote delimiters which are used to surround Javascript or JSON string literals. It is your responsibility to parse the Javascript or JSON source code, extract the quoted string literals, and strip the matching quote delimiters before calling the unescape() function.
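
For comparison only: when the literal is a plain JSON string (double-quoted), the built-in JSON.parse already handles both the delimiters and the escapes, and throws on malformed input. The sketch below uses that built-in instead of the unescape() function above, so it is a different technique; it does not accept single-quoted literals or \xNN escapes, which are not valid JSON. decodeJsonStringLiteral is a hypothetical name.

```javascript
// Hypothetical helper: decodes one JSON string literal, delimiters included.
function decodeJsonStringLiteral(literal) {
    var value = JSON.parse(literal); // throws SyntaxError on invalid JSON
    if (typeof value !== 'string') {
        throw new TypeError('not a JSON string literal');
    }
    return value;
}

// The source text being decoded is: "Tim \"The Toolman\" Taylor\n"
var decoded = decodeJsonStringLiteral('"Tim \\"The Toolman\\" Taylor\\n"');
console.log(decoded === 'Tim "The Toolman" Taylor\n'); // true
```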

Answer by Shadow2531

" = &quot; or &#34;

' = &#39;

Examples:

<div attr="Tim &quot;The Toolman&quot; Taylor"></div>
<div attr='Tim "The Toolman" Taylor'></div>
<div attr="Tim 'The Toolman' Taylor"></div>
<div attr='Tim &#39;The Toolman&#39; Taylor'></div>

In JavaScript strings, you use \ to escape the quote character:

var s = "Tim \"The Toolman\" Taylor";
var s = 'Tim \'The Toolman\' Taylor';

So, quote your attribute values with " and use a function like this:

function escapeAttrNodeValue(value) {
    return value.replace(/(&)|(")|(\u00A0)/g, function(match, amp, quote) {
        if (amp) return "&amp;";
        if (quote) return "&quot;";
        return "&nbsp;";
    });
}
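
A short usage sketch, applying this function to the row-building code from the question. The function is repeated so that the snippet runs on its own, and the sample data object is made up for illustration.

```javascript
function escapeAttrNodeValue(value) {
    return value.replace(/(&)|(")|(\u00A0)/g, function (match, amp, quote) {
        if (amp) return "&amp;";
        if (quote) return "&quot;";
        return "&nbsp;";
    });
}

// Made-up sample data standing in for the ajax response.
var data = { name: 'Tim "The Toolman" O\'Neil & Co' };

var row = "";
row += "<tr>";
row += "<td>Name</td>";
// The attribute is delimited with ", so a ' inside the value is harmless.
row += '<td><input value="' + escapeAttrNodeValue(data.name) + '"/></td>';
row += "</tr>";

console.log(row);
```

Because & is escaped as well, text in the value that happens to look like an entity (such as a literal "&quot;") is shown as typed instead of being decoded by the browser.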

Answer by JLarky

My answer is partially based on Andy E's, and I still recommend reading what verdy_p wrote, but here it is:

$("<a>", { href: 'very<script>\'b"ad' }).text('click me')[0].outerHTML

Disclaimer: this is not an answer to the exact question, but just to "how to escape an attribute".

Answer by Lucio

Using Lodash:

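const serialised = _.escape("Here's a string that could break HTML");

// Add it into a data attribute when building the HTML string:
const html = '<a data-value-serialised="' + serialised + '" onclick="callback(event)">link</a>';

// and then in the JS where this value will be read:
function callback(e) {
    // With a bit of help from jQuery. The HTML parser has already decoded
    // the entities, so this is the original string.
    const originalString = $(e.currentTarget).data('valueSerialised');

    // _.unescape() is only needed when working with the escaped string itself:
    // _.unescape(serialised) gives the same original string.
    // originalString can be used as part of a payload or whatever.
}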

Answer by Daniel Sokolowski

The given answers seem rather complicated, so for my use case I tried the built-in encodeURIComponent and decodeURIComponent and found they worked well. As noted in the comments, this does not escape ', but for that you can use the escape() and unescape() methods instead.
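
A minimal sketch of that round trip, with a made-up sample value. Two things are worth keeping in mind: the attribute then holds the percent-encoded text, so whatever reads it must decode it again, and escape()/unescape() are deprecated legacy functions, so delimiting the attribute with " is a simpler way to cope with the unescaped '.

```javascript
// Made-up sample value containing both kinds of quotes and markup characters.
var rawName = 'Tim\'s "tool" & <b>';

var encoded = encodeURIComponent(rawName);
console.log(encoded); // Tim's%20%22tool%22%20%26%20%3Cb%3E

// The ' survives encoding, so delimit the attribute with ".
var html = '<input data-name="' + encoded + '"/>';
console.log(html);

// Reading side: decode whatever was read back from the attribute.
console.log(decodeURIComponent(encoded) === rawName); // true
```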

Answer by vietean

I think you could do:

var row = "";
row += "<tr>";
row += "<td>Name</td>";
row += "<td><input value=\""+data.name+"\"/></td>";
row += "</tr>";

This covers the case where you are worried about data.name containing a single quote (a double quote in data.name would still end the attribute value early).

In the best case, you could create an INPUT element and then set its value to data.name (for example, input.value = data.name).
