JavaScript: Sanitizing HTML input value
Original URL: http://stackoverflow.com/questions/29410746/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Sanitizing HTML input value
Question by KaekeaSchmear
Do you have to convert anything besides the quotes (") to (&quot;) inside of:
<input type="text" value="$var">
I personally do not see how you can possibly break out of that without using " on*=....
Is this correct?
Edit: Apparently some people think my question is too vague;
<input type="text" value="<script>alert(0)</script>"> does not execute. Thus, it seems impossible to break out of it without using ".
Is this correct?
Answer by ircmaxell
There really are two questions that you're asking (or at least can be interpreted):
1. Can the quoted value attribute of input[type="text"] be injected if quotes are disallowed?
2. Can an arbitrary quoted attribute of an element be injected if quotes are disallowed?
The second is trivially demonstrated by the following:
<a href="javascript:alert(1234);">Foo</a>
Or
<div onmousemove="alert(123);">...
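Escaping quotes cannot help in this second case, because a payload such as javascript:alert(1234); contains no quotes at all: the attribute's own meaning makes it dangerous. A minimal JavaScript sketch of one common mitigation for href values, assuming only http and https links should be allowed (the helper name isSafeHref and the placeholder base URL are illustrative, not taken from any answer):

```javascript
// Hypothetical helper: accept a link target only if it resolves to http(s).
// Relative URLs are resolved against a placeholder base so they stay allowed.
function isSafeHref(value) {
  let url;
  try {
    url = new URL(value, "https://example.invalid/");
  } catch {
    return false; // not parseable as a URL at all
  }
  // The WHATWG URL parser lowercases the scheme and strips leading/trailing
  // spaces, so variants like " JavaScript:..." are normalized before this check.
  return url.protocol === "http:" || url.protocol === "https:";
}

console.log(isSafeHref("https://example.com/page")); // true
console.log(isSafeHref("/relative/path"));           // true
console.log(isSafeHref("javascript:alert(1234);"));  // false
```

Event-handler attributes such as onmousemove have no comparable "safe subset", so untrusted data is generally kept out of them entirely.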
The first is a bit more complicated.
HTML5
According to the HTML5 spec:
Attribute values are a mixture of text and character references, except with the additional restriction that the text cannot contain an ambiguous ampersand.
Which is further refined in quoted attributes to:
The attribute name, followed by zero or more space characters, followed by a single U+003D EQUALS SIGN character, followed by zero or more space characters, followed by a single """ (U+0022) character, followed by the attribute value, which, in addition to the requirements given above for attribute values, must not contain any literal U+0022 QUOTATION MARK characters ("), and finally followed by a second single """ (U+0022) character.
So in short, any character except an "ambiguous ampersand" (&[a-zA-Z0-9]+; when the result is not a valid character reference) and a quote character is valid inside of an attribute.
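A rough JavaScript model of that rule, loosely following the tokenizer's "attribute value (double-quoted)" state. It is simplified for illustration (character references are copied through undecoded, and the function name is made up), but it shows that only U+0022 ends the value while < and > are plain text:

```javascript
// Simplified model of the HTML5 tokenizer's "attribute value (double-quoted)" state.
// `html` is the markup and `start` is the index just after the opening quote.
// Simplification: character references (&...;) are copied through undecoded.
function readDoubleQuotedValue(html, start) {
  let value = "";
  for (let i = start; i < html.length; i++) {
    const ch = html[i];
    if (ch === '"') {
      // U+0022 is the only character that ends the value.
      return { value, end: i + 1 };
    }
    if (ch === "\u0000") {
      value += "\uFFFD"; // parse error: NULL is replaced with U+FFFD
    } else {
      value += ch; // "<", ">" and everything else are ordinary text here
    }
  }
  return null; // end of input inside the tag: parse error, tag is dropped
}

const markup = '<input type="text" value="<script>alert(0)</script>">';
const valueStart = markup.indexOf('value="') + 'value="'.length;
const result = readDoubleQuotedValue(markup, valueStart);
console.log(result.value);             // <script>alert(0)</script>
console.log(markup.slice(result.end)); // >
```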
HTML 4.01
HTML 4.01 is less descriptive than HTML5 about the syntax (one of the reasons HTML5 was created in the first place). However, it does say this:
When script or style data is the value of an attribute (either style or the intrinsic event attributes), authors should escape occurrences of the delimiting single or double quotation mark within the value according to the script or style language convention. Authors should also escape occurrences of "&" if the "&" is not meant to be the beginning of a character reference.
Note, this is saying what an author should do, not what a parser should do. So a parser could technically accept or reject invalid input (or mangle it to be valid).
XML 1.0
The XML 1.0 Spec defines an attribute as:
Attribute ::= Name Eq AttValue
where AttValue is defined as:
AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
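The double-quoted half of that production translates fairly directly into a regular expression. A JavaScript sketch, with the simplifying assumption that Name is restricted to ASCII (the real XML Name production allows many more Unicode characters) and with illustrative constant and function names:

```javascript
// Double-quoted form of XML 1.0's AttValue:  '"' ([^<&"] | Reference)* '"'
// Reference ::= EntityRef | CharRef, i.e. &Name; or &#digits; or &#xhex;
// Simplification: Name is limited to ASCII here.
const NAME = "[A-Za-z_:][A-Za-z0-9._:-]*";
const REFERENCE = `&(?:${NAME}|#[0-9]+|#x[0-9a-fA-F]+);`;
const ATT_VALUE_DQ = new RegExp(`^"(?:[^<&"]|${REFERENCE})*"$`);

// Checks syntax only: it does not verify that entities are declared
// or that character references name legal characters.
function isWellFormedAttValue(quoted) {
  return ATT_VALUE_DQ.test(quoted);
}

console.log(isWellFormedAttValue('"Tom &amp; Jerry"'));           // true
console.log(isWellFormedAttValue('"Tom & Jerry"'));               // false (bare &)
console.log(isWellFormedAttValue('"<script>alert(0)</script>"')); // false (<)
console.log(isWellFormedAttValue('"&lt;script&gt;"'));            // true
```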
The & restriction is similar to HTML5's concept of an "ambiguous ampersand", but broader: it basically says "any unencoded ampersand".
Note, though, that it explicitly forbids < in attribute values.
So while HTML5 allows it, XML 1.0 explicitly forbids it.
What Does It Mean
It means that for a compliant and bug-free parser, HTML5 will treat < characters in an attribute value as ordinary text, and XML will error.
It also means that for a compliant and bug-free parser, HTML 4.01 will behave in unspecified and potentially odd ways (since the specification doesn't detail the behavior).
And this gets down to the crux of the issue. In the past, HTML was such a loose spec that every browser had slightly different rules for how it would deal with malformed HTML. Each would try to "fix" it, or "interpret" what you meant. So while an HTML5-compliant browser wouldn't execute the JS in <input type="text" value="<script>alert(0)</script>">, there's nothing to say that an HTML 4.01-compliant browser wouldn't. And there's nothing to say that a bug may not exist in the XML or HTML5 parser that causes it to be executed (though that would be a pretty significant problem).
THAT is why OWASP (and most security experts) recommend you encode either all non-alphanumeric characters or &<" inside of an attribute value. There's no cost in doing so, only the added security of knowing how the browser's parser will interpret the value.
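Both options can be sketched in a few lines of JavaScript. The function names are hypothetical; the first follows the OWASP rule as commonly stated (encode every non-alphanumeric character below 256 as &#xHH;), and the second is the minimal &<" variant, with the single quote added as an assumption so the output also works inside '...' attributes:

```javascript
// Option 1 (OWASP-style): every character below U+0100 that is not
// an ASCII letter or digit becomes a hex reference &#xHH;.
function encodeAttrStrict(value) {
  let out = "";
  for (const ch of value) {
    const code = ch.codePointAt(0);
    if (code >= 0x100 || /[A-Za-z0-9]/.test(ch)) {
      out += ch; // alphanumerics and characters >= U+0100 pass through
    } else {
      out += "&#x" + code.toString(16).toUpperCase() + ";";
    }
  }
  return out;
}

// Option 2 (minimal): encode &, < and ", plus ' for single-quoted attributes.
function encodeAttrMinimal(value) {
  const map = { "&": "&amp;", "<": "&lt;", '"': "&quot;", "'": "&#39;" };
  return value.replace(/[&<"']/g, (ch) => map[ch]);
}

const attack = '"><script>alert(0)</script>';
console.log(encodeAttrStrict(attack));
// &#x22;&#x3E;&#x3C;script&#x3E;alert&#x28;0&#x29;&#x3C;&#x2F;script&#x3E;
console.log(encodeAttrMinimal(attack));
// &quot;>&lt;script>alert(0)&lt;/script>
```

The strict form produces longer output but leaves nothing for a lenient or buggy parser to reinterpret; the minimal form relies on the parser following the spec.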
Do you have to? No. But defense in depth suggests that, since there's no cost to doing so, the potential benefit is worth it.
Answer by Adrian Cid Almaguer
When users submit data, you need to make sure that they've provided something you expect.
For example, if you expect a number, make sure the submitted data is a number. You can also cast user data into other types. Everything submitted is initially treated like a string, so forcing known-numeric data into being an integer or float makes sanitization fast and painless.
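The same "force it into the expected type" idea, sketched in JavaScript with a hypothetical helper. It validates the whole string first because parseInt alone is too lenient (parseInt("42abc") returns 42):

```javascript
// Hypothetical helper: accept only an optional minus sign followed by digits,
// then convert; anything else is rejected rather than "repaired".
function toIntegerOrNull(input) {
  const s = String(input).trim();
  if (!/^-?\d+$/.test(s)) return null;
  const n = Number(s);
  // Reject values too large to be represented exactly as a JS number.
  return Number.isSafeInteger(n) ? n : null;
}

console.log(toIntegerOrNull("42"));                // 42
console.log(toIntegerOrNull(" 7 "));               // 7
console.log(toIntegerOrNull("42abc"));             // null
console.log(toIntegerOrNull('1" onmouseover="x')); // null
```

Once a value is a number, it can no longer carry quotes or markup into an attribute.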
You need to make sure that fields that should not have any HTML content do not actually contain HTML. There are different ways you can deal with this problem.
You can try escaping HTML input with htmlspecialchars. You should not use htmlentities to neutralize HTML, as it will also encode accented and other characters that it thinks also need to be encoded.
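To make the htmlspecialchars/htmlentities difference concrete, here is a rough JavaScript counterpart of htmlspecialchars called with ENT_QUOTES (a hypothetical port, not PHP's implementation; as far as I know, ENT_QUOTES became part of the default flags in PHP 8.1, while older versions left single quotes alone by default):

```javascript
// Rough counterpart of PHP's htmlspecialchars() with ENT_QUOTES:
// only the five HTML-significant characters are replaced. Accented letters
// such as "é" are left alone, which is the difference from htmlentities().
const SPECIAL_CHARS = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#039;" };

function htmlSpecialChars(str) {
  return str.replace(/[&<>"']/g, (ch) => SPECIAL_CHARS[ch]);
}

console.log(htmlSpecialChars(`Café "<b>" & 'x'`));
// Café &quot;&lt;b&gt;&quot; &amp; &#039;x&#039;
```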
You can try removing any possible HTML. strip_tags is quick and easy, but also sloppy. HTML Purifier does a much more thorough job of both stripping out all HTML and also allowing a selective whitelist of tags and attributes through.
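To illustrate why pattern-based stripping gets called sloppy, here is a deliberately naive JavaScript regex stripper. It is a hypothetical example, not how strip_tags or HTML Purifier work. It destroys legitimate text and leaves an unterminated tag untouched, where later markup could complete it:

```javascript
// Deliberately naive tag stripper, NOT PHP's strip_tags(): it deletes
// everything from "<" to the next ">" and keeps the rest.
function naiveStripTags(str) {
  return str.replace(/<[^>]*>/g, "");
}

console.log(naiveStripTags("<h1>Hello, World!</h1>"));      // Hello, World!
console.log(naiveStripTags("a < b and c > d"));             // "a  d": legitimate text lost
console.log(naiveStripTags("<img src=x onerror=alert(1)")); // unchanged: no closing ">"
```

A whitelist sanitizer avoids both failure modes by parsing the markup before deciding what to keep.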
You can use the OWASP PHP Filters. They're really simple to use and effective.
You can use the filter extension, which provides a comprehensive way to sanitize user input.
Examples
The code below will remove all HTML tags from a string (note that FILTER_SANITIZE_STRING is deprecated as of PHP 8.1):
$string = "<h1>Hello, World!</h1>";
$new_string = filter_var($string, FILTER_SANITIZE_STRING);
// $new_string is now "Hello, World!"
The code below checks whether the value of the variable is a valid IP address:
$ip = "127.0.0.1";
$valid_ip = filter_var($ip, FILTER_VALIDATE_IP);
// $valid_ip is "127.0.0.1" (filter_var returns the value itself on success)
$ip = "127.0.1.1.1.1";
$valid_ip = filter_var($ip, FILTER_VALIDATE_IP);
// $valid_ip is false
Sanitizing and validating email addresses:
<?php
$a = '[email protected]';
$b = 'bogus - at - example dot org';
$c = '([email protected])';

$sanitized_a = filter_var($a, FILTER_SANITIZE_EMAIL);
if (filter_var($sanitized_a, FILTER_VALIDATE_EMAIL)) {
    echo "This (a) sanitized email address is considered valid.\n";
}

$sanitized_b = filter_var($b, FILTER_SANITIZE_EMAIL);
if (filter_var($sanitized_b, FILTER_VALIDATE_EMAIL)) {
    echo "This (b) sanitized email address is considered valid.\n";
} else {
    echo "This (b) sanitized email address is considered invalid.\n";
}

$sanitized_c = filter_var($c, FILTER_SANITIZE_EMAIL);
if (filter_var($sanitized_c, FILTER_VALIDATE_EMAIL)) {
    echo "This (c) sanitized email address is considered valid.\n";
    echo "Before: $c\n";
    echo "After: $sanitized_c\n";
}
?>
Reference:
What are the best PHP input sanitizing functions?
http://code.tutsplus.com/tutorials/sanitize-and-validate-data-with-php-filters--net-2595
Answer by bigbobr
If your question is "what types of XSS attacks are possible", then you'd better google it. I'll just leave some examples of why you should sanitize your inputs:
1. If the input is generated by echo '<input type="text" value="$var">', then a simple ' breaks it.
2. If the input is plain HTML in a PHP page, then value=<?php deadly_php_script ?> breaks it.
3. If this is a plain HTML input in an HTML file, then converting double quotes should be enough.
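A related pitfall, sketched in JavaScript under the assumption that the template delimits the attribute with single quotes rather than double quotes (the template and payload are made up for illustration). Escaping only " then does nothing, and a ' is enough to add new attributes:

```javascript
// Escapes only the double quote, as proposed in the question.
function escapeDoubleQuotesOnly(v) {
  return v.replace(/"/g, "&quot;");
}

// Hypothetical template that delimits the attribute with single quotes.
const singleQuotePayload = "' onfocus='alert(1)' autofocus='";
const singleQuotedHtml = `<input type='text' value='${escapeDoubleQuotesOnly(singleQuotePayload)}'>`;
console.log(singleQuotedHtml);
// <input type='text' value='' onfocus='alert(1)' autofocus=''>
```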
That said, converting other special characters (like <, > and so on) is good practice. Inputs exist to collect information that will be stored on the server or passed to another page or script, so you need to check what could break those files. Let's say we have this setup:
index.html:
<form method=post action=getinput.php>
<input type="text" name="xss">
<input type="submit"></form>
getinput.php:
echo $_POST['xss'];
The input value ;your_deadly_php_script breaks it totally (in that case, you can also sanitize server-side).
If that's not enough, provide more info in your question and add more examples of your code.
Answer by webternals
I believe the person is referring to cross-site scripting attacks. They tagged this as php, security, and xss.
Take, for example:
<input type="text" value=""><script>alert(0)</script><"">
The above code will execute the alert box code.
<?php $var= "\"><script>alert(0)</script><\""; ?>
<input type="text" value="<?php echo $var ?>">
This will also execute the alert box. To solve this, you need to escape ", <, >, and a few more characters to be safe. PHP has a couple of functions worth looking into, and each has its ups and downs:
htmlentities() - Convert all applicable characters to HTML entities
htmlspecialchars() - Convert special characters to HTML entities
get_html_translation_table() - Returns the translation table used by htmlspecialchars and htmlentities
urldecode() - Decodes URL-encoded string
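The same payload as the $var example above, built into markup in JavaScript with and without escaping. escapeAttr is a hypothetical helper that handles roughly the characters htmlspecialchars covers here; & is replaced first so the entities produced by the later replacements are not escaped a second time:

```javascript
// Hypothetical helper: escape &, ", < and > for use inside a double-quoted attribute.
function escapeAttr(v) {
  return v
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const xssPayload = '"><script>alert(0)</script><"';
const unsafeHtml = `<input type="text" value="${xssPayload}">`;
const safeHtml = `<input type="text" value="${escapeAttr(xssPayload)}">`;

console.log(unsafeHtml); // <input type="text" value=""><script>alert(0)</script><"">
console.log(safeHtml);   // <input type="text" value="&quot;&gt;&lt;script&gt;alert(0)&lt;/script&gt;&lt;&quot;">
```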
What you have to be careful of is that you are passing in a variable, and there are ways to create errors and the like that cause it to break out. Your best bet is to make sure that data is not formatted in an executable manner in case of errors. You are right that if there are no quotes you can't break out, but there may be ways that you or I don't understand at this point that would allow it to happen.
Answer by mathieu
$var = "><script>alert(0);</script> would work... If you can close the quotes, you can then close the tag and open another one... But I think you are right: without closing the quotes, no injection is possible...

