What does PHP FILTER_SANITIZE_STRING do?
Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/23392128/
What does FILTER_SANITIZE_STRING do?
Asked by rr-
There are about a million Q&As that explain options like FILTER_FLAG_STRIP_LOW, but what does FILTER_SANITIZE_STRING do on its own, without any options? Does it just filter tags?
Answered by rr-
According to the PHP manual:
Strip tags, optionally strip or encode special characters.
According to W3Schools:
The FILTER_SANITIZE_STRING filter strips or encodes unwanted characters. This filter removes data that is potentially harmful for your application. It is used to strip tags and remove or encode unwanted characters.
Now, that doesn't tell us much. Let's go see some PHP sources.
ext/filter/filter.c:
static const filter_list_entry filter_list[] = {
/*...*/
{ "string", FILTER_SANITIZE_STRING, php_filter_string },
{ "stripped", FILTER_SANITIZE_STRING, php_filter_string },
{ "encoded", FILTER_SANITIZE_ENCODED, php_filter_encoded },
/*...*/
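The table above maps filter names to an ID and a handler. Both "string" and "stripped" point at FILTER_SANITIZE_STRING and php_filter_string. Below is a minimal C sketch of that kind of name-based lookup. The struct, the handler functions and find_filter are hypothetical, and the numeric IDs are assumed to match the 0x0201/0x0202 values in php_filter.h. This is not php-src code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler type; real php-src handlers take PHP_INPUT_FILTER_PARAM_DECL. */
typedef const char *(*filter_handler)(void);

static const char *sanitize_string_handler(void)  { return "php_filter_string"; }
static const char *sanitize_encoded_handler(void) { return "php_filter_encoded"; }

/* Assumed to mirror php_filter.h (FILTER_SANITIZE_STRING, FILTER_SANITIZE_ENCODED). */
enum { SANITIZE_STRING = 0x0201, SANITIZE_ENCODED = 0x0202 };

typedef struct {
    const char *name;
    int id;
    filter_handler handler;
} filter_entry;

/* Two names ("string" and "stripped") share one ID and one handler. */
static const filter_entry filters[] = {
    { "string",   SANITIZE_STRING,  sanitize_string_handler  },
    { "stripped", SANITIZE_STRING,  sanitize_string_handler  },
    { "encoded",  SANITIZE_ENCODED, sanitize_encoded_handler },
};

/* Linear search by name; returns NULL if the name is unknown. */
static const filter_entry *find_filter(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof filters / sizeof filters[0]; i++) {
        if (strcmp(filters[i].name, name) == 0) {
            return &filters[i];
        }
    }
    return NULL;
}
```

From the PHP side, this aliasing should mean that filter_id("string") and filter_id("stripped") return the same ID.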
Now, let's see how php_filter_string is defined in ext/filter/sanitizing_filters.c:
/* {{{ php_filter_string */
void php_filter_string(PHP_INPUT_FILTER_PARAM_DECL)
{
size_t new_len;
unsigned char enc[256] = {0};
/* strip high/strip low ( see flags )*/
php_filter_strip(value, flags);
if (!(flags & FILTER_FLAG_NO_ENCODE_QUOTES)) {
enc['\''] = enc['"'] = 1;
}
if (flags & FILTER_FLAG_ENCODE_AMP) {
enc['&'] = 1;
}
if (flags & FILTER_FLAG_ENCODE_LOW) {
memset(enc, 1, 32);
}
if (flags & FILTER_FLAG_ENCODE_HIGH) {
memset(enc + 127, 1, sizeof(enc) - 127);
}
php_filter_encode_html(value, enc);
/* strip tags, implicitly also removes \0 chars */
new_len = php_strip_tags_ex(Z_STRVAL_P(value), Z_STRLEN_P(value), NULL, NULL, 0, 1);
Z_STRLEN_P(value) = new_len;
if (new_len == 0) {
zval_dtor(value);
if (flags & FILTER_FLAG_EMPTY_STRING_NULL) {
ZVAL_NULL(value);
} else {
ZVAL_EMPTY_STRING(value);
}
return;
}
}
I'll skip commenting on the flags, since they're already explained on the Internet, as you said, and focus instead on what is always performed, which is not as well documented.
First, php_filter_strip. It doesn't do much: it just takes the flags you pass to the function (FILTER_FLAG_STRIP_LOW, FILTER_FLAG_STRIP_HIGH and FILTER_FLAG_STRIP_BACKTICK) and processes them accordingly. It does the well-documented stuff.
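Here is a rough illustration of the stripping step as a simplified C re-creation. It is not the php-src function. The flag names are hypothetical stand-ins for FILTER_FLAG_STRIP_LOW, FILTER_FLAG_STRIP_HIGH and FILTER_FLAG_STRIP_BACKTICK. The thresholds (below 32, and 127 or above) are assumptions consistent with the enc table in php_filter_string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the FILTER_FLAG_STRIP_* bits. */
enum {
    STRIP_LOW      = 1 << 0,
    STRIP_HIGH     = 1 << 1,
    STRIP_BACKTICK = 1 << 2
};

/* Removes bytes in place according to the flags and returns the new length.
   With no strip flag set, every byte is kept. */
static size_t strip_bytes(unsigned char *s, size_t len, unsigned flags)
{
    size_t out = 0;

    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = s[i];
        if ((flags & STRIP_HIGH) && c >= 127) continue;      /* DEL and above (assumed) */
        if ((flags & STRIP_LOW) && c < 32) continue;         /* control characters */
        if ((flags & STRIP_BACKTICK) && c == '`') continue;  /* backtick */
        s[out++] = c;
    }
    return out;
}
```

Each byte is judged on its own. With STRIP_HIGH, a multi-byte UTF-8 character therefore disappears entirely, because all of its bytes are 128 or above.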
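Then we build a table (enc) marking which bytes to encode and call php_filter_encode_html. This is more interesting. It converts characters such as ", ' and &, as well as bytes with values below 32 or from 127 upwards, to numeric HTML entities, so & in your string becomes &#38;. Again, flags decide what gets encoded. Quotes are encoded unless you pass FILTER_FLAG_NO_ENCODE_QUOTES. &, low bytes and high bytes are encoded only with FILTER_FLAG_ENCODE_AMP, FILTER_FLAG_ENCODE_LOW and FILTER_FLAG_ENCODE_HIGH respectively.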
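The encoding step can be sketched in standalone C. The sketch builds the same 256-entry table as the excerpt above, then replaces each marked byte with a decimal numeric entity. The flag names and both helper functions are hypothetical. The "&#N;" output format is assumed to match what php_filter_encode_html emits, as the &#38; example suggests.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the FILTER_FLAG_* bits read by php_filter_string. */
enum {
    NO_ENCODE_QUOTES = 1 << 0,
    ENCODE_AMP       = 1 << 1,
    ENCODE_LOW       = 1 << 2,
    ENCODE_HIGH      = 1 << 3
};

/* Marks the bytes to encode, mirroring the if-chain in php_filter_string. */
static void build_enc_table(unsigned char enc[256], unsigned flags)
{
    memset(enc, 0, 256);
    if (!(flags & NO_ENCODE_QUOTES)) enc['\''] = enc['"'] = 1;
    if (flags & ENCODE_AMP)  enc['&'] = 1;
    if (flags & ENCODE_LOW)  memset(enc, 1, 32);
    if (flags & ENCODE_HIGH) memset(enc + 127, 1, 256 - 127);
}

/* Copies `in` to `out`, replacing every marked byte with "&#N;".
   `out` must hold at least 6 * len + 1 bytes ("&#255;" is 6 chars).
   Returns the output length, excluding the terminating NUL. */
static size_t encode_html(const unsigned char *in, size_t len,
                          const unsigned char enc[256], char *out)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        if (enc[in[i]]) {
            o += (size_t)sprintf(out + o, "&#%u;", (unsigned)in[i]);
        } else {
            out[o++] = (char)in[i];
        }
    }
    out[o] = '\0';
    return o;
}
```

With no flags, `a"b'c&` becomes `a&#34;b&#39;c&`. With ENCODE_AMP | NO_ENCODE_QUOTES, the quotes stay and `&` becomes `&#38;`.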
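Then we call php_strip_tags_ex, which strips HTML, XML and PHP tags (according to its definition in /ext/standard/string.c) and removes NULL bytes, as the comment says. This runs after the encoding step, so a tag is still removed even if the quotes inside it were already turned into entities.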
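The sketch below shows what "strips tags" means for the text between tags. It is a deliberately naive C tag stripper and explicitly not php_strip_tags_ex. The real function is a state machine that handles many more cases, such as quoted > inside attributes, comments and <?php ... ?> blocks. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Naive: drops every byte from '<' up to and including the next '>', and drops
   NUL bytes everywhere. Text between tags is kept. Works in place; `s` must
   have room for len + 1 bytes because the result is NUL-terminated. */
static size_t strip_tags_naive(char *s, size_t len)
{
    size_t out = 0;
    int in_tag = 0;

    assert(s != NULL);
    for (size_t i = 0; i < len; i++) {
        char c = s[i];
        if (c == '\0') continue;          /* NUL bytes are always removed */
        if (in_tag) {
            if (c == '>') in_tag = 0;     /* tag closed: resume copying */
            continue;
        }
        if (c == '<') {                   /* tag opened: stop copying */
            in_tag = 1;
            continue;
        }
        s[out++] = c;
    }
    s[out] = '\0';
    return out;
}

/* Convenience wrapper for ordinary C strings (which cannot hold an embedded NUL). */
static size_t strip_tags_cstr(char *s)
{
    return strip_tags_naive(s, strlen(s));
}
```

For example, `<b>Hello!</b>` becomes `Hello!`. Only the markup is removed, not the content it wraps.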
The code that follows is used for internal string management and doesn't really do any sanitization. Well, not exactly. If the sanitized string is empty, passing the undocumented flag FILTER_FLAG_EMPTY_STRING_NULL returns NULL instead of an empty string, but that isn't very useful. An example:
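var_dump(filter_var("yo", FILTER_SANITIZE_STRING, FILTER_FLAG_EMPTY_STRING_NULL));
var_dump(filter_var("\0", FILTER_SANITIZE_STRING, FILTER_FLAG_EMPTY_STRING_NULL));
var_dump(filter_var("yo", FILTER_SANITIZE_STRING));
var_dump(filter_var("\0", FILTER_SANITIZE_STRING));
→
string(2) "yo"
NULL
string(2) "yo"
string(0) ""
There isn't much more going on, so the manual was fairly correct. To sum it up: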
- Always: strip HTML, XML and PHP tags, strip NULL bytes.
- FILTER_FLAG_NO_ENCODE_QUOTES - Do not encode quotes (by default, ' and " are encoded to &#39; and &#34;).
- FILTER_FLAG_STRIP_LOW - Strip characters with a byte value below 32.
- FILTER_FLAG_STRIP_HIGH - Strip characters with a byte value of 127 or above.
- FILTER_FLAG_ENCODE_LOW - Encode characters with a byte value below 32.
- FILTER_FLAG_ENCODE_HIGH - Encode characters with a byte value of 127 or above.
- FILTER_FLAG_ENCODE_AMP - Encode the & character to &#38; (not &amp;).
- FILTER_FLAG_EMPTY_STRING_NULL - Return NULL instead of empty strings.
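The summary can be tied together in one simplified C model of php_filter_string. It runs the same four stages in the same order: strip bytes per flags, encode bytes as entities, remove tags and NUL bytes, then map an empty result to NULL. Every name here is hypothetical, and the tag stripping is naive. Treat this as a sketch of the ordering, not a drop-in replacement for the PHP filter.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag bits standing in for the FILTER_FLAG_* constants. */
enum {
    F_STRIP_LOW = 1 << 0, F_STRIP_HIGH = 1 << 1, F_STRIP_BACKTICK = 1 << 2,
    F_NO_ENCODE_QUOTES = 1 << 3, F_ENCODE_AMP = 1 << 4,
    F_ENCODE_LOW = 1 << 5, F_ENCODE_HIGH = 1 << 6,
    F_EMPTY_STRING_NULL = 1 << 7
};

/* Returns a malloc'd result (caller frees), or NULL when the result is empty
   and F_EMPTY_STRING_NULL is set (or when allocation fails). */
static char *sanitize_string(const char *in, size_t len, unsigned flags)
{
    unsigned char enc[256] = {0};
    size_t n = 0, out = 0;
    int in_tag = 0;
    char *buf;

    assert(in != NULL || len == 0);
    buf = malloc(6 * len + 1);            /* worst case: every byte becomes "&#NNN;" */
    if (buf == NULL) return NULL;

    if (!(flags & F_NO_ENCODE_QUOTES)) enc['\''] = enc['"'] = 1;
    if (flags & F_ENCODE_AMP)  enc['&'] = 1;
    if (flags & F_ENCODE_LOW)  memset(enc, 1, 32);
    if (flags & F_ENCODE_HIGH) memset(enc + 127, 1, sizeof enc - 127);

    /* Stages 1 and 2 are per-byte, so they can share one loop:
       strip first, then encode whatever survives. */
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if ((flags & F_STRIP_HIGH) && c >= 127) continue;
        if ((flags & F_STRIP_LOW) && c < 32) continue;
        if ((flags & F_STRIP_BACKTICK) && c == '`') continue;
        if (enc[c]) n += (size_t)sprintf(buf + n, "&#%u;", (unsigned)c);
        else buf[n++] = (char)c;
    }

    /* Stage 3: drop NUL bytes and anything from '<' to the next '>' (naive). */
    for (size_t i = 0; i < n; i++) {
        char c = buf[i];
        if (c == '\0') continue;
        if (in_tag) { if (c == '>') in_tag = 0; continue; }
        if (c == '<') { in_tag = 1; continue; }
        buf[out++] = c;
    }
    buf[out] = '\0';

    /* Stage 4: optionally turn an empty result into NULL. */
    if (out == 0 && (flags & F_EMPTY_STRING_NULL)) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

Because encoding runs before tag removal, `<a title="x">it's</a>` comes out as `it&#39;s`. The attribute quotes are encoded, but the tag is still removed. A lone NUL byte comes out as an empty string, or as NULL with F_EMPTY_STRING_NULL, which matches the var_dump example.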
Answered by Jan Żankowski
I wasn't sure whether "stripping tags" means removing just the < > characters, and whether it preserves the content between tags, e.g. the string "Hello!" from <b>Hello!</b>, so I decided to check. Here are the results, using PHP 7.1.5 (and Bash for the command line). The input was sent with:
curl --data-urlencode 'my-input='\
'1. ASCII b/n 32 and 127: ABC abc 012 '\
'2. ASCII higher than 127: Çüé '\
'3. PHP tag: <?php $i = 0; ?> '\
'4. HTML tag: <script type="text/javascript">var i = 0;</script> '\
'5. Ampersand: & '\
'6. Backtick: ` '\
'7. Double quote: " '\
'8. Single quote: '"'" \
http://localhost/sanitize.php
- sanitize.php:
<?php echo filter_input(INPUT_POST,'my-input', FILTER_SANITIZE_STRING);
- output:
1. ASCII b/n 32 and 127: ABC abc 012 2. ASCII higher than 127: Çüé 3. PHP tag: 4. HTML tag: var i = 0; 5. Ampersand: & 6. Backtick: ` 7. Double quote: &#34; 8. Single quote: &#39;
- sanitize.php:
<?php echo filter_input(INPUT_POST,'my-input', FILTER_SANITIZE_STRING, FILTER_FLAG_NO_ENCODE_QUOTES);
- output:
1. ASCII b/n 32 and 127: ABC abc 012 2. ASCII higher than 127: Çüé 3. PHP tag: 4. HTML tag: var i = 0; 5. Ampersand: & 6. Backtick: ` 7. Double quote: " 8. Single quote: '
- sanitize.php:
<?php echo filter_input(INPUT_POST,'my-input', FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_HIGH);
- output:
1. ASCII b/n 32 and 127: ABC abc 012 2. ASCII higher than 127: 3. PHP tag: 4. HTML tag: var i = 0; 5. Ampersand: & 6. Backtick: ` 7. Double quote: &#34; 8. Single quote: &#39;
- sanitize.php:
<?php echo filter_input(INPUT_POST,'my-input', FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_BACKTICK);
- output:
1. ASCII b/n 32 and 127: ABC abc 012 2. ASCII higher than 127: Çüé 3. PHP tag: 4. HTML tag: var i = 0; 5. Ampersand: & 6. Backtick: 7. Double quote: &#34; 8. Single quote: &#39;
- sanitize.php:
<?php echo filter_input(INPUT_POST,'my-input', FILTER_SANITIZE_STRING, FILTER_FLAG_ENCODE_HIGH);
- output:
1. ASCII b/n 32 and 127: ABC abc 012 2. ASCII higher than 127: &#195;&#135;&#195;&#188;&#195;&#169; 3. PHP tag: 4. HTML tag: var i = 0; 5. Ampersand: & 6. Backtick: ` 7. Double quote: &#34; 8. Single quote: &#39;
- sanitize.php:
<?php echo filter_input(INPUT_POST,'my-input', FILTER_SANITIZE_STRING, FILTER_FLAG_ENCODE_AMP);
- output:
1. ASCII b/n 32 and 127: ABC abc 012 2. ASCII higher than 127: Çüé 3. PHP tag: 4. HTML tag: var i = 0; 5. Ampersand: &#38; 6. Backtick: ` 7. Double quote: &#34; 8. Single quote: &#39;
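The FILTER_FLAG_ENCODE_HIGH case encodes each byte separately. The enc table is indexed by byte, not by Unicode character, so every UTF-8 byte of "Çüé" becomes its own entity. The minimal C sketch below isolates that behaviour. The function name is hypothetical, and the threshold of 127 follows the memset(enc + 127, ...) line in php_filter_string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encodes every byte >= 127 as a decimal entity, one entity per byte.
   `out` must hold at least 6 * strlen(in) + 1 bytes. */
static void encode_high_bytes(const char *in, char *out)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++) {
        if (*p >= 127) {
            o += (size_t)sprintf(out + o, "&#%u;", (unsigned)*p);
        } else {
            out[o++] = (char)*p;
        }
    }
    out[o] = '\0';
}
```

Fed the UTF-8 bytes of "Çüé" (C3 87, C3 BC, C3 A9), this produces `&#195;&#135;&#195;&#188;&#195;&#169;`. That is six entities for three characters, and a browser will not render them back as "Çüé".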
Also, for the flags FILTER_FLAG_STRIP_LOW and FILTER_FLAG_ENCODE_LOW, since my Bash doesn't display these characters, I used the bell character (ASCII 007) and the Restman Chrome extension to check that:
- without either of these flags, the character is preserved
- with FILTER_FLAG_STRIP_LOW, it is removed
- with FILTER_FLAG_ENCODE_LOW, it is encoded to &#7;
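The three bell-character results can be reproduced with a small C sketch that applies one treatment to bytes below 32. The mode names are hypothetical labels for "no flag", FILTER_FLAG_STRIP_LOW and FILTER_FLAG_ENCODE_LOW. This is an illustration under those assumptions, not the PHP implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical labels for the three cases tested above. */
typedef enum { LOW_KEEP, LOW_STRIP, LOW_ENCODE } low_mode;

/* Applies one treatment to bytes below 32 (e.g. the bell character, 7).
   `out` needs room for 5 * strlen(in) + 1 bytes ("&#31;" is the longest entity). */
static void handle_low_bytes(const char *in, low_mode mode, char *out)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++) {
        if (*p < 32 && mode == LOW_STRIP) {
            continue;                                             /* dropped */
        }
        if (*p < 32 && mode == LOW_ENCODE) {
            o += (size_t)sprintf(out + o, "&#%u;", (unsigned)*p); /* e.g. "&#7;" */
            continue;
        }
        out[o++] = (char)*p;                                      /* kept as-is */
    }
    out[o] = '\0';
}
```

For "a", bell, "b", the three modes give the bell kept, "ab", and "a&#7;b" respectively.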