PHP: What does FILTER_SANITIZE_STRING do?

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/23392128/

Time: 2020-08-25 16:41:45  Source: igfitidea

Tags: php, sanitization

Asked by rr-

There's like a million Q&As that explain options like FILTER_FLAG_STRIP_LOW, but what does FILTER_SANITIZE_STRING do on its own, without any options? Does it just filter tags?

Answer by rr-

According to the PHP Manual:

Strip tags, optionally strip or encode special characters.

According to W3Schools:

The FILTER_SANITIZE_STRING filter strips or encodes unwanted characters.

This filter removes data that is potentially harmful for your application. It is used to strip tags and remove or encode unwanted characters.

Now, that doesn't tell us much. Let's go look at the PHP source.

ext/filter/filter.c:

static const filter_list_entry filter_list[] = {                                       
    /*...*/
    { "string",          FILTER_SANITIZE_STRING,        php_filter_string          },  
    { "stripped",        FILTER_SANITIZE_STRING,        php_filter_string          },  
    { "encoded",         FILTER_SANITIZE_ENCODED,       php_filter_encoded         },  
    /*...*/

Now, let's see how php_filter_string is defined.
ext/filter/sanitizing_filters.c:

/* {{{ php_filter_string */
void php_filter_string(PHP_INPUT_FILTER_PARAM_DECL)
{
    size_t new_len;
    unsigned char enc[256] = {0};

    /* strip high/strip low ( see flags )*/
    php_filter_strip(value, flags);

    if (!(flags & FILTER_FLAG_NO_ENCODE_QUOTES)) {
        enc['\''] = enc['"'] = 1;
    }
    if (flags & FILTER_FLAG_ENCODE_AMP) {
        enc['&'] = 1;
    }
    if (flags & FILTER_FLAG_ENCODE_LOW) {
        memset(enc, 1, 32);
    }
    if (flags & FILTER_FLAG_ENCODE_HIGH) {
        memset(enc + 127, 1, sizeof(enc) - 127);
    }

    php_filter_encode_html(value, enc);

    /* strip tags, implicitly also removes \0 chars */
    new_len = php_strip_tags_ex(Z_STRVAL_P(value), Z_STRLEN_P(value), NULL, NULL, 0, 1);
    Z_STRLEN_P(value) = new_len;

    if (new_len == 0) {
        zval_dtor(value);
        if (flags & FILTER_FLAG_EMPTY_STRING_NULL) {
            ZVAL_NULL(value);
        } else {
            ZVAL_EMPTY_STRING(value);
        }
        return;
    }
}

I'll skip commenting on the flags, since they're already explained on the Internet, like you said, and focus instead on what is always performed, which is not so well documented.

First, php_filter_strip. It doesn't do much: it just takes the flags you pass to the function and processes them accordingly. It does the well-documented stuff.

Then we construct a lookup map and call php_filter_encode_html. This is more interesting: it converts characters such as ", ', & and those with ASCII codes lower than 32 or higher than 127 to HTML entities, so, with the right flag set, & in your string becomes &#38;. Again, it uses the flags for this.

Then we get a call to php_strip_tags_ex, which just strips HTML, XML and PHP tags (according to its definition in /ext/standard/string.c) and removes NULL bytes, like the comment says.

The code that follows it is used for internal string management and doesn't really do any sanitization. Well, not exactly: passing the undocumented flag FILTER_FLAG_EMPTY_STRING_NULL will return NULL if the sanitized string is empty, instead of returning just an empty string, but that's not really all that useful. An example:

var_dump(filter_var("yo", FILTER_SANITIZE_STRING, FILTER_FLAG_EMPTY_STRING_NULL));
var_dump(filter_var("\0", FILTER_SANITIZE_STRING, FILTER_FLAG_EMPTY_STRING_NULL));
var_dump(filter_var("yo", FILTER_SANITIZE_STRING));
var_dump(filter_var("\0", FILTER_SANITIZE_STRING));

string(2) "yo"
NULL
string(2) "yo"
string(0) ""

There isn't much more going on, so the manual was fairly accurate. To sum it up:

  • Always: strip HTML, XML and PHP tags, strip NULL bytes.
  • FILTER_FLAG_NO_ENCODE_QUOTES - Do not encode quotes (by default, " and ' are encoded as &#34; and &#39;).
  • FILTER_FLAG_STRIP_LOW - Strip characters with ASCII value below 32.
  • FILTER_FLAG_STRIP_HIGH - Strip characters with ASCII value above 127.
  • FILTER_FLAG_ENCODE_LOW - Encode characters with ASCII value below 32.
  • FILTER_FLAG_ENCODE_HIGH - Encode characters with ASCII value above 127.
  • FILTER_FLAG_ENCODE_AMP - Encode the & character to &#38; (not &amp;).
  • FILTER_FLAG_EMPTY_STRING_NULL - Return NULL instead of an empty string.
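
As a rough illustration of the summary above, the sketch below runs a made-up input string through the filter with a few of the flags. The input and the variable names are assumptions chosen for the example; the expected values in the comments follow from the source walkthrough (quotes are encoded first, then tags are stripped). Note that FILTER_SANITIZE_STRING is deprecated as of PHP 8.1, so newer versions will emit a deprecation notice when this runs.

```php
<?php
// Hypothetical input: a tag pair, an ampersand and both kinds of quotes.
$input = "<b>Hello!</b> & \"it's\"";

// No flags: tags are stripped (their content is kept) and quotes are encoded.
$default = filter_var($input, FILTER_SANITIZE_STRING);
// expected: Hello! & &#34;it&#39;s&#34;

// FILTER_FLAG_NO_ENCODE_QUOTES: tags are still stripped, quotes are left alone.
$rawQuotes = filter_var($input, FILTER_SANITIZE_STRING, FILTER_FLAG_NO_ENCODE_QUOTES);
// expected: Hello! & "it's"

// FILTER_FLAG_ENCODE_AMP: the ampersand is encoded as &#38; as well.
$ampEncoded = filter_var($input, FILTER_SANITIZE_STRING, FILTER_FLAG_ENCODE_AMP);
// expected: Hello! &#38; &#34;it&#39;s&#34;

var_dump($default, $rawQuotes, $ampEncoded);
```

Flags are bitmask constants, so several can presumably be combined with the | operator in the third argument.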

Answer by Jan Żankowski

I wasn't sure whether "stripping tags" means removing just the < and > characters, and whether it preserves the content between tags, e.g. the string "Hello!" from <b>Hello!</b>, so I decided to check. Here are the results, using PHP 7.1.5 (and Bash for the command line):

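curl --data-urlencode 'my-input='\
'1. ASCII b/n 32 and 127: ABC abc 012 '\
'2. ASCII higher than 127: Çüé '\
'3. PHP tag: <?php $i = 0; ?> '\
'4. HTML tag: <script type="text/javascript">var i = 0;</script> '\
'5. Ampersand: & '\
'6. Backtick: ` '\
'7. Double quote: " '\
'8. Single quote: '"'" \
http://localhost/sanitize.php
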
    • sanitize.php: <?php echo filter_input(INPUT_POST,'my-input', FILTER_SANITIZE_STRING);
    • output: 1. ASCII b/n 32 and 127: ABC abc 012 2. ASCII higher than 127: Çüé 3. PHP tag: 4. HTML tag: var i = 0; 5. Ampersand: & 6. Backtick: ` 7. Double quote: &#34; 8. Single quote: &#39;
    • sanitize.php: <?php echo filter_input(INPUT_POST,'my-input', FILTER_SANITIZE_STRING, FILTER_FLAG_NO_ENCODE_QUOTES);
    • output: 1. ASCII b/n 32 and 127: ABC abc 012 2. ASCII higher than 127: Çüé 3. PHP tag: 4. HTML tag: var i = 0; 5. Ampersand: & 6. Backtick: ` 7. Double quote: " 8. Single quote: '
    • sanitize.php: <?php echo filter_input(INPUT_POST,'my-input', FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_HIGH);
    • output: 1. ASCII b/n 32 and 127: ABC abc 012 2. ASCII higher than 127: 3. PHP tag: 4. HTML tag: var i = 0; 5. Ampersand: & 6. Backtick: ` 7. Double quote: &#34; 8. Single quote: &#39;
    • sanitize.php: <?php echo filter_input(INPUT_POST,'my-input', FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_BACKTICK);
    • output: 1. ASCII b/n 32 and 127: ABC abc 012 2. ASCII higher than 127: Çüé 3. PHP tag: 4. HTML tag: var i = 0; 5. Ampersand: & 6. Backtick: 7. Double quote: &#34; 8. Single quote: &#39;
    • sanitize.php: <?php echo filter_input(INPUT_POST,'my-input', FILTER_SANITIZE_STRING, FILTER_FLAG_ENCODE_HIGH);
    • output: 1. ASCII b/n 32 and 127: ABC abc 012 2. ASCII higher than 127: &#195;&#135;&#195;&#188;&#195;&#169; 3. PHP tag: 4. HTML tag: var i = 0; 5. Ampersand: & 6. Backtick: ` 7. Double quote: &#34; 8. Single quote: &#39;
    • sanitize.php: <?php echo filter_input(INPUT_POST,'my-input', FILTER_SANITIZE_STRING, FILTER_FLAG_ENCODE_AMP);
    • output: 1. ASCII b/n 32 and 127: ABC abc 012 2. ASCII higher than 127: Çüé 3. PHP tag: 4. HTML tag: var i = 0; 5. Ampersand: &#38; 6. Backtick: ` 7. Double quote: &#34; 8. Single quote: &#39;

Also, for the flags FILTER_FLAG_STRIP_LOW and FILTER_FLAG_ENCODE_LOW, since my Bash doesn't display these characters, I checked using the bell character (ASCII 007) and the Restman Chrome extension that:

  • without either of these flags, the character is preserved
  • with FILTER_FLAG_STRIP_LOW, it is removed
  • with FILTER_FLAG_ENCODE_LOW, it is encoded to &#7;
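
The bell-character check above can also be reproduced without a browser extension. The following is a minimal sketch, assuming a PHP CLI where FILTER_SANITIZE_STRING is still available (it is deprecated as of PHP 8.1 and emits a notice there); the input string and variable names are made up for the example, and bin2hex is used only to make the invisible byte visible.

```php
<?php
// "\x07" is the bell character (ASCII 7); the surrounding letters are arbitrary.
$input = "a\x07b";

$kept    = filter_var($input, FILTER_SANITIZE_STRING);
$removed = filter_var($input, FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_LOW);
$encoded = filter_var($input, FILTER_SANITIZE_STRING, FILTER_FLAG_ENCODE_LOW);

// bin2hex shows the raw bytes, so the preserved 0x07 is visible as "07".
var_dump(bin2hex($kept));  // expected value: 610762
var_dump($removed);        // expected value: ab
var_dump($encoded);        // expected value: a&#7;b
```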