Remove control characters from PHP string

Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/1497885/

Time: 2020-08-25 02:49:43  Source: igfitidea

php regex string

Asked by KB22

How can I remove control characters like STX from a PHP string? I played around with

preg_replace("/[^a-zA-Z0-9 .\-_;!:?äÄöÖüÜß<>='\"]/","",$pString)

but found that it removed way too much. Is there a way to remove only control chars?

Answered by Stephan202

If by control characters you mean the first 32 ASCII characters and \x7F (that includes the carriage return, etc!), then this will work:

preg_replace('/[\x00-\x1F\x7F]/', '', $input);

(Note the single quotes: in a double-quoted string PHP itself expands \x00 to a literal NUL byte before the regex engine sees the pattern, which older PHP versions reject.)

The line feed and carriage return (often written \n and \r) may be saved from removal like so:

preg_replace('/[\x00-\x09\x0B\x0C\x0E-\x1F\x7F]/', '', $input);
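To make the difference between the two patterns concrete, here is a minimal sketch. The sample string and the variable names ($sample, $stripAll, $keepNewlines) are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical sample: STX (\x02), a tab, a CRLF pair and DEL (\x7F) embedded in text.
$sample = "A\x02B\tC\r\nD\x7F";

// Strips every ASCII control character, including tab, CR and LF.
$stripAll = preg_replace('/[\x00-\x1F\x7F]/', '', $sample);

// Same, but leaves LF (\x0A) and CR (\x0D) in place. The tab (\x09) is still removed.
$keepNewlines = preg_replace('/[\x00-\x09\x0B\x0C\x0E-\x1F\x7F]/', '', $sample);

var_dump($stripAll === "ABCD");          // bool(true)
var_dump($keepNewlines === "ABC\r\nD");  // bool(true)
```

If tabs should survive as well, \x09 would have to be left out of the character class in the same way.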

I must say that I think Bobby's answer is better, in the sense that [:cntrl:] conveys what the code does better than [\x00-\x1F\x7F].

WARNING: ereg_replace is deprecated as of PHP 5.3.0 and removed as of PHP 7.0.0. Use preg_replace instead of ereg_replace:

preg_replace('/[[:cntrl:]]/', '', $input);
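As a rough check that the POSIX class and the explicit hex range cover the same bytes, the sketch below tests every single-byte value against both patterns. It assumes the default "C" locale and no /u modifier; the helper name matchedBytes is hypothetical.

```php
<?php
// Hypothetical helper: list the byte values (0-255) that a pattern matches.
function matchedBytes(string $pattern): array
{
    $hits = [];
    for ($byte = 0; $byte <= 255; $byte++) {
        if (preg_match($pattern, chr($byte)) === 1) {
            $hits[] = $byte;
        }
    }
    return $hits;
}

$posix = matchedBytes('/[[:cntrl:]]/');
$hex   = matchedBytes('/[\x00-\x1F\x7F]/');

// Compare the two sets of matched byte values.
var_dump($posix === $hex);
```

Note the doubled brackets: [:cntrl:] is only recognized as a class name inside a bracket expression.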

Answered by Scott Jungwirth

For Unicode input, this removes all control, unassigned, private-use, format and surrogate code points from your input text, except those that are also whitespace characters (such as tab and newline). I use this to remove all non-printable characters from my input.

<?php
$clean = preg_replace('/[^\PC\s]/u', '', $input);
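A small usage sketch of the pattern above, assuming PHP 7 or later for the "\u{...}" escape. The wrapper name stripNonPrintable and the sample string are hypothetical. One thing worth guarding against: with the /u modifier, preg_replace() returns null instead of a string when the input is not valid UTF-8.

```php
<?php
// Hypothetical wrapper around the pattern from this answer.
function stripNonPrintable(string $input): string
{
    $clean = preg_replace('/[^\PC\s]/u', '', $input);
    if ($clean === null) {
        // preg_replace() signals failure with null; with /u the usual
        // cause is malformed UTF-8 in the subject string.
        throw new InvalidArgumentException(
            'preg_replace failed with error code ' . preg_last_error()
        );
    }
    return $clean;
}

// STX (\x02) and a zero-width space (U+200B, category Cf) are removed;
// the tab and the newline survive because \s matches them.
$sample = "a\x02b\u{200B}c\td\n";
var_dump(stripNonPrintable($sample) === "abc\td\n"); // bool(true)
```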

For more info on \p{C}, see http://www.regular-expressions.info/unicode.html#category

Answered by Bobby

PHP does support POSIX classes, so you can use [:cntrl:] instead of some fancy character-magic-stuff:

ereg_replace("[:cntrl:]", "", $pString);

Edit:

An extra pair of square brackets might be needed in PHP 5.3.

ereg_replace("[[:cntrl:]]", "", $pString);

Answered by Jamie

To keep the control characters but make them compatible with JSON, I had to do this:

$str = preg_replace(
    array(
        '/\x00/', '/\x01/', '/\x02/', '/\x03/', '/\x04/',
        '/\x05/', '/\x06/', '/\x07/', '/\x08/', '/\x09/', '/\x0A/',
        '/\x0B/','/\x0C/','/\x0D/', '/\x0E/', '/\x0F/', '/\x10/', '/\x11/',
        '/\x12/','/\x13/','/\x14/','/\x15/', '/\x16/', '/\x17/', '/\x18/',
        '/\x19/','/\x1A/','/\x1B/','/\x1C/','/\x1D/', '/\x1E/', '/\x1F/'
    ), 
    array(
        "\u0000", "\u0001", "\u0002", "\u0003", "\u0004",
        "\u0005", "\u0006", "\u0007", "\u0008", "\u0009", "\u000A",
        "\u000B", "\u000C", "\u000D", "\u000E", "\u000F", "\u0010", "\u0011",
        "\u0012", "\u0013", "\u0014", "\u0015", "\u0016", "\u0017", "\u0018",
        "\u0019", "\u001A", "\u001B", "\u001C", "\u001D", "\u001E", "\u001F"
    ), 
    $str
);

(The JSON rules state: “All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).”)
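The 32 pattern/replacement pairs above could probably be collapsed into a single callback. This is only a sketch of that idea, not the answer's own code; the helper name escapeControlsForJson is hypothetical. When PHP itself is producing the JSON, json_encode() already escapes control characters, so the manual step may not be needed at all.

```php
<?php
// Hypothetical helper: rewrite each control character U+0000..U+001F
// as a literal six-character \uXXXX escape sequence.
function escapeControlsForJson(string $str): string
{
    return preg_replace_callback(
        '/[\x00-\x1F]/',
        function (array $m): string {
            // Single quotes keep the backslash literal; %04X pads to four hex digits.
            return sprintf('\u%04X', ord($m[0]));
        },
        $str
    );
}

var_dump(escapeControlsForJson("a\x02b") === 'a\u0002b'); // bool(true)

// json_encode() performs the same escaping and adds the surrounding quotes.
var_dump(json_encode("a\x02b") === '"a\u0002b"');         // bool(true)
```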

Answered by Anthony

Regex-free method

If you are only zapping the control characters I'm familiar with (those below 32, plus 127), try this out:

for ($control = 0; $control < 32; $control++) {
    $pString = str_replace(chr($control), "", $pString);
}

$pString = str_replace(chr(127), "", $pString);

The loop gets rid of everything but DEL, which we remove separately at the end.

I'm thinking this will be a lot less stressful on you and the script than dealing with regex and the regex library.

Updated regex-free method

Just for kicks, I came up with another way to do it. This one uses an array of control characters:

$ctrls = range(chr(0), chr(31));
$ctrls[] = chr(127);

$clean_string = str_replace($ctrls, "", $string);
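A variant of the array approach that builds the list from integer codes, so it does not rely on how range() treats single-character string arguments. The sample string is made up for illustration, and the last line compares the result against the regex from the earlier answer.

```php
<?php
// Build the list from integer codes, then map each code to its character.
$ctrls = array_map('chr', range(0, 31));
$ctrls[] = chr(127); // DEL

// Hypothetical sample containing STX, a tab and DEL.
$string = "x\x02y\tz\x7F";
$clean_string = str_replace($ctrls, "", $string);

var_dump($clean_string === "xyz"); // bool(true)
var_dump($clean_string === preg_replace('/[\x00-\x1F\x7F]/', '', $string)); // bool(true)
```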