Regular Expression Sanitize (PHP)

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/3022185/

Time: 2020-08-25 08:27:15  Source: igfitidea

Tags: php, regex, preg-replace, sanitization

Asked by Atif Mohammed Ameenuddin

I would like to sanitize a string so that it can be used in a URL. This is basically what I need:

  1. Everything must be removed except alphanumeric characters, spaces, and dashes.
  2. Spaces should be converted into dashes.

For example:

This, is the URL!

must return

this-is-the-url

Answer by SilentGhost

function slug($z){
    $z = strtolower($z);
    $z = preg_replace('/[^a-z0-9 -]+/', '', $z);
    $z = str_replace(' ', '-', $z);
    return trim($z, '-');
}
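One thing to watch with the function above: consecutive spaces, or a space next to a dash, each become their own dash, so `a  b` turns into `a--b`. A minimal sketch of a variant that collapses such runs follows. The name `slug_collapsed` is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical variant of slug(): same cleanup, but runs of spaces
// and dashes collapse into a single dash.
function slug_collapsed(string $z): string
{
    $z = strtolower($z);
    // Drop everything except lowercase letters, digits, spaces and dashes.
    $z = preg_replace('/[^a-z0-9 -]+/', '', $z);
    // Turn each run of spaces and/or dashes into one dash.
    $z = preg_replace('/[ -]+/', '-', $z);
    // Remove any dash left at either end.
    return trim($z, '-');
}

echo slug_collapsed('This,  is -- the URL!'), PHP_EOL; // this-is-the-url
```

Whether collapsing is wanted depends on the URLs you need; the original function keeps every dash it produces.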

Answer by Rooneyl

First, strip the unwanted characters:

$new_string = preg_replace("/[^a-zA-Z0-9\s]/", "", $string);

Then replace whitespace with dashes:

$url = preg_replace('/\s/', '-', $new_string);

Finally, URL-encode the result so it is ready for use:

$new_url = urlencode($url);
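The three steps can be combined into one function, sketched below. The wrapper name `to_url_segment` is hypothetical. Note two things about this approach as written: it keeps the original letter case, and it removes dashes that were already in the input, so a `strtolower()` call and a dash inside the character class would be needed to match the question exactly.

```php
<?php
// Hypothetical wrapper around the three steps shown in this answer.
function to_url_segment(string $string): string
{
    // 1. Strip everything except ASCII letters, digits and whitespace.
    $new_string = preg_replace("/[^a-zA-Z0-9\s]/", "", $string);
    // 2. Replace each whitespace character with a dash.
    $url = preg_replace('/\s/', '-', $new_string);
    // 3. URL-encode the result (a no-op for letters, digits and dashes).
    return urlencode($url);
}

echo to_url_segment('This, is the URL!'), PHP_EOL; // This-is-the-URL
```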

Answer by user1484291

This will do it in a Unix shell (I just tried it on macOS):

$ tr -cs A-Za-z '-' < infile.txt > outfile.txt
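Since the question asks about PHP, here is a rough PHP counterpart of that `tr` command, offered as an illustration rather than as the answerer's code. The function name `tr_like` is hypothetical. Like the shell version, it replaces every run of non-letters (digits and the trailing newline included) with a single dash and leaves letter case alone.

```php
<?php
// Hypothetical PHP counterpart of: tr -cs A-Za-z '-'
function tr_like(string $s): string
{
    // Each run of characters that are not ASCII letters becomes one dash.
    return preg_replace('/[^A-Za-z]+/', '-', $s);
}

echo tr_like("This, is the URL!\n"), PHP_EOL; // This-is-the-URL-
```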

I got the idea from the blog post "More Shell, Less Egg".

Answer by Abhishek Goel

Try this:

function clean($string) {
    $string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
    $string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.

    return preg_replace('/-+/', '-', $string); // Replaces multiple hyphens with single one.
}

Usage:

echo clean('a|"bc!@£de^&$f g');

Will output: abcdef-g

Source: https://stackoverflow.com/a/14114419/2439715

Answer by Denis Matafonov

All previous answers deal with URLs, but in case someone needs to sanitize a string for a login (for example) and keep it as text, here you go:

function sanitizeText($str) {
    $withSpecCharacters = htmlspecialchars($str);
    $splitted_str = str_split($str);
    $result = '';
    foreach ($splitted_str as $letter){
        if (strpos($withSpecCharacters, $letter) !== false) {
            $result .= $letter;
        }
    }
    return $result;
}

echo sanitizeText('ОРРииыфвсси ajvnsakjvnHB "&nvsp;\n" <script>alert()</script>');
// Output: ОРРииыфвсси ajvnsakjvnHB &nvsp;\n scriptalert()/script
// No injections possible; as much of the input as possible is kept
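For comparison, a common alternative is to leave the text untouched and escape it only at the moment it is written into HTML. The sketch below assumes UTF-8 input. The helper name `escapeForHtml` is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: escape on output instead of deleting characters.
function escapeForHtml(string $str): string
{
    // ENT_QUOTES escapes both double and single quotes.
    return htmlspecialchars($str, ENT_QUOTES, 'UTF-8');
}

echo escapeForHtml('<script>alert()</script>'), PHP_EOL;
// &lt;script&gt;alert()&lt;/script&gt;
```

With this approach nothing the user typed is lost, and the browser displays the markup as plain text instead of running it.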

Answer by DjimOnDev

You should use the slugify package and not reinvent the wheel ;)

https://github.com/cocur/slugify

Answer by Adeel Raza Azeemi

The following will replace spaces with dashes.

$str = str_replace(' ', '-', $str);

Then the following statement removes everything except alphanumeric characters and dashes. (There are no spaces left, because the previous step replaced them with dashes.)

// Char representation     0 -  9   A-   Z   a-   z  -    
$str = preg_replace('/[^\x30-\x39\x41-\x5A\x61-\x7A\x2D]/', '', $str);

Which is equivalent to

$str = preg_replace('/[^0-9A-Za-z-]+/', '', $str);
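A quick check of that equivalence, as a small sketch: both patterns are applied to the same sample string (chosen here for illustration, with spaces already replaced by dashes) and the results are compared.

```php
<?php
// Sample input after the space-to-dash step; the value is illustrative.
$input = 'This,-is-the-URL!';

// Hex-escaped character class from the answer.
$hex = preg_replace('/[^\x30-\x39\x41-\x5A\x61-\x7A\x2D]/', '', $input);
// Plain character class from the answer.
$plain = preg_replace('/[^0-9A-Za-z-]+/', '', $input);

var_dump($hex === $plain); // bool(true)
echo $hex, PHP_EOL;        // This-is-the-URL
```

The `+` in the second pattern only changes how many characters are removed per match, not the final string.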

FYI: to remove every character outside the printable ASCII range from a string, use:

$str = preg_replace('/[^\x20-\x7E]/', '', $str); 

\x20 is the hexadecimal code for the space character, which is the first printable ASCII character, and \x7E is the tilde, which is the last. See Wikipedia: https://en.wikipedia.org/wiki/ASCII#Printable_characters

FYI: see the Hex column of the table there for the range 20-7E.

Printable characters: "Codes 20hex to 7Ehex, known as the printable characters, represent letters, digits, punctuation marks, and a few miscellaneous symbols. There are 95 printable characters in total."
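To make the effect of the printable-range pattern concrete, here is a small sketch with an illustrative input containing a UTF-8 "é", a tab, a DEL byte, and a newline. Note that the pattern works byte by byte, so accented and other non-ASCII letters are removed along with control characters, which may or may not be what you want.

```php
<?php
// Illustrative input: "Caf" + UTF-8 e-acute + tab + "bar" + DEL + "~" + newline.
$str = "Caf\xC3\xA9\tbar\x7F~\n";

// Keep only bytes in the printable ASCII range 0x20 to 0x7E.
$clean = preg_replace('/[^\x20-\x7E]/', '', $str);

echo $clean, PHP_EOL; // Cafbar~
```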