Original source: http://stackoverflow.com/questions/745282/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How do I write a regex in PHP to remove special characters?
Asked by Ben McRae
I'm pretty new to PHP, and I noticed there are many different ways of handling regular expressions.
This is what I'm currently using:
$join = "the original string i'm parsing through";
// Replace each listed character with an underscore
$replace = array(" ", ".", ",", "'", "@");
$newString = str_replace($replace, "_", $join);
I want to remove everything that isn't a-z, A-Z, or 0-9. I'm looking for the inverse of the above. In pseudocode, it would be:
If characters in $join are not equal to a-z, A-Z, 0-9 then change those characters in $join to "_"
Answered by runfalk
$newString = preg_replace('/[^a-z0-9]/i', '_', $join);
This should do the trick.
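As a rough sketch of how the one-liner above might be wrapped for reuse: the function name replace_special_chars and its $replacement parameter are hypothetical and not part of the original answer, and the null check is an added precaution because preg_replace() returns null when the regex engine reports an error.

```php
<?php
// Hypothetical helper (not from the original answer): replaces every
// character outside a-z, A-Z, 0-9 with $replacement.
function replace_special_chars(string $input, string $replacement = '_'): string
{
    // [^a-z0-9] is a negated class; the i modifier makes it case-insensitive.
    $result = preg_replace('/[^a-z0-9]/i', $replacement, $input);

    // preg_replace() returns null if the regex engine reports an error.
    if ($result === null) {
        throw new RuntimeException('preg_replace failed: ' . preg_last_error());
    }
    return $result;
}

$join = "the original string i'm parsing through";
echo replace_special_chars($join), "\n";
// the_original_string_i_m_parsing_through
```

The spaces and the apostrophe each become a single underscore, which matches the behaviour asked for in the question.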
Answered by Gavin Miller
The regular expression for anything which isn't a-z, A-Z, 0-9 is:
$newString = preg_replace('/[^a-zA-Z0-9]/', "_", $join);
This is known as a negated character class.
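The same negated class can be reused for two nearby variations, sketched here under assumptions that go beyond the answer: adding a + quantifier to collapse a run of special characters into one underscore, and passing an empty replacement to delete the characters outright (the question title speaks of removing them). The sample string is made up for illustration.

```php
<?php
$join = "Hello, World! 123";

// Replace each disallowed character individually.
$each = preg_replace('/[^a-zA-Z0-9]/', '_', $join);

// Assumption: collapse each run of disallowed characters into one underscore.
$collapsed = preg_replace('/[^a-zA-Z0-9]+/', '_', $join);

// Assumption: delete disallowed characters instead of replacing them.
$removed = preg_replace('/[^a-zA-Z0-9]/', '', $join);

echo $each, "\n";      // Hello__World__123
echo $collapsed, "\n"; // Hello_World_123
echo $removed, "\n";   // HelloWorld123
```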
Answered by Powerlord
The easiest way is this:
preg_replace('/\W/', '_', $join);
\W is the non-word character group. A word character is a-z, A-Z, 0-9, and _. \W matches everything not previously mentioned*.
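Because _ is itself a word character, \W and an explicit negated class differ on input that already contains underscores. A small sketch of that difference, using a made-up input and "-" as the replacement so the effect is visible:

```php
<?php
$join = "user_name@example.com";

// \W leaves the underscore alone because _ counts as a word character.
$viaW = preg_replace('/\W/', '-', $join);

// The explicit negated class treats _ like any other special character.
$viaClass = preg_replace('/[^a-zA-Z0-9]/', '-', $join);

echo $viaW, "\n";     // user_name-example-com
echo $viaClass, "\n"; // user-name-example-com
```

When the replacement is "_" itself, as in the question, the two patterns give the same result for plain ASCII input.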
Edit: preg uses Perl-compatible regular expressions (PCRE); the syntax is documented in Perl's perlre man page.
*Edit 2: This assumes the C locale or one of the English locales. Other locales may include accented letters in the word character class. The Unicode locales will only consider characters below code point 128 to be word characters.
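To sidestep locale-dependent behaviour of \W, one option is an explicit class, optionally with the u modifier for UTF-8 input. The sketch below is an illustration rather than part of the answer: it assumes the input is valid UTF-8 and that PCRE was built with Unicode property support (needed for \p{L} and \p{N}).

```php
<?php
// "café" written with explicit UTF-8 bytes so the example does not depend
// on the encoding of this source file (é is the two bytes C3 A9).
$join = "caf\xC3\xA9 au lait";

// Without the u modifier the pattern is matched byte by byte, so the
// two-byte é becomes two underscores.
$bytes = preg_replace('/[^a-zA-Z0-9]/', '_', $join);

// With u the subject is treated as UTF-8, so é is one character.
$utf8 = preg_replace('/[^a-zA-Z0-9]/u', '_', $join);

// Assumption: letters and digits from any script should be kept.
$anyScript = preg_replace('/[^\p{L}\p{N}]/u', '_', $join);

echo $bytes, "\n";     // caf___au_lait
echo $utf8, "\n";      // caf__au_lait
echo $anyScript, "\n"; // café_au_lait
```

Which variant is appropriate depends on whether accented letters count as "special" for the application; the original question only asks for a-z, A-Z, and 0-9.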

