How do I write a regex in PHP to remove special characters?

Note: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/745282/

Date: 2020-08-24 23:43:37  Source: igfitidea

How do I write a regex in PHP to remove special characters?

php regex

Asked by Ben McRae

I'm pretty new to PHP, and I noticed there are many different ways of handling regular expressions.

This is what I'm currently using:

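// The original string being parsed (defined before it is used)
$join = "the original string i'm parsing through";

// Replace each listed character with an underscore
$replace = array(" ", ".", ",", "'", "@");
$newString = str_replace($replace, "_", $join);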

I want to remove everything which isn't a-z, A-Z, or 0-9. I'm looking for the reverse of the above: instead of listing the characters to replace, list the characters to keep. A pseudocode way to write it would be:

If characters in $join are not equal to a-z, A-Z, 0-9 then change those characters in $join to "_"

Answered by runfalk

$newString = preg_replace('/[^a-z0-9]/i', '_', $join);

This should do the trick.
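
As a minimal sketch of how this behaves, the snippet below runs the same pattern on the sample string from the question. The `echo` line is only there for illustration and is not part of the original answer.

```php
<?php
// Sample input taken from the question.
$join = "the original string i'm parsing through";

// [^a-z0-9] matches any single character that is NOT a letter or digit.
// The trailing "i" modifier makes the a-z range case-insensitive,
// so uppercase letters are kept as well.
$newString = preg_replace('/[^a-z0-9]/i', '_', $join);

echo $newString, PHP_EOL; // the_original_string_i_m_parsing_through
```

Each space and the apostrophe is replaced individually, so two adjacent special characters would produce two underscores.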

Answered by Gavin Miller

The regular expression for anything which isn't a-z, A-Z, 0-9 is:

preg_replace('/[^a-zA-Z0-9]/', "_", $join);

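This is known as a negated character class: the leading ^ inside the brackets inverts the set, so the pattern matches any character not listed.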

Answered by Powerlord

The easiest way is this:

preg_replace('/\W/', '_', $join);

\W is the non-word character group. A word character is a-z, A-Z, 0-9, and _. \W matches everything not previously mentioned*.
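
Because the underscore counts as a word character, \W and an explicit [^a-zA-Z0-9] class differ on input that already contains underscores. The sketch below illustrates this. The sample string is hypothetical, and "-" is used as the replacement (rather than "_" as in the answers) purely so the difference is visible in the output.

```php
<?php
// Hypothetical sample input containing an underscore.
$join = "user_name@example.com";

// \W leaves the existing underscore alone, because "_" is a word character.
$withW = preg_replace('/\W/', '-', $join);

// The explicit negated class treats "_" like any other special character.
$withClass = preg_replace('/[^a-zA-Z0-9]/', '-', $join);

echo $withW, PHP_EOL;     // user_name-example-com
echo $withClass, PHP_EOL; // user-name-example-com
```

When the replacement is "_" itself, as in the question, both patterns give the same result for ASCII input.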

Edit: the preg functions use Perl-compatible regular expressions (PCRE); the syntax is documented in Perl's perlre man page.

*Edit 2: This assumes the C locale or one of the English locales. Other locales may include accented letters in the word character class. The Unicode locales will only consider characters below code point 128 to be word characters.
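
A related point, offered as a hedged sketch rather than part of the original answer: an explicit ASCII class such as [^a-zA-Z0-9] does not depend on the locale, but on UTF-8 input it works byte by byte unless the "u" modifier is added. The sample string below is an assumption chosen for illustration, with é written as its two UTF-8 bytes.

```php
<?php
// "café" with é spelled out as its UTF-8 bytes (0xC3 0xA9); hypothetical sample.
$join = "caf\xC3\xA9";

// Without /u the subject is treated as bytes: é is two bytes,
// and each byte is replaced separately.
$byteWise = preg_replace('/[^a-zA-Z0-9]/', '_', $join);

// With /u the subject is treated as UTF-8: é is a single character.
$charWise = preg_replace('/[^a-zA-Z0-9]/u', '_', $join);

echo $byteWise, PHP_EOL; // caf__
echo $charWise, PHP_EOL; // caf_
```

With /u, preg_replace returns null if the subject is not valid UTF-8, so the result may need checking when the input encoding is not known.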
