JavaScript: remove all special characters using RegExp

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/4374822/

Time: 2020-08-23 12:09:00  Source: igfitidea

Remove all special characters with RegExp

javascript regex special-characters

Asked by Timothy Ruhle

I would like a RegExp that will remove all special characters from a string. I am trying something like this but it doesn't work in IE7, though it works in Firefox.

var specialChars = "!@#$^&%*()+=-[]\/{}|:<>?,.";

for (var i = 0; i < specialChars.length; i++) {
  stringToReplace = stringToReplace.replace(new RegExp("\\" + specialChars[i], "gi"), "");
}

A detailed description of the RegExp would be helpful as well.

Answered by annakata

var desired = stringToReplace.replace(/[^\w\s]/gi, '')

As was mentioned in the comments, it's easier to do this as a whitelist: replace the characters which aren't in your safelist.

The caret (^) character is the negation of the set [...], the gi flags say global and case-insensitive (the latter is a bit redundant but I wanted to mention it), and the safelist in this example is digits, word characters, underscores (\w) and whitespace (\s).
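As a minimal sketch of the whitelist approach described above, the snippet below wraps the same pattern in a small helper. The function name stripSpecials and the sample string are made up for illustration and are not part of the original answer.

```javascript
// Hypothetical helper: remove everything that is not a word character (\w) or whitespace (\s).
function stripSpecials(input) {
  // [^...] negates the set; the g flag replaces every match, not just the first one.
  return input.replace(/[^\w\s]/g, '');
}

console.log(stripSpecials("Hello, World! #2020_test")); // "Hello World 2020_test"
```

Note that the underscore survives because it is part of \w.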

Answered by noinput

Note that if you still want to exclude a set, including things like slashes and special characters you can do the following:

var outString = sourceString.replace(/[`~!@#$%^&*()_|+\-=?;:'",.<>\{\}\[\]\\/]/gi, '');

Take special note that in order to also include the "minus" character, you need to escape it with a backslash, like the latter group. If you don't, +-= is read as a range from + to =, which also selects 0-9, and that is probably undesired.
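The effect of the unescaped minus can be checked with a short experiment. This is only a sketch; the variable names and the sample string are assumptions chosen for illustration.

```javascript
// Escaped minus: the class contains exactly three characters: +, - and =.
var escapedMinus = /[+\-=]/g;
// Unescaped minus: "+-=" is a range from "+" (0x2B) to "=" (0x3D), which covers the digits 0-9.
var unescapedMinus = /[+-=]/g;

console.log("a+1-2=3".replace(escapedMinus, ''));   // "a123"
console.log("a+1-2=3".replace(unescapedMinus, '')); // "a"
```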

Answered by freedev

Plain JavaScript regex does not handle Unicode letters.

Do not use [^\w\s]; this will remove letters with accents (like àèéìòù), not to mention Cyrillic or Chinese: letters coming from such languages will be completely removed.

You really don't want to remove these letters together with all the special characters. You have two options:

  • Add to your regex all the special characters you don't want to remove,
    for example: [^èéòàùì\w\s].
  • Have a look at xregexp.com. XRegExp adds base support for Unicode matching via the \p{...} syntax.

var str = "Їжак::: résd,$%& adùf";
// \pL matches any Unicode letter; the backslash must be doubled inside a string literal
var search = XRegExp('[^\\pL ]+');
var res = XRegExp.replace(str, search, '', 'all');

console.log(res); // returns "Їжак résd adùf"
console.log(str.replace(/[^\w\s]/gi, '') ); // returns " rsd adf"
console.log(str.replace(/[^\wèéòàùì\s]/gi, '') ); // returns " résd adùf"
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.js"></script>
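As an alternative sketch that needs no library, engines that implement ES2018 Unicode property escapes accept \p{...} natively when the u flag is set. This assumes such a runtime is available, which was not the case when this answer was written. The helper name stripNonLetters and the sample string are made up for illustration.

```javascript
// Hypothetical helper: keep letters (\p{L}), combining marks (\p{M}), digits (\p{N}) and whitespace.
// Requires a runtime with ES2018 Unicode property escapes; the u flag is mandatory for \p{...}.
function stripNonLetters(input) {
  return input.replace(/[^\p{L}\p{M}\p{N}\s]/gu, '');
}

console.log(stripNonLetters("Привет, résumé! 你好 #42")); // "Привет résumé 你好 42"
```

Including \p{M} keeps accents that are stored as separate combining characters, so the result does not depend on how the input was normalized.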

Answered by Seagull

The first solution does not work for any UTF-8 alphabet. (It will cut text such as Їжак.) I have managed to create a function which does not use RegExp and relies on the good UTF-8 support in the JavaScript engine. The idea is simple: if a symbol is equal in uppercase and lowercase, it is a special character. The only exception is made for whitespace.

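function removeSpecials(str) {
    var res = "";
    for (var i = 0; i < str.length; ++i) {
        // Compare case per character: converting the whole string at once
        // can change its length (e.g. "ß" becomes "SS") and misalign the indexes.
        var ch = str.charAt(i);
        // Keep letters (upper and lower forms differ) and whitespace; digits are dropped too.
        if (ch.toLowerCase() != ch.toUpperCase() || ch.trim() === '')
            res += ch;
    }
    return res;
}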

Update: Please note that this solution works only for languages that have lowercase and capital letters. In languages like Chinese, this won't work.
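The behaviour and the limitation described above can be seen in a small self-contained sketch. The helper name removeSpecialsByCase and the sample strings are assumptions for illustration; the function compares case one character at a time.

```javascript
// Hypothetical helper: keeps characters whose upper and lower case forms differ, plus whitespace.
function removeSpecialsByCase(str) {
  var res = "";
  for (var i = 0; i < str.length; ++i) {
    var ch = str.charAt(i);
    if (ch.toLowerCase() !== ch.toUpperCase() || ch.trim() === "") {
      res += ch;
    }
  }
  return res;
}

// Punctuation and digits have no case, so they are removed; the trailing space is kept.
console.log(removeSpecialsByCase("Hello, Привет! 123")); // "Hello Привет "
// Chinese characters have no case either, so they are removed as well.
console.log(removeSpecialsByCase("你好, world"));          // " world"
```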

Update 2: I came to the original solution when I was working on a fuzzy search. If you are also trying to remove special characters to implement search functionality, there is a better approach. Use any transliteration library which will produce a string made only of Latin characters, and then a simple RegExp will do all the magic of removing special characters. (This will work for Chinese too, and you will also get the side benefit of making Tromsø == Tromso.)
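The answer does not name a specific transliteration library, so the sketch below swaps in a much narrower built-in technique: Unicode NFD normalization followed by stripping combining marks. It only folds accented Latin letters; it does not transliterate Chinese and it leaves letters such as ø unchanged, so it is a stand-in for the idea, not a replacement for a real transliteration library. The helper names foldAccents and toSearchKey are hypothetical.

```javascript
// Hypothetical helper: split accented letters into base letter + combining mark (NFD),
// then drop the combining marks (U+0300 to U+036F).
function foldAccents(input) {
  return input.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Hypothetical helper: fold accents first, then apply the simple whitelist regex.
function toSearchKey(input) {
  return foldAccents(input).replace(/[^\w\s]/g, "");
}

console.log(toSearchKey("résumé: déjà vu!")); // "resume deja vu"
```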

Answered by millebii

I use RegexBuddy for debugging my regexes; it supports almost all languages, which is very useful. Then I copy/paste for the targeted language. Terrific tool and not very expensive.

So I copy/pasted your regex, and your issue is that [ and ] are special characters in regex, so you need to escape them. So the regex should be: /!@#$^&%*()+=-[\x5B\x5D]\/{}|:<>?,./im

Answered by AnD

Why don't you do something like:

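// The * quantifier lets the pattern cover the whole input, not just a single character
var re = /^[a-z0-9 ]*$/i;
var isValid = re.test(yourInput);

to check whether your input contains any special characters (isValid is false if it does).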
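A short sketch of this validation approach follows. It assumes the pattern is quantified with * so that it is tested against the whole input; the variable name allowedOnly and the sample inputs are made up for illustration.

```javascript
// Matches only when every character is an ASCII letter, a digit or a space.
var allowedOnly = /^[a-z0-9 ]*$/i;

console.log(allowedOnly.test("Hello World 42")); // true
console.log(allowedOnly.test("Hello, World!"));  // false
```

Unlike the replace-based answers, this rejects the input instead of cleaning it, which may suit form validation better.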

Answered by Eldar Mammadov

str.replace(/\s|[0-9_]|\W|[#$%^&*()]/g, "")

I did something like this. But there are some people who did it much more easily, like str.replace(/\W_/g,"");
