C#: How do I strip non-alphanumeric characters (including spaces) from a string?

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/8779189/

Tags: c#, asp.net, .net, regex

Asked by James

How do I strip non-alphanumeric characters, including spaces, from a string in C# using Replace?

I want to keep a-z, A-Z, 0-9 and nothing more (not even " " spaces).

"Hello there(hello#)".Replace(regex-i-want, "");

should give

"Hellotherehello"

I have tried "Hello there(hello#)".Replace(@"[^A-Za-z0-9 ]", ""); but the spaces remain.

Accepted answer by Tim Pietzcker

In your regex, the space inside the character class excludes spaces from being matched, so they are never removed (and you haven't used Regex.Replace(), which I had overlooked completely at first):

result = Regex.Replace("Hello there(hello#)", @"[^A-Za-z0-9]+", "");

should work. The + makes the regex a bit more efficient by matching a whole run of consecutive non-alphanumeric characters at once instead of one character at a time.

If you want to keep non-ASCII letters/digits, too, use the following regex:

@"[^\p{L}\p{N}]+"

which leaves

BonjourmesélèvesGutenMorgenliebeSchüler

instead of

BonjourmeslvesGutenMorgenliebeSchler
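
The difference between the two character classes can be sketched as follows. This is a minimal illustration, not part of the original answer: the class name AlphanumericFilter, its method names, and the input sentence are assumptions chosen so that the outputs match the two strings quoted above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlphanumericFilter
{
    // Keeps only ASCII letters and digits; everything else is removed.
    public static string KeepAscii(string input) =>
        Regex.Replace(input, @"[^A-Za-z0-9]+", "");

    // Keeps any Unicode letter (\p{L}) or number (\p{N}).
    public static string KeepUnicode(string input) =>
        Regex.Replace(input, @"[^\p{L}\p{N}]+", "");

    public static void Main()
    {
        // Hypothetical input; the original answer does not show its input string.
        string input = "Bonjour, mes élèves! Guten Morgen, liebe Schüler.";
        Console.WriteLine(KeepUnicode(input)); // BonjourmesélèvesGutenMorgenliebeSchüler
        Console.WriteLine(KeepAscii(input));   // BonjourmeslvesGutenMorgenliebeSchler
    }
}
```

The sketch assumes the accented letters are stored as single precomposed characters; combining marks belong to \p{M} and would be stripped by both patterns.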

Answer by Veronica

In .NET 4.0 you can use the IsNullOrWhiteSpace method of the String class to detect the so-called white-space characters; note that it only tests whether a string is null, empty, or all white space, and does not remove anything by itself. Please take a look here: http://msdn.microsoft.com/en-us/library/system.string.isnullorwhitespace.aspx However, as @CodeInChaos pointed out, there are plenty of characters which could be considered letters and numbers. You can use a regular expression if you only want to keep A-Za-z0-9.
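
For reference, String.IsNullOrWhiteSpace returns a bool and leaves the string unchanged, so it cannot strip anything on its own. The sketch below shows what it reports and one possible way to actually remove white space with char.IsWhiteSpace; WhitespaceDemo and RemoveWhitespace are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;

public static class WhitespaceDemo
{
    // Removes every character for which char.IsWhiteSpace returns true.
    // Punctuation and symbols are left in place.
    public static string RemoveWhitespace(string input) =>
        new string(input.Where(c => !char.IsWhiteSpace(c)).ToArray());

    public static void Main()
    {
        // IsNullOrWhiteSpace only answers a yes/no question about the whole string.
        Console.WriteLine(string.IsNullOrWhiteSpace("   "));          // True
        Console.WriteLine(string.IsNullOrWhiteSpace("Hello there")); // False

        Console.WriteLine(RemoveWhitespace("Hello there(hello#)"));  // Hellothere(hello#)
    }
}
```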

Answer by Adrianne

Or you can do this too:

    public static string RemoveNonAlphanumeric(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9')
                sb.Append(text[i]);
        }

        return sb.ToString();
    }

Usage:

string text = SomeClass.RemoveNonAlphanumeric("text LaLa (lol) á ? $ 123 ?????");

//text: textLaLalol123

Answer by James

The mistake made above was using String.Replace incorrectly (it does not take a regex, only a literal string; thanks CodeInChaos).

The following code should do what was specified:

Regex reg = new Regex(@"[^\p{L}\p{N}]+");//Thanks to Tim Pietzcker for regex
string regexed = reg.Replace("Hello there(hello#)", "");

This gives:

regexed = "Hellotherehello"
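
To make the original mistake concrete, here is a small side-by-side sketch. String.Replace looks for the pattern text itself, character for character, so it finds nothing to replace, while Regex.Replace interprets the same text as a pattern. The class and method names are illustrative assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceComparison
{
    public const string Pattern = @"[^A-Za-z0-9]+";

    // String.Replace treats Pattern as a literal substring to search for.
    public static string WithStringReplace(string input) =>
        input.Replace(Pattern, "");

    // Regex.Replace treats Pattern as a regular expression.
    public static string WithRegexReplace(string input) =>
        Regex.Replace(input, Pattern, "");

    public static void Main()
    {
        string input = "Hello there(hello#)";
        Console.WriteLine(WithStringReplace(input)); // Hello there(hello#)
        Console.WriteLine(WithRegexReplace(input));  // Hellotherehello
    }
}
```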

Answer by K D

Use the following regex with Regex.Replace to strip every character that is not a letter, digit, or whitespace from the string (the \s inside the negated class keeps whitespace; drop it if spaces should be removed too):

([^A-Za-z0-9\s])
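
A quick check of what this pattern does to the string from the question, as a hedged sketch (the class and method names are made up for illustration). Because \s sits inside the negated character class, whitespace is treated as a character to keep.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeepWhitespaceDemo
{
    // Pattern from this answer: whitespace (\s) is excluded from removal.
    public static string StripKeepingSpaces(string input) =>
        Regex.Replace(input, @"([^A-Za-z0-9\s])", "");

    // Same pattern without \s: spaces are removed as well.
    public static string StripIncludingSpaces(string input) =>
        Regex.Replace(input, @"[^A-Za-z0-9]", "");

    public static void Main()
    {
        string input = "Hello there(hello#)";
        Console.WriteLine(StripKeepingSpaces(input));   // Hello therehello
        Console.WriteLine(StripIncludingSpaces(input)); // Hellotherehello
    }
}
```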

Answer by Michel Bechelani

var text = "Hello there(hello#)";

var rgx = new Regex("[^a-zA-Z0-9]");

text = rgx.Replace(text, string.Empty);

Answer by Justin Caldicott

The same idea as a replace operation, written as an extension method:

public static class StringExtensions
{
    public static string ReplaceNonAlphanumeric(this string text, char replaceChar)
    {
        StringBuilder result = new StringBuilder(text.Length);

        foreach(char c in text)
        {
            if(c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9')
                result.Append(c);
            else
                result.Append(replaceChar);
        }

        return result.ToString();
    } 
}

And test:

[TestFixture]
public sealed class StringExtensionsTests
{
    [Test]
    public void Test()
    {
        Assert.AreEqual("text_LaLa__lol________123______", "text LaLa (lol) á ? $ 123 ?????".ReplaceNonAlphanumeric('_'));
    }
}

Answer by Dmitry Bychenko

You can use LINQ to keep only the required characters:

  String source = "Hello there(hello#)";

  // "Hellotherehello"
  String result = new String(source
    .Where(ch => Char.IsLetterOrDigit(ch))
    .ToArray());

Or

  String result = String.Concat(source
    .Where(ch => Char.IsLetterOrDigit(ch)));  

So you have no need for regular expressions.
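
One point worth knowing about the LINQ approach: Char.IsLetterOrDigit accepts Unicode letters and digits, not just a-z, A-Z and 0-9, so it behaves like the \p{L}\p{N} regex rather than the ASCII-only one. The sketch below contrasts it with a hand-written ASCII predicate; LinqFilter, IsAsciiAlphanumeric, and the sample input are illustrative assumptions.

```csharp
using System;
using System.Linq;

public static class LinqFilter
{
    // True only for a-z, A-Z and 0-9.
    public static bool IsAsciiAlphanumeric(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');

    // Keeps every Unicode letter or digit.
    public static string KeepLettersAndDigits(string source) =>
        string.Concat(source.Where(ch => char.IsLetterOrDigit(ch)));

    // Keeps ASCII letters and digits only, as the question asks.
    public static string KeepAsciiOnly(string source) =>
        string.Concat(source.Where(ch => IsAsciiAlphanumeric(ch)));

    public static void Main()
    {
        string source = "Hello there(hello#) élève 123";
        Console.WriteLine(KeepLettersAndDigits(source)); // Hellotherehelloélève123
        Console.WriteLine(KeepAsciiOnly(source));        // Hellotherehellolve123
    }
}
```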