C#: How do I strip non-alphanumeric characters (including spaces) from a string?

Question

Asked by James

How do I strip non-alphanumeric characters from a string, and lose the spaces as well, in C# with Replace?

I want to keep a-z, A-Z, 0-9 and nothing more (not even " " spaces).


"Hello there(hello#)".Replace(regex-i-want, "");

should give


"Hellotherehello"

I have tried "Hello there(hello#)".Replace(@"[^A-Za-z0-9 ]", ""); but the spaces remain.

Answer 1

Accepted answer by Tim Pietzcker

In your regex, the space inside the character class excludes spaces from being matched (and you haven't used Regex.Replace(), which I had overlooked completely...):

result = Regex.Replace("Hello there(hello#)", @"[^A-Za-z0-9]+", "");

should work. The + makes the regex a bit more efficient by matching a whole run of consecutive non-alphanumeric characters at once instead of one by one.
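
The difference between the two calls can be seen in a small sketch. It assumes C# 9 or later (top-level statements); the helper name StripNonAlphanumeric is made up for illustration and is not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes every run of characters outside A-Z, a-z, 0-9.
string StripNonAlphanumeric(string input) =>
    Regex.Replace(input, @"[^A-Za-z0-9]+", "");

string source = "Hello there(hello#)";

// String.Replace searches for the literal text "[^A-Za-z0-9 ]", which never
// occurs in the input, so the string comes back unchanged.
string literal = source.Replace(@"[^A-Za-z0-9 ]", "");

// Regex.Replace interprets the pattern; with no space inside the class,
// spaces are removed along with the punctuation.
string stripped = StripNonAlphanumeric(source);

Console.WriteLine(literal);   // Hello there(hello#)
Console.WriteLine(stripped);  // Hellotherehello
```

If the same pattern is applied many times, constructing one Regex instance and reusing it is a common alternative to the static call.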


If you want to keep non-ASCII letters/digits, too, use the following regex:


@"[^\p{L}\p{N}]+"

which leaves


BonjourmesélèvesGutenMorgenliebeSchüler

instead of


BonjourmeslvesGutenMorgenliebeSchler
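
A rough sketch of the two patterns side by side, assuming C# 9 or later. The answer does not show the input string, so the one below is a guess that happens to reproduce both outputs; the accented letters are written as escapes so the result does not depend on the source file's encoding.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed sample input (not given in the answer):
// "Bonjour mes élèves, GutenMorgen liebe Schüler"
string input = "Bonjour mes \u00E9l\u00E8ves, GutenMorgen liebe Sch\u00FCler";

// Unicode-aware: keeps any character in the letter (\p{L}) or number (\p{N}) categories.
string unicodeResult = Regex.Replace(input, @"[^\p{L}\p{N}]+", "");

// ASCII-only: the accented letters are removed together with spaces and punctuation.
string asciiResult = Regex.Replace(input, @"[^A-Za-z0-9]+", "");

Console.WriteLine(unicodeResult);
Console.WriteLine(asciiResult);
```

Note that \p{L} does not cover combining marks, so this sketch relies on the accented letters being precomposed single characters.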

Answer 2

Answered by Veronica

In .NET 4.0 you can use the IsNullOrWhiteSpace method of the String class to detect the so-called white-space characters; note that it only tests a string and does not remove anything from it. Please take a look here: http://msdn.microsoft.com/en-us/library/system.string.isnullorwhitespace.aspx. However, as @CodeInChaos pointed out, there are plenty of characters which could be considered letters and digits. You can use a regular expression if you only want to find A-Za-z0-9.

Answer 3

Answered by Adrianne

Or you can do this too:


    public static string RemoveNonAlphanumeric(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9')
                sb.Append(text[i]);
        }

        return sb.ToString();
    }

Usage:


string text = SomeClass.RemoveNonAlphanumeric("text LaLa (lol) á ? $ 123 ?????");

//text: textLaLalol123

Answer 4

Answered by James

The mistake made above was using String.Replace, which does not take a regex (thanks, CodeInChaos).

The following code should do what was specified:


Regex reg = new Regex(@"[^\p{L}\p{N}]+");//Thanks to Tim Pietzcker for regex
string regexed = reg.Replace("Hello there(hello#)", "");

This gives:


regexed = "Hellotherehello"

Answer 5

Answered by K D

Use the following regex with Regex.Replace to strip all of those characters from the string:

([^A-Za-z0-9\s])
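
One caveat with this pattern: because \s sits inside the negated class, whitespace is not matched and therefore stays in the result, which is not what the question asks for. A short sketch, assuming C# 9 or later, comparing it with the same class without \s:

```csharp
using System;
using System.Text.RegularExpressions;

string source = "Hello there(hello#)";

// \s inside the negated class means whitespace is NOT matched, so the space survives.
string keepsSpaces = Regex.Replace(source, @"([^A-Za-z0-9\s])", "");

// Without \s, the space is stripped along with the punctuation.
string noSpaces = Regex.Replace(source, @"[^A-Za-z0-9]", "");

Console.WriteLine(keepsSpaces); // Hello therehello
Console.WriteLine(noSpaces);    // Hellotherehello
```

The capturing parentheses in the original pattern are harmless here but not needed, since the match is replaced with an empty string.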

Answer 6

Answered by Michel Bechelani

var text = "Hello there(hello#)";

var rgx = new Regex("[^a-zA-Z0-9]");

text = rgx.Replace(text, string.Empty);

Answer 7

Answered by Justin Caldicott

And the same idea as a replace operation, written as an extension method:

public static class StringExtensions
{
    public static string ReplaceNonAlphanumeric(this string text, char replaceChar)
    {
        StringBuilder result = new StringBuilder(text.Length);

        foreach(char c in text)
        {
            if(c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9')
                result.Append(c);
            else
                result.Append(replaceChar);
        }

        return result.ToString();
    } 
}

And test:


[TestFixture]
public sealed class StringExtensionsTests
{
    [Test]
    public void Test()
    {
        Assert.AreEqual("text_LaLa__lol________123______", "text LaLa (lol) á ? $ 123 ?????".ReplaceNonAlphanumeric('_'));
    }
}

Answer 8

Answered by Dmitry Bychenko

You can use LINQ to keep only the required characters:

  String source = "Hello there(hello#)";

  // "Hellotherehello"
  String result = new String(source
    .Where(ch => Char.IsLetterOrDigit(ch))
    .ToArray());

Or


  String result = String.Concat(source
    .Where(ch => Char.IsLetterOrDigit(ch)));

So you have no need for regular expressions.
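
One thing worth keeping in mind is that Char.IsLetterOrDigit is Unicode-aware, so it keeps letters such as á as well. The sketch below, assuming C# 9 or later, contrasts it with an explicit ASCII range check; the helper names are invented for this example.

```csharp
using System;
using System.Linq;

// Hypothetical helper: true only for a-z, A-Z and 0-9.
bool IsAsciiAlphanumeric(char c) =>
    (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');

// Keeps every character Unicode classifies as a letter or a decimal digit.
string KeepLettersAndDigits(string s) =>
    string.Concat(s.Where(ch => char.IsLetterOrDigit(ch)));

// Keeps only the ASCII subset the question asks for.
string KeepAsciiAlphanumeric(string s) =>
    string.Concat(s.Where(ch => IsAsciiAlphanumeric(ch)));

// \u00E1 is the letter á, written as an escape to avoid encoding issues.
string source = "text LaLa (lol) \u00E1 ? $ 123";

string unicodeKept = KeepLettersAndDigits(source);  // "textLaLalol" + á + "123"
string asciiKept = KeepAsciiAlphanumeric(source);   // "textLaLalol123"

Console.WriteLine(unicodeKept);
Console.WriteLine(asciiKept);
```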

