Original URL: http://stackoverflow.com/questions/8779189/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How do I strip non-alphanumeric characters (including spaces) from a string?
Question by James
How do I strip non-alphanumeric characters from a string, and lose the spaces too, in C# with Replace?
I want to keep a-z, A-Z, 0-9 and nothing more (not even " " spaces).
"Hello there(hello#)".Replace(regex-i-want, "");
should give
"Hellotherehello"
I have tried "Hello there(hello#)".Replace(@"[^A-Za-z0-9 ]", ""); but the spaces remain.
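A minimal sketch of why this attempt behaves as it does, assuming C# 9+ top-level statements and the input from the question: String.Replace(string, string) treats its first argument as literal text rather than as a pattern, so in fact nothing is removed at all. Regex.Replace from System.Text.RegularExpressions treats the same text as a character class, and it is the space inside that class that keeps the spaces.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Hello there(hello#)";

// String.Replace looks for the literal text "[^A-Za-z0-9 ]",
// which never occurs in the input, so the string comes back unchanged.
string literal = input.Replace(@"[^A-Za-z0-9 ]", "");

// Regex.Replace interprets the same text as a negated character class.
// Because ' ' is listed inside the class, spaces are not matched and survive.
string withRegex = Regex.Replace(input, @"[^A-Za-z0-9 ]", "");

Console.WriteLine(literal);   // Hello there(hello#)
Console.WriteLine(withRegex); // Hello therehello
```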
Accepted answer by Tim Pietzcker
In your regex, you have excluded the spaces from being matched (and you haven't used Regex.Replace(), which I had overlooked completely...):
result = Regex.Replace("Hello there(hello#)", @"[^A-Za-z0-9]+", "");
should work. The + makes the regex a bit more efficient by matching several consecutive non-alphanumeric characters at once instead of one by one.
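Both points of this answer can be checked in isolation with a small sketch (C# 9+ top-level statements, using only Regex.Replace and Regex.Matches): dropping the space from the class strips spaces too, and the + changes how many matches get replaced, not the final string.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Hello there(hello#)";

// No space in the negated class, so spaces are matched and removed as well.
string perChar = Regex.Replace(input, @"[^A-Za-z0-9]", "");
string perRun = Regex.Replace(input, @"[^A-Za-z0-9]+", "");

// Without "+", each of ' ', '(', '#', ')' is a separate match (4).
// With "+", the adjacent "#)" is matched as one run, giving 3 matches.
int charMatches = Regex.Matches(input, @"[^A-Za-z0-9]").Count;
int runMatches = Regex.Matches(input, @"[^A-Za-z0-9]+").Count;

Console.WriteLine($"{perChar} {perRun} {charMatches} {runMatches}");
// Hellotherehello Hellotherehello 4 3
```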
If you want to keep non-ASCII letters/digits, too, use the following regex:
@"[^\p{L}\p{N}]+"
which leaves
BonjourmesélèvesGutenMorgenliebeSchüler
instead of
BonjourmeslvesGutenMorgenliebeSchler
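A sketch comparing the two patterns, assuming C# 9+ top-level statements. The input sentence is hypothetical: the answer only shows the outputs, and this string was chosen so that both outputs above are reproduced.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical French/German sample sentence.
string input = "Bonjour, mes élèves! Guten Morgen, liebe Schüler.";

// \p{L} = any Unicode letter, \p{N} = any Unicode number: accented letters survive.
string unicode = Regex.Replace(input, @"[^\p{L}\p{N}]+", "");

// ASCII-only class: é, è and ü are treated as non-alphanumeric and removed.
string asciiOnly = Regex.Replace(input, @"[^A-Za-z0-9]+", "");

Console.WriteLine(unicode);   // BonjourmesélèvesGutenMorgenliebeSchüler
Console.WriteLine(asciiOnly); // BonjourmeslvesGutenMorgenliebeSchler
```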
Answer by Veronica
In .NET 4.0 you can use the IsNullOrWhiteSpace method of the String class to detect so-called white-space characters; note that it only tells you whether a string is null, empty, or made up entirely of white space, and does not remove anything. See http://msdn.microsoft.com/en-us/library/system.string.isnullorwhitespace.aspx for details. However, as @CodeInChaos pointed out, there are plenty of characters that could be considered letters and numbers. You can use a regular expression if you only want to find A-Za-z0-9.
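A minimal sketch of that distinction, assuming C# 9+ top-level statements: String.IsNullOrWhiteSpace returns a single bool for the whole string, so actually dropping white space needs a per-character filter such as Char.IsWhiteSpace, and even then punctuation like "(" and "#" is still left in place.

```csharp
using System;
using System.Linq;

string input = "Hello there(hello#)";

// IsNullOrWhiteSpace only reports whether the string is null, empty,
// or consists solely of white-space characters.
bool inputIsBlank = string.IsNullOrWhiteSpace(input);   // false
bool tabsAreBlank = string.IsNullOrWhiteSpace(" \t\n"); // true

// Removing white space is a separate, per-character filter.
string noWhiteSpace = string.Concat(input.Where(c => !char.IsWhiteSpace(c)));

Console.WriteLine(noWhiteSpace); // Hellothere(hello#)
```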
Answer by Adrianne
Or you can do this too:
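using System.Text;

public static class SomeClass
{
    public static string RemoveNonAlphanumeric(string text)
    {
        // The result can never be longer than the input, so pre-size the builder.
        StringBuilder sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            // Keep only ASCII letters and digits.
            if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9')
                sb.Append(c);
        }
        return sb.ToString();
    }
}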
Usage:
string text = SomeClass.RemoveNonAlphanumeric("text LaLa (lol) á ? $ 123 ?????");
//text: textLaLalol123
Answer by James
The mistake in my question was using String.Replace, which doesn't take a regex (thanks, CodeInChaos).
The following code should do what was specified:
Regex reg = new Regex(@"[^\p{L}\p{N}]+");//Thanks to Tim Pietzcker for regex
string regexed = reg.Replace("Hello there(hello#)", "");
This gives:
regexed = "Hellotherehello"
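Because this answer constructs a Regex instance, the pattern can be built once and reused across calls. A sketch, assuming C# 9+ top-level statements; the local function Strip and the second sample string are hypothetical. RegexOptions.Compiled makes construction slower in exchange for faster matching, which is usually only worth it on hot paths; the static Regex.Replace helpers also keep a small cache of recently used patterns.

```csharp
using System;
using System.Text.RegularExpressions;

// Construct the pattern once; every call below reuses the same instance.
var nonAlphanumeric = new Regex(@"[^\p{L}\p{N}]+", RegexOptions.Compiled);

// Hypothetical helper that closes over the shared Regex.
string Strip(string text) => nonAlphanumeric.Replace(text, "");

foreach (string s in new[] { "Hello there(hello#)", "a-b_c 1.2" })
{
    Console.WriteLine(Strip(s));
}
// Hellotherehello
// abc12   ('_' is connector punctuation, not \p{L} or \p{N})
```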
Answer by K D
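Use the following regex with Regex.Replace to strip all those characters from the string (note that because \s is inside the negated class, white space is not matched, so spaces are kept):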
([^A-Za-z0-9\s])
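A sketch of how \s differs from a literal space, assuming C# 9+ top-level statements and a hypothetical input containing a tab and a newline: \s matches any white-space character, so this pattern keeps tabs and line breaks as well as spaces, whereas the question's [^A-Za-z0-9 ] pattern removes them. The parentheses only create a capture group and do not change what is removed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input with a tab and a newline in it.
string input = "Hello\tthere\n(hello#)";

// \s is excluded from the negated class, so every white-space character survives.
string keepsWhiteSpace = Regex.Replace(input, @"([^A-Za-z0-9\s])", "");

// A literal space in the class keeps only ' '; tab and newline are removed.
string keepsSpaceOnly = Regex.Replace(input, @"[^A-Za-z0-9 ]", "");

// The capture group is not needed for Replace; the result is identical without it.
string noGroup = Regex.Replace(input, @"[^A-Za-z0-9\s]", "");

Console.WriteLine(keepsSpaceOnly);             // Hellotherehello
Console.WriteLine(keepsWhiteSpace == noGroup); // True
```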
Answer by Michel Bechelani
var text = "Hello there(hello#)";
var rgx = new Regex("[^a-zA-Z0-9]");
text = rgx.Replace(text, string.Empty);
Answer by Justin Caldicott
Or, as a replace operation written as an extension method:
using System.Text;

public static class StringExtensions
{
    // Returns a copy of text in which every character outside a-z, A-Z and 0-9
    // is replaced with replaceChar.
    public static string ReplaceNonAlphanumeric(this string text, char replaceChar)
    {
        StringBuilder result = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9')
                result.Append(c);
            else
                result.Append(replaceChar);
        }
        return result.ToString();
    }
}
And test:
using NUnit.Framework;

[TestFixture]
public sealed class StringExtensionsTests
{
    [Test]
    public void Test()
    {
        Assert.AreEqual("text_LaLa__lol________123______", "text LaLa (lol) á ? $ 123 ?????".ReplaceNonAlphanumeric('_'));
    }
}
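For comparison, here is a sketch of the same per-character replacement done with Regex.Replace (C# 9+ top-level statements; the input is a shortened version of the test string above, without the trailing ? characters). Adding + instead collapses each run of non-alphanumeric characters into a single replacement character, which is a different behaviour from the extension method.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "text LaLa (lol) á $ 123";

// One '_' per offending character, like ReplaceNonAlphanumeric('_').
string perChar = Regex.Replace(text, "[^A-Za-z0-9]", "_");

// With "+", each run of offending characters becomes a single '_'.
string perRun = Regex.Replace(text, "[^A-Za-z0-9]+", "_");

Console.WriteLine(perChar); // text_LaLa__lol______123
Console.WriteLine(perRun);  // text_LaLa_lol_123
```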
Answer by Dmitry Bychenko
You can use LINQ to keep only the required characters:
using System;
using System.Linq;

String source = "Hello there(hello#)";
// "Hellotherehello"
String result = new String(source
    .Where(ch => Char.IsLetterOrDigit(ch))
    .ToArray());
Or
String result = String.Concat(source
.Where(ch => Char.IsLetterOrDigit(ch)));
That way you don't need regular expressions at all.
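One caveat is worth a sketch: Char.IsLetterOrDigit is Unicode-aware, so this LINQ version keeps letters such as "á", which is closer to the \p{L}\p{N} regex in the accepted answer than to A-Za-z0-9 (though not identical, since Char.IsLetterOrDigit accepts only decimal digits while \p{N} also matches other numeric characters). If only ASCII letters and digits should survive, the predicate can be narrowed; the ch < 128 check below is one way to do that (C# 9+ top-level statements, sample input borrowed from an earlier answer).

```csharp
using System;
using System.Linq;

string source = "text LaLa (lol) á $ 123";

// Unicode-aware: any Unicode letter or decimal digit is kept, including 'á'.
string unicode = string.Concat(source.Where(ch => char.IsLetterOrDigit(ch)));

// ASCII-only: the extra ch < 128 test limits the result to A-Z, a-z and 0-9.
string asciiOnly = string.Concat(source.Where(ch => ch < 128 && char.IsLetterOrDigit(ch)));

Console.WriteLine(unicode);   // textLaLalolá123
Console.WriteLine(asciiOnly); // textLaLalol123
```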

