C#: How can I strip punctuation from a string?
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute the content to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/421616/
How can I strip punctuation from a string?
Asked by Tom Ritter
For the hope-to-have-an-answer-in-30-seconds part of this question, I'm specifically looking for C#.
But in the general case, what's the best way to strip punctuation in any language?
I should add: ideally, the solutions won't require you to enumerate all the possible punctuation marks.
Related: Strip Punctuation in Python
Answered by TheTXI
The most braindead simple way of doing it would be using string.Replace.
The other way I would imagine is Regex.Replace, with a regular expression that contains all the appropriate punctuation marks.
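As a rough illustration of the two options above, here is a minimal C# sketch. The class and method names are made up for this example, and the four marks it removes are an arbitrary sample. Both variants require listing every mark by hand, which is exactly what the question hopes to avoid.

```csharp
using System.Text.RegularExpressions;

public static class EnumeratedPunctuation
{
    // Hypothetical helper: one string.Replace call per punctuation mark.
    public static string StripWithReplace(string input)
    {
        return input.Replace(".", "").Replace(",", "").Replace("!", "").Replace("?", "");
    }

    // Hypothetical helper: the same four marks as a single character class.
    // Inside [...] the characters '.' and '?' are literals, so no escaping is needed.
    public static string StripWithRegex(string input)
    {
        return Regex.Replace(input, "[.,!?]", "");
    }
}
```

Any mark missing from the list (a semicolon, say) passes through untouched in both variants.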
Answered by Joachim Sauer
Assuming "best" means "simplest", I suggest using something like this:
String stripped = input.replaceAll("\\p{Punct}+", "");
This example is for Java, but all sufficiently modern regex engines should support this (or something similar).
Edit: the Unicode-aware version would be this:
String stripped = input.replaceAll("\\p{P}+", "");
The first version only looks at punctuation characters contained in ASCII.
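Since the question asks for C#, here is a sketch of the same ASCII-only versus Unicode-aware contrast with .NET's Regex class. As far as I know, .NET does not recognize the POSIX-style name \p{Punct}, so the ASCII-only pattern is spelled out as the four punctuation ranges of the ASCII table. The class and method names are invented for the example.

```csharp
using System.Text.RegularExpressions;

public static class PunctuationPatterns
{
    // ASCII-only: the four punctuation ranges of the ASCII table
    // ('!' to '/', ':' to '@', '[' to '`', '{' to '~').
    private static readonly Regex AsciiOnly = new Regex(@"[!-/:-@\[-`{-~]+");

    // Unicode-aware: any character in a Unicode punctuation category.
    private static readonly Regex UnicodeAware = new Regex(@"\p{P}+");

    public static string StripAscii(string input)
    {
        return AsciiOnly.Replace(input, "");
    }

    public static string StripUnicode(string input)
    {
        return UnicodeAware.Replace(input, "");
    }
}
```

With an input such as "¿Qué?", StripAscii removes only the trailing question mark, while StripUnicode also removes the inverted one.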
Answered by GWLlosa
new string(myCharCollection.Where(c => !char.IsPunctuation(c)).ToArray());
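For context, a minimal sketch of that one-liner in compilable form. The wrapper class and method names are hypothetical, and the input string stands in for myCharCollection; a string can be used there directly because it is a sequence of char.

```csharp
using System.Linq;

public static class LinqStrip
{
    // Hypothetical wrapper around the one-liner from the answer.
    public static string StripPunctuation(string input)
    {
        // Keep every character that is not punctuation, then rebuild a string.
        return new string(input.Where(c => !char.IsPunctuation(c)).ToArray());
    }
}
```

A call such as LinqStrip.StripPunctuation("Hello, world!") yields "Hello world".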
Answered by Anton
You can use the Regex.Replace method:
Regex.Replace(yourString, regularExpressionWithPunctuationMarks, "")
Since this returns a string, your method will look something like this:
string s = Regex.Replace("Hello!?!?!?!", "[?!]", "");
You can replace "[?!]" with something more sophisticated if you want:
(\p{P})
This should match any punctuation character, that is, any character in one of the Unicode punctuation categories.
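One caveat worth sketching: characters such as $, + and = belong to the Unicode symbol categories rather than the punctuation ones, so \p{P} leaves them in place. The sketch below shows the plain pattern next to a wider one that also includes \p{S}. The class and method names are assumptions made for the example, and whether symbols count as punctuation depends on what you need.

```csharp
using System.Text.RegularExpressions;

public static class RegexStrip
{
    // Removes characters in the Unicode punctuation categories only.
    public static string StripPunctuation(string input)
    {
        return Regex.Replace(input, @"\p{P}", "");
    }

    // Also removes characters in the Unicode symbol categories, such as $, + and =.
    public static string StripPunctuationAndSymbols(string input)
    {
        return Regex.Replace(input, @"[\p{P}\p{S}]", "");
    }
}
```

For the input "a+b=$5!", the first method removes only the exclamation mark, while the second returns "ab5".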
Answered by Tom Ritter
Based off GWLlosa's idea, I was able to come up with the supremely ugly, but working:
string s = "cat!";
s = s.ToCharArray().ToList<char>()
    .Where<char>(x => !char.IsPunctuation(x))
    .Aggregate<char, string>(string.Empty, new Func<string, char, string>(
        // The accumulator is named acc because a parameter named s would clash with the local s.
        delegate(string acc, char c) { return acc + c; }));
Answered by JoshBerke
Here's a slightly different approach using LINQ. I like AviewAnew's, but this avoids the Aggregate.
string myStr = "Hello there..';,]';';., Get rid of Punction";
var s = from ch in myStr
where !Char.IsPunctuation(ch)
select ch;
// Build the result directly from the characters; an ASCII byte round-trip
// would turn any non-ASCII character into '?'.
var stringResult = new string(s.ToArray());
Answered by Hades32
Why not simply:
string s = "sxrdct?fvzguh,bij.";
var sb = new StringBuilder();
foreach (char c in s)
{
    if (!char.IsPunctuation(c))
        sb.Append(c);
}
s = sb.ToString();
The usage of RegEx is normally slower than simple char operations. And those LINQ operations look like overkill to me. And you can't use such code in .NET 2.0...
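The speed claim is easy to check on your own machine. Below is a rough timing sketch, not a rigorous benchmark: the class name, the method names, and the use of Stopwatch are my own choices, and results will vary with input size, runtime version, and hardware.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class StripTiming
{
    // The simple char loop from the answer above.
    public static string WithLoop(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (!char.IsPunctuation(c))
                sb.Append(c);
        }
        return sb.ToString();
    }

    // A regex equivalent, for comparison.
    public static string WithRegex(string s)
    {
        return Regex.Replace(s, @"\p{P}", "");
    }

    // Returns the elapsed milliseconds for `iterations` calls of `strip` on `input`.
    public static long Time(Func<string, string> strip, string input, int iterations)
    {
        strip(input); // warm-up call so one-time setup cost is not measured
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            strip(input);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

Comparing StripTiming.Time(StripTiming.WithLoop, text, 100000) with StripTiming.Time(StripTiming.WithRegex, text, 100000) on a representative text gives a first impression of the difference.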
Answered by Hades32
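#include <cctype>
#include <iostream>
#include <string>
using namespace std;

int main() {
    string strOne = "H,e.l/l!o W#o@r^l&d!!!";
    int punct_count = 0;
    cout << "before : " << strOne << endl;
    for (string::size_type ix = 0; ix < strOne.size();) {
        // Cast to unsigned char: passing a negative char value to ispunct is undefined behavior.
        if (ispunct(static_cast<unsigned char>(strOne[ix]))) {
            ++punct_count;
            strOne.erase(ix, 1);  // the next character slides into position ix
        } else {
            ++ix;  // advance only when nothing was erased
        }
    }
    cout << "after : " << strOne << endl;
    return 0;
}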
Answered by Brian Low
Describes intent, easiest to read (IMHO) and best performing:
s = s.StripPunctuation();
To implement:
public static class StringExtension
{
public static string StripPunctuation(this string s)
{
var sb = new StringBuilder();
foreach (char c in s)
{
if (!char.IsPunctuation(c))
sb.Append(c);
}
return sb.ToString();
}
}
This uses Hades32's algorithm, which was the best performing of the bunch posted.
Answered by Ash Youssef
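// ereg_replace was removed in PHP 7; preg_replace accepts the same POSIX class inside delimiters.
$newstr = preg_replace('/[[:punct:]]/', '', $oldstr);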