Counting number of words in C#
Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you reuse it, you must also follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/8784517/
Question by Wern Ancheta
I'm trying to count the number of words in a RichTextBox in C#. The code I have below only works if the text is a single line. How can I do this without relying on regex or any other special functions?
string whole_text = richTextBox1.Text;
string trimmed_text = whole_text.Trim();
string[] split_text = trimmed_text.Split(' ');
int space_count = 0;
string new_text = "";
foreach (string av in split_text)
{
    if (av == "")
    {
        space_count++;
    }
    else
    {
        new_text = new_text + av + ",";
    }
}
new_text = new_text.TrimEnd(',');
split_text = new_text.Split(',');
MessageBox.Show(split_text.Length.ToString());
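The single-line limitation comes from Split(' '): it only breaks on the space character, so a line break leaves the last word of one line and the first word of the next glued together as one piece. The sketch below contrasts a space-only split with one that also treats tabs and line breaks as separators; SplitDemo and its method names are hypothetical, chosen only for illustration.

```csharp
using System;

public static class SplitDemo
{
    // Mirrors the question's idea: only the space character separates words,
    // so "two\nthree" stays a single piece.
    public static int CountBySpaceOnly(string text)
    {
        return text.Trim().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    // Same idea, but tabs and line breaks are separators too.
    public static int CountByWhitespace(string text)
    {
        return text.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries).Length;
    }
}
```

For "one two\nthree four", the space-only version sees three pieces, while the whitespace-aware version sees four.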
Accepted answer by Groo
Since you are only interested in the word count, and you don't care about the individual words, String.Split could be avoided. String.Split is handy, but it unnecessarily generates a (potentially) large number of String objects, which in turn creates an unnecessary burden on the garbage collector. For each word in your text, a new String object needs to be instantiated, and then soon collected since you are not using it.
For a homework assignment this may not matter, but if your text box contents change often and you do this calculation inside an event handler, it may be wiser to simply iterate through the characters manually. If you really want to use String.Split, then go for a simpler version like the one Yonix recommended.
Otherwise, use an algorithm similar to this:
static int CountWords(string text)
{
    int wordCount = 0, index = 0;

    // skip whitespace until first word
    while (index < text.Length && char.IsWhiteSpace(text[index]))
        index++;

    while (index < text.Length)
    {
        // check if current char is part of a word
        while (index < text.Length && !char.IsWhiteSpace(text[index]))
            index++;

        wordCount++;

        // skip whitespace until next word
        while (index < text.Length && char.IsWhiteSpace(text[index]))
            index++;
    }

    return wordCount;
}
This code handles multiple spaces between words correctly, and since char.IsWhiteSpace also matches tabs and line breaks, it works for multi-line text as well.
Answer by LewisBenge
Your approach is on the right track. I would do something like the following, passing the Text property of richTextBox1 into the method. However, this won't be accurate if your rich text box contains HTML, so you'll need to strip out any HTML tags before running the word count:
public static int CountWords(string s)
{
    int c = 0;
    for (int i = 0; i < s.Length; i++)
    {
        // A word starts at a letter, digit or punctuation mark that is either
        // the first character or directly preceded by whitespace.
        bool atWordStart = i == 0 || char.IsWhiteSpace(s[i - 1]);
        if (atWordStart &&
            (char.IsLetterOrDigit(s[i]) || char.IsPunctuation(s[i])))
        {
            c++;
        }
    }
    return c;
}
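If the text really does contain HTML markup, one naive way to strip the tags without regex is sketched below. StripTags is a hypothetical helper, not part of .NET: it replaces each tag with a space so that words on either side of a tag are not merged, and it does not handle comments, scripts, entities such as &nbsp;, or a literal '<' in the text, so treat it as a rough sketch rather than an HTML parser.

```csharp
using System.Text;

public static class HtmlText
{
    // Hypothetical helper: copies every character outside <...> and emits a
    // single space in place of each tag, so "<b>Hi</b>there" keeps two words.
    public static string StripTags(string html)
    {
        var sb = new StringBuilder(html.Length);
        bool inTag = false;
        foreach (char c in html)
        {
            if (c == '<')
            {
                inTag = true;            // start skipping tag contents
            }
            else if (c == '>' && inTag)
            {
                inTag = false;           // tag ended: separate surrounding words
                sb.Append(' ');
            }
            else if (!inTag)
            {
                sb.Append(c);            // ordinary text is kept as-is
            }
        }
        return sb.ToString();
    }
}
```

The stripped string can then be passed to the CountWords method above.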
Answer by Matt Sieker
Have a look at the Lines property mentioned in @Jay Riggs' comment, along with a suitable overload of String.Split, to make the code much simpler. Then the simplest approach would be to loop over each line in the Lines property, call String.Split on it, and add the length of the array it returns to a running count.
EDIT: Also, is there any reason you're using a RichTextBox instead of a TextBox with Multiline set to true?
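The Lines-based approach can be sketched as follows. In a form you would pass richTextBox1.Lines; here the method takes a plain string[] so it runs without WinForms. LineWordCounter is a hypothetical name, and treating only space and tab as separators is an assumption. The overload that matters is Split(char[], StringSplitOptions): with RemoveEmptyEntries, repeated spaces and blank lines do not produce empty strings that would otherwise be counted as words.

```csharp
using System;

public static class LineWordCounter
{
    // Sums the word counts of each line (e.g. the contents of richTextBox1.Lines).
    public static int CountWords(string[] lines)
    {
        // Assumed separators: space and tab.
        char[] separators = { ' ', '\t' };
        int count = 0;
        foreach (string line in lines)
        {
            // RemoveEmptyEntries drops the empty strings produced by
            // repeated separators or by blank lines.
            count += line.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length;
        }
        return count;
    }
}
```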
Answer by Jason Down
There are some better ways to do this, but in keeping with what you've got, try the following:
string whole_text = richTextBox1.Text;
string trimmed_text = whole_text.Trim();
// new line split here
string[] lines = trimmed_text.Split(Environment.NewLine.ToCharArray());
// don't need this here now...
//string[] split_text = trimmed_text.Split(' ');
int space_count = 0;
string new_text = "";
Now use two foreach loops: one over the lines, and one nested inside it to count the words within each line.
foreach (string line in lines)
{
    // Modify the inner foreach to do the split on ' ' here
    // instead of split_text
    foreach (string av in line.Split(' '))
    {
        if (av == "")
        {
            space_count++;
        }
        else
        {
            new_text = new_text + av + ",";
        }
    }
}
new_text = new_text.TrimEnd(',');
// use lines here instead of split_text
lines = new_text.Split(',');
MessageBox.Show(lines.Length.ToString());
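The nested loops can also count the non-empty pieces directly instead of joining them with commas and splitting again. That avoids two side effects of the comma round trip: a comma inside the text, as in "hello, world", adds an extra empty entry to the final count, and an empty text box reports one word because "".Split(',') returns a single empty string. A minimal sketch under those assumptions (NestedLoopCounter is a hypothetical name):

```csharp
using System;

public static class NestedLoopCounter
{
    // Same outer/inner loop shape, but non-empty pieces are counted directly.
    public static int CountWords(string text)
    {
        // Outer split on line breaks; "\r\n" yields an extra empty line,
        // which contributes nothing below.
        string[] lines = text.Trim().Split(new[] { '\r', '\n' });
        int wordCount = 0;
        foreach (string line in lines)
        {
            // Inner split on spaces; empty pieces come from repeated spaces.
            foreach (string piece in line.Split(' '))
            {
                if (piece != "")
                {
                    wordCount++;
                }
            }
        }
        return wordCount;
    }
}
```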
Answer by Bedasso
char[] delimiters = new char[] { ' ', '\r', '\n' };
int wordCount = whole_text.Split(delimiters, StringSplitOptions.RemoveEmptyEntries).Length;
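The delimiter list above leaves out tabs, and any other whitespace character would also have to be listed by hand. As a variation, String.Split treats every whitespace character as a delimiter when the separator array is null, so the list can be dropped entirely. WhitespaceSplitCounter is a hypothetical name for this sketch.

```csharp
using System;

public static class WhitespaceSplitCounter
{
    public static int CountWords(string text)
    {
        // A null separator array means "split on any whitespace character"
        // (space, tab, \r, \n, ...); RemoveEmptyEntries skips runs of them.
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
    }
}
```

Like any Split-based version, this still allocates one string per word, which is the trade-off the accepted answer points out.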
Answer by Yoshi
This was a phone screening interview question I just had (from a large company in California that sells all kinds of devices starting with the letter "i"), and I think I flunked. After I got off the call, I wrote this. I wish I had been able to do it during the interview.
using System;
using System.Diagnostics;

class Program
{
    static void Main(string[] args)
    {
        Debug.Assert(CountWords("Hello world") == 2);
        Debug.Assert(CountWords(" Hello world") == 2);
        Debug.Assert(CountWords("Hello world ") == 2);
        Debug.Assert(CountWords("Hello   world") == 2);
    }

    public static int CountWords(string test)
    {
        int count = 0;
        bool wasInWord = false;
        bool inWord = false;
        for (int i = 0; i < test.Length; i++)
        {
            // wasInWord lags one character behind inWord
            if (inWord)
            {
                wasInWord = true;
            }
            if (Char.IsWhiteSpace(test[i]))
            {
                if (wasInWord)
                {
                    count++;
                    wasInWord = false;
                }
                inWord = false;
            }
            else
            {
                inWord = true;
            }
        }
        // Check to see if we got out while still in a word.
        // Because of the lag, a one-letter final word (e.g. "Hello a")
        // is not counted; the next answer fixes this.
        if (wasInWord)
        {
            count++;
        }
        return count;
    }
}
Answer by MarkOwen320
We used an adapted form of Yoshi's answer, in which we fixed a bug where it would not count the last word in a string when that word was a single character with no whitespace after it (for example, "Hello a" returns 1):
public static int CountWords(string test)
{
    int count = 0;
    bool inWord = false;
    foreach (char t in test)
    {
        if (char.IsWhiteSpace(t))
        {
            inWord = false;
        }
        else
        {
            // count a word on each whitespace-to-non-whitespace transition
            if (!inWord) count++;
            inWord = true;
        }
    }
    return count;
}
Answer by Anand Sanklipur
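public static int WordCount(string str)
{
    int num = 0;
    // start as false so an all-whitespace string counts as zero words
    bool wasInaWord = false;
    if (string.IsNullOrEmpty(str))
    {
        return num;
    }
    for (int i = 0; i < str.Length; i++)
    {
        // a word ends where whitespace (including line breaks)
        // follows a non-whitespace character
        if (i != 0 && char.IsWhiteSpace(str[i]) && !char.IsWhiteSpace(str[i - 1]))
        {
            num++;
            wasInaWord = false;
        }
        if (!char.IsWhiteSpace(str[i]))
        {
            wasInaWord = true;
        }
    }
    // count the final word if the string does not end in whitespace
    if (wasInaWord)
    {
        num++;
    }
    return num;
}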
Answer by Akbar Khan
You can also do it this way: add this method to your extension methods.
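using System.Text.RegularExpressions;

public static class StringExtensions
{
    // Counts runs of word characters (letters, digits, underscore).
    public static int WordsCount(this string str) =>
        Regex.Matches(str, @"\w+").Count;
}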
Then call it like this:
string someString = "Let me show how I do it!";
int wc = someString.WordsCount();
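One thing to be aware of with \w+: apostrophes and hyphens are not word characters, so "don't" and "well-known" each count as two words. The sketch below compares that with an assumed alternative pattern, [\w'-]+, which keeps them together. The class name is hypothetical, and whether contractions should count as one word is a choice for your application.

```csharp
using System.Text.RegularExpressions;

public static class RegexWordCount
{
    // Plain \w+: "don't" -> "don" + "t", "well-known" -> "well" + "known".
    public static int CountPlain(string text) =>
        Regex.Matches(text, @"\w+").Count;

    // Assumed alternative: apostrophes and hyphens may appear inside a word.
    // Note that a lone "-" surrounded by spaces is also counted as a word.
    public static int CountWithApostrophes(string text) =>
        Regex.Matches(text, @"[\w'-]+").Count;
}
```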
Answer by N Jay
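This works when the text has no leading or trailing spaces and the words are separated by single spaces on one line: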
int count = input.Split(' ').Length;

